feat: add quantization #217
Conversation
LGTM, two questions
```python
    return embeddings.astype(np.float64)
elif quantize_to == DType.Int8:
    # Normalize to [-127, 127] range for int8
    scale = np.max(np.abs(embeddings)) / 127.0
```
Can this ever be 0 (zero-division issues)?
Only if all embeddings are 0
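For context, a minimal sketch of the int8 path with an explicit guard for that all-zero case. This is a hypothetical `quantize_int8` helper written for illustration, not the PR's actual function (which dispatches on `quantize_to`):

```python
import numpy as np

def quantize_int8(embeddings: np.ndarray) -> np.ndarray:
    """Symmetrically quantize float embeddings to int8."""
    # Normalize to the symmetric [-127, 127] range for int8.
    scale = np.max(np.abs(embeddings)) / 127.0
    # Guard for the degenerate case discussed above: if every value is 0,
    # the scale is 0 and the division would emit warnings / produce NaNs.
    if scale == 0.0:
        return np.zeros_like(embeddings, dtype=np.int8)
    return np.round(embeddings / scale).astype(np.int8)
```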
model2vec/quantization.py (outdated)
```python
elif quantize_to == DType.Float64:
    return embeddings.astype(np.float64)
elif quantize_to == DType.Int8:
    # Normalize to [-127, 127] range for int8
```
Should this not be [-128, 127] (the range of an 8-bit signed integer)? Not sure if it's relevant for the code, though, since it doesn't change the division.
I think the symmetry is more important than making sure the one extra value is used. I updated the comment.
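To illustrate the trade-off (my example, not from the PR): with a shared scale of max|x| / 127, a value and its negation map to opposite int8 codes, while the -128 code is simply never produced.

```python
import numpy as np

x = np.array([-1.0, -0.5, 0.5, 1.0], dtype=np.float32)
scale = np.max(np.abs(x)) / 127.0
q = np.round(x / scale).astype(np.int8)
print(q)  # [-127  -64   64  127] -- symmetric: q(-v) == -q(v); -128 unused
```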
This PR adds quantization. Quantization can be applied during distillation, or during loading. Both are equivalent, except that distill-time quantization leads to smaller embedding sizes.
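A usage sketch of the two paths described above. The entry points, the string form of `quantize_to`, and the model names are my assumptions for illustration, not confirmed API from this PR:

```python
# Sketch only: assumes this PR wires a `quantize_to` argument into both
# distill() and StaticModel.from_pretrained(); model names are placeholders.
from model2vec import StaticModel
from model2vec.distill import distill

# Distill-time quantization: embeddings are stored as int8,
# so the saved model is smaller on disk.
model = distill(model_name="BAAI/bge-base-en-v1.5", quantize_to="int8")
model.save_pretrained("my-model-int8")

# Load-time quantization: quantize a float model's embeddings while loading.
model = StaticModel.from_pretrained("minishlab/potion-base-8M", quantize_to="int8")
```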