Enable native ModelOpt quantization support (2/3) #9991
Conversation
hi @Edwardf0t1 can you help fix the conflicts? thanks
@zhyncs Just rebased and resolved the conflicts. Could you or @Qiaolin-Yu help review the PR? Thanks.
I think we should add example code in this PR to demonstrate how to use it.
The usage is covered in unit tests.
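For readers who want a concrete picture, below is a rough sketch of the kind of save-then-restore round trip such a unit test might exercise. It is not the PR's actual test code: the model name and paths are placeholders, and it assumes that `sglang.Engine` forwards the new `modelopt_checkpoint_save_path` / `modelopt_checkpoint_restore_path` fields (described under Modifications below) to `ServerArgs`, and that `quantization="modelopt"` selects the ModelOpt path.

```python
# Hypothetical sketch, not the PR's test: quantize-and-save with the first
# engine, then restore-without-recalibration with the second.
import sglang as sgl

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
PROMPT = "The capital of France is"
PARAMS = {"temperature": 0, "max_new_tokens": 8}


def test_modelopt_checkpoint_round_trip(tmp_path):
    ckpt = str(tmp_path / "modelopt_ckpt")

    # First run: ModelOpt calibration/quantization, checkpoint persisted to disk.
    llm = sgl.Engine(
        model_path=MODEL,
        quantization="modelopt",
        modelopt_checkpoint_save_path=ckpt,  # field added in this PR
    )
    first = llm.generate(PROMPT, PARAMS)
    llm.shutdown()

    # Second run: restore the persisted checkpoint instead of re-quantizing.
    llm = sgl.Engine(
        model_path=MODEL,
        quantization="modelopt",
        modelopt_checkpoint_restore_path=ckpt,  # field added in this PR
    )
    second = llm.generate(PROMPT, PARAMS)
    llm.shutdown()

    # Both engines should produce non-empty text for the same prompt.
    assert first["text"] and second["text"]
```

In practice the two phases would normally be separate jobs: a one-off quantization run that writes the checkpoint, followed by serving runs that only restore it.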
LGTM
Thanks @Qiaolin-Yu for the review and approval.
Signed-off-by: Zhiyu Cheng <[email protected]>
This is the second PR in a three-part series to enable native ModelOpt quantization in SGLang. It includes changes from the first PR (#7149) and will be rebased once the first PR is merged.
Motivation
We aim to enhance SGLang's quantization capabilities, making ModelOpt integration more robust and user-friendly while providing checkpoint persistence for better performance in production environments.
Modifications
- Updated `_setup_modelopt_quantization()` and added calibration functionalities.
- Added `modelopt_checkpoint_restore_path` and `modelopt_checkpoint_save_path` parameters to both `ModelConfig` and `ServerArgs`. These allow users to save and restore quantized checkpoints, avoiding re-quantization on subsequent runs (a configuration sketch follows below).
- Added `test_modelopt_loader.py` to verify the ModelOpt functionality.

The 3rd PR is also ready for review: #10154
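For illustration, here is a minimal sketch of where the new fields sit on `ServerArgs`, using only the names listed above. The model path and checkpoint location are placeholders, and any corresponding CLI flags (e.g. `--modelopt-checkpoint-save-path`) are an assumption based on SGLang's usual snake_case-to-flag naming rather than something stated in this description.

```python
# Sketch under assumptions: only the field names come from this PR's description.
from sglang.srt.server_args import ServerArgs

# First run: quantize with ModelOpt calibration and persist the checkpoint.
save_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",  # placeholder
    quantization="modelopt",
    modelopt_checkpoint_save_path="/models/llama31-modelopt-ckpt",
)

# Later runs: restore the saved checkpoint and skip re-quantization.
restore_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    quantization="modelopt",
    modelopt_checkpoint_restore_path="/models/llama31-modelopt-ckpt",
)
```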
Accuracy Tests
Benchmarking and Profiling
Checklist