Enable native ModelOpt quantization support (3/3) #10154
Conversation
@zhyncs @Qiaolin-Yu Please help review this PR, or find someone who can, when you get a chance. Thank you!
```toml
[project.optional-dependencies]
decord = ["decord"]
modelopt = ["nvidia-modelopt"]
```
@zhyncs @Qiaolin-Yu Please let us know whether it's okay to add modelopt as an optional dependency, or whether it should be a required one.
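If it stays optional, the usual pattern is to guard the import at runtime. A minimal sketch (not code from this PR; the extra name matches the pyproject snippet above, everything else is illustrative):

```python
# Sketch only: guarding an optional nvidia-modelopt dependency at import time.
try:
    import modelopt.torch.quantization as mtq  # provided by the nvidia-modelopt package

    MODELOPT_AVAILABLE = True
except ImportError:
    mtq = None
    MODELOPT_AVAILABLE = False


def require_modelopt():
    """Fail with an actionable message when the optional extra is missing."""
    if not MODELOPT_AVAILABLE:
        raise ImportError(
            "nvidia-modelopt is required for ModelOpt quantization; "
            'install it with `pip install "sglang[modelopt]"`.'
        )
```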
```python
@classmethod
def override_quantization_method(cls, hf_quant_config, user_quant):
    """Override quantization method based on the model's config."""
    if hf_quant_config is None:
        return None

    # Check if this is a ModelOpt config
    quant_algo = hf_quant_config.get("quant_algo", "").upper()

    # If user specified generic "modelopt", auto-detect the specific method
    if user_quant == "modelopt":
        if "FP8" in quant_algo:
            return "modelopt_fp8"
        elif "NVFP4" in quant_algo or "FP4" in quant_algo:
            return "modelopt_fp4"

    return None
```
Why is exactly the same code duplicated here?
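For reference, one way to remove the duplication would be a small shared mixin. This is a sketch with hypothetical class names, not code from this PR:

```python
# Sketch only: share the override logic instead of duplicating it per config class.
class ModelOptOverrideMixin:
    @classmethod
    def override_quantization_method(cls, hf_quant_config, user_quant):
        """Map a generic "modelopt" request to the specific ModelOpt method."""
        if hf_quant_config is None or user_quant != "modelopt":
            return None
        quant_algo = hf_quant_config.get("quant_algo", "").upper()
        if "FP8" in quant_algo:
            return "modelopt_fp8"
        if "NVFP4" in quant_algo or "FP4" in quant_algo:
            return "modelopt_fp4"
        return None


# Hypothetical usage: both config classes inherit the single implementation.
# class ModelOptFp8Config(ModelOptOverrideMixin, QuantizationConfig): ...
# class ModelOptFp4Config(ModelOptOverrideMixin, QuantizationConfig): ...
```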
```python
def _is_already_quantized(self) -> bool:
    """Check if the model is already quantized based on config files."""
    # Check for HuggingFace quantization config
    if is_remote_url(self.model_path):
        try:
            from huggingface_hub import HfApi

            hf_api = HfApi()
            return hf_api.file_exists(self.model_path, "hf_quant_config.json")
        except Exception:
            return False
    else:
        return os.path.exists(os.path.join(self.model_path, "hf_quant_config.json"))
```
This basically detects whether "hf_quant_config.json" exists alongside the model files. IIRC there are similar helper functions already; could you try to find and reuse one? (If not, add one to "utils.py" and call it here.)
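Something along these lines could live in utils.py; a sketch, where the helper name and the reuse of is_remote_url are assumptions:

```python
import os


def hf_config_file_exists(model_path: str, filename: str = "hf_quant_config.json") -> bool:
    """Return True if `filename` is present for a local path or a HF Hub repo."""
    if is_remote_url(model_path):  # assumed: same predicate used in the loader above
        try:
            from huggingface_hub import HfApi

            return HfApi().file_exists(model_path, filename)
        except Exception:
            return False
    return os.path.exists(os.path.join(model_path, filename))
```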
```python
# Export model if path provided
if export_path:
    try:
        # Get the original model path from the model config
        original_model_path = getattr(self, "_original_model_path", None)
        self._export_modelopt_checkpoint(
            model, export_path, original_model_path
        )
        rank0_log(
            f"Quantized model exported to HuggingFace format at {export_path}"
        )
    except Exception as e:
        rank0_log(
            f"Warning: Failed to export quantized model to {export_path}: {e}"
        )
```
DRY: move to a helper function, like
```python
def _maybe_export(self, model, export_path):
    if export_path:
        try:
            # Get the original model path from the model config
            original_model_path = getattr(self, "_original_model_path", None)
            self._export_modelopt_checkpoint(
                model, export_path, original_model_path
            )
            rank0_log(
                f"Quantized model exported to HuggingFace format at {export_path}"
            )
        except Exception as e:
            rank0_log(
                f"Warning: Failed to export quantized model to {export_path}: {e}"
            )
```
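With that helper in place, the call site would shrink to a single line (hypothetical, matching the sketch above):

```python
# Hypothetical call site once the export logic lives in the helper:
self._maybe_export(model, export_path)
```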
Not sure whether we could reuse the loader to export the model... cc @merrymercy
I don't quite understand this comment, but let's discuss in our call tomorrow.
This is the third PR in a three-part series to enable native ModelOpt quantization in SGLang. It includes changes from the first PR (#7149) and second PR (#9991) and will be rebased once the first two PRs are merged.
Motivation
We aim to enhance SGLang's quantization capabilities, making ModelOpt integration more robust and user-friendly while providing checkpoint persistence for better performance in production environments.
Modifications
- Added a `modelopt_export_path` parameter to `_setup_modelopt_quantization()` in `ModelOptModelLoader`.
- Added an `_export_modelopt_checkpoint()` method that uses ModelOpt's unified HF export API.
- Added a `modelopt_export_path` parameter to `ModelConfig` and a `--modelopt-export-path` command-line argument to `ServerArgs`.
- Added a `quantize-and-serve` mode for quantize + export + deployment with a single command.

Accuracy Tests
Production Workflow:
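A rough sketch of the two-stage flow this PR targets. Illustrative only: the model name, paths, and the exact `quantization` value are assumptions; only `model_path`, `quantization`, and the new `modelopt_export_path` come from the PR description:

```python
from sglang.srt.server_args import ServerArgs

# Stage 1: quantize with ModelOpt and persist the checkpoint in HF format.
quantize_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",    # example model, not from this PR
    quantization="modelopt_fp8",                      # assumed value for the quantize path
    modelopt_export_path="/tmp/llama3-modelopt-fp8",  # new argument added by this PR
)

# Stage 2: later deployments point at the exported checkpoint; the presence of
# hf_quant_config.json lets the loader detect that it is already quantized.
serve_args = ServerArgs(model_path="/tmp/llama3-modelopt-fp8")
```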
Benchmarking and Profiling
Checklist
Summary by CodeRabbit
New Features
Documentation
Tests
Chores