Auto-Infer `mappings` Argument for SmoothQuantModifier Based on Model Architecture #119
Conversation
Add more models: Mistral and Qwen2.
Should consider adding a sentence like "mappings will normally be automatically inferred, but here's how to create your own custom ones" to https://github.com/vllm-project/llm-compressor/pull/115/files
Good stuff. As mentioned, including this in the documentation will ensure that this feature actually gets used by users.
Point users to the README; make mappings inference a static function so it is easily testable.
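The reviewer's suggestion above — a static, stateless inference function — could be sketched as a lookup keyed on the model's architecture class name. Note that the function name, registry contents, and regex mappings below are illustrative assumptions, not the PR's actual implementation:

```python
# Sketch only: mapping inference as a static, easily testable function.
# The function name and registry contents are illustrative assumptions.

# Llama-style decoder layout, assumed to be shared by Llama, Mistral, and Qwen2:
DEFAULT_SMOOTHQUANT_MAPPINGS = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
]

_MAPPINGS_REGISTRY = {
    "LlamaForCausalLM": DEFAULT_SMOOTHQUANT_MAPPINGS,
    "MistralForCausalLM": DEFAULT_SMOOTHQUANT_MAPPINGS,
    "Qwen2ForCausalLM": DEFAULT_SMOOTHQUANT_MAPPINGS,
}


def infer_mappings_from_model(model) -> list:
    """Look up SmoothQuant mappings by the model's architecture class name.

    Static (no modifier state), so a unit test can pass in a stub object.
    """
    arch = type(model).__name__
    if arch not in _MAPPINGS_REGISTRY:
        raise ValueError(f"no default SmoothQuant mappings for {arch!r}")
    return _MAPPINGS_REGISTRY[arch]


# Unit-test style usage with a stub class standing in for the real model:
MistralStub = type("MistralForCausalLM", (), {})
assert infer_mappings_from_model(MistralStub()) is DEFAULT_SMOOTHQUANT_MAPPINGS
```

Because the function takes only the model object, a test needs nothing heavier than a stub class with the right `__name__`, which is the testability benefit the reviewer is pointing at.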
Description:
This PR introduces a feature that automatically infers the `mappings` argument for the `SmoothQuantModifier` based on the model architecture, eliminating the need for manual specification of layer mappings.
Before:
In the prior implementation, users had to manually define layer mappings, as shown below:
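The original snippet was not preserved in this page; the following is a hedged reconstruction of a manual configuration. The exact regex mappings are assumptions for a Llama-style decoder, not the PR's code:

```python
# Hedged reconstruction of the manual configuration users had to write.
# The regexes below are assumptions for a Llama-style decoder.
mappings = [
    # smooth q/k/v projections against the preceding input_layernorm
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    # smooth the MLP gate/up projections against post_attention_layernorm
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
]

# Usage sketch (requires llm-compressor; left commented to stay self-contained):
# from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
# modifier = SmoothQuantModifier(smoothing_strength=0.8, mappings=mappings)
```

Each mapping pairs the layers to be smoothed with the normalization layer whose scale absorbs the smoothing factor, which is why the list had to be rewritten for every new architecture.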
Now:
With this update, the `SmoothQuantModifier` automatically infers the mappings based on the model architecture, simplifying the configuration.
Key Changes:
- Automatic `mappings` inference: The `SmoothQuantModifier` now automatically detects and applies appropriate layer mappings based on the model's architecture, making the modifier more user-friendly and reducing the risk of manual configuration errors.
- `mappings` parameter removal: The `mappings` parameter is no longer required in the configuration, as it is inferred dynamically.
- Backward compatibility: Explicitly specified `mappings` will still be supported, ensuring a smooth transition and compatibility with older setups.
Motivation:
These changes improve usability by automating configuration setup and reducing user overhead, as outlined in the design document: Link to Design Doc. This also ensures that the quantization recipes are adaptable to various model architectures without manual intervention.
The auto-inference of mappings was tested using a Mixtral model: Isotonic/TinyMixtral-4x248M-MoE
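End to end, the simplified flow described above might look like the following sketch. The recipe shape follows common llm-compressor recipe conventions but should be checked against the README, and the `oneshot` call is left commented so the snippet stays self-contained; the model name is the one used for testing in this PR:

```python
# With auto-inference, the recipe no longer needs a `mappings` entry;
# the modifier resolves mappings from the model architecture itself.
# Recipe shape is a sketch and should be checked against the README.
recipe = """
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
"""

# Applying it (requires llm-compressor; commented to stay self-contained):
# from llmcompressor.transformers import oneshot
# oneshot(
#     model="Isotonic/TinyMixtral-4x248M-MoE",
#     recipe=recipe,
# )
assert "mappings" not in recipe  # nothing architecture-specific left to hand-write
```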