Auto-Infer mappings Argument for SmoothQuantModifier Based on Model Architecture #119


Merged
mgoin merged 9 commits from smoothquant-mappings-ux into main on Oct 4, 2024

Conversation

rahul-tuli
Collaborator

Description:

This PR introduces a feature that automatically infers the mappings argument for the SmoothQuantModifier based on the model architecture, eliminating the need for manual specification of layer mappings.

Before:

In the prior implementation, users had to manually define layer mappings, as shown below:

quantization_stage:
  quantization_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.5
      mappings: [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate"], "re:.*post_attention_layernorm"]
      ]
      ignore: ["lm_head"]

Now:

With this update, the SmoothQuantModifier automatically infers the mappings based on the architecture, simplifying the configuration:

quantization_stage:
  quantization_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.5
      ignore: ["lm_head"]
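
For reference, the same recipe can also be applied from Python. The following is a minimal sketch rather than code from this PR: it assumes the usual oneshot entry point, and the calibration dataset name and sample counts are illustrative.

from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

# mappings are intentionally omitted; the modifier infers them
# from the model architecture
recipe = SmoothQuantModifier(smoothing_strength=0.5, ignore=["lm_head"])

oneshot(
    model="Isotonic/TinyMixtral-4x248M-MoE",  # model used to test this PR
    dataset="open_platypus",                  # illustrative calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=64,
)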

Key Changes:

  • Auto-inference of mappings: The SmoothQuantModifier now automatically detects and applies appropriate layer mappings based on the model's architecture, making the modifier more user-friendly and reducing the risk of manual configuration errors (a rough sketch of one way such inference can work follows this list).
  • Optional mappings parameter: The mappings parameter is no longer required in the configuration, as it is inferred dynamically.
  • Backward compatibility: Existing configurations that manually specify mappings are still supported, ensuring a smooth transition for older setups.
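
To give a sense of how architecture-based inference can work, here is a minimal registry-plus-fallback sketch. All names below are illustrative, not the symbols actually introduced by this PR:

# Hypothetical sketch; names are illustrative, not this PR's actual symbols.

# Llama-style defaults used when no architecture-specific entry exists
_DEFAULT_MAPPINGS = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
]

# per-architecture overrides, keyed by the Hugging Face architecture name
_MAPPINGS_REGISTRY = {
    "MixtralForCausalLM": [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate"], "re:.*post_attention_layernorm"],
    ],
}


def infer_mappings(model) -> list:
    """Resolve SmoothQuant mappings from the model's architecture,
    falling back to the Llama-style defaults."""
    architectures = getattr(model.config, "architectures", None) or []
    for arch in architectures:
        if arch in _MAPPINGS_REGISTRY:
            return _MAPPINGS_REGISTRY[arch]
    return _DEFAULT_MAPPINGS

Keeping the inference logic in a standalone function like this, as suggested later in review, makes it easy to unit-test without constructing a full model.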

Motivation:

These changes improve usability by automating configuration setup and reducing user overhead, as outlined in the design document (Link to Design Doc). They also ensure that quantization recipes adapt to various model architectures without manual intervention.

The auto-inference of mappings was tested using a Mixtral model: Isotonic/TinyMixtral-4x248M-MoE

@dsikka dsikka marked this pull request as ready for review September 3, 2024 22:18
@rahul-tuli rahul-tuli self-assigned this Sep 4, 2024
kylesayrs previously approved these changes Sep 8, 2024
Add more models, Mistral and Qwen2
@kylesayrs
Collaborator

Should consider adding a sentence like "mappings will normally be automatically inferred, but here's how to create your own custom ones" to https://github.com/vllm-project/llm-compressor/pull/115/files

kylesayrs previously approved these changes Oct 3, 2024
Collaborator

@kylesayrs left a comment


Good stuff. As mentioned, including this in the documentation will ensure that this feature actually gets used by users.

Point users to readme
make mappings inference a static function to make it easily testable
@mgoin mgoin merged commit 7c2ab3a into main Oct 4, 2024
6 of 7 checks passed
@mgoin mgoin deleted the smoothquant-mappings-ux branch October 4, 2024 22:54
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024