[ Docs ] Update FP8 example to use dynamic per token #75
Conversation
```diff
-git clone https://github.com/vllm-project/llm-compressor.git
-cd llm-compressor
-pip install -e .
+pip install llmcompressor
```
I think we should pin the version so it is clear when this was made/updated
good idea
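For illustration, a pinned install along those lines might look like the following; the version number below is only a placeholder, not the actual release this example was written against:

```bash
# Pin llmcompressor so readers know which release the example was validated on.
# 0.1.0 is a placeholder version for illustration only.
pip install llmcompressor==0.1.0
```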
````diff
 ```python
 from llmcompressor.transformers import oneshot
 from llmcompressor.modifiers.quantization import QuantizationModifier

 # Configure the quantization algorithm to run.
-recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])
+recipe = QuantizationModifier(targets="Linear",
+    scheme="FP8_Dynamic",
````
Is it fine to not use all caps? I thought the scheme was FP8_DYNAMIC
it seems to be working with `FP8_Dynamic`, but I can adjust it
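For context, a minimal end-to-end sketch of the dynamic per-token recipe is shown below, written with the all-caps `FP8_DYNAMIC` spelling discussed above; the model ID is a placeholder and the snippet illustrates the flow rather than reproducing the exact example text in this PR:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Placeholder model ID for illustration.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC applies static per-channel scales to the weights and computes
# activation scales dynamically per token at runtime, so no calibration
# dataset is required.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the recipe in one shot.
oneshot(model=model, recipe=recipe)
```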
```python
# Save to disk compressed.
SAVE_DIR = MODEL_ID.split("/")[1] + "-W8A8-FP8"
model.save_pretrained(SAVE_DIR, save_compressed=True)
```
`save_compressed=True` is the default now?
yes
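Given that answer, the save step could be written without the explicit flag; a small sketch, assuming the `model`, `tokenizer`, and `SAVE_DIR` from the snippets above:

```python
# save_compressed defaults to True, so the weights are written in the
# compressed format without passing the flag explicitly.
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```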
Neural Magic's fork of `lm-evaluation-harness` implements the evaluation strategy used by Meta in the Llama3.1 launch. You can install this branch from source below:

```bash
pip install vllm
pip install git+https://github.com/neuralmagic/lm-evaluation-harness.git@a0e54e5f1a0a52abaedced474854ae2ce4e68ded
```
It may be best to use a task that doesn't require a fork of lm-eval to reproduce results. AFAIK it is only ARC-C and GSM8k that require these custom changes. Winogrande is pretty fast, so maybe use that with lm-eval==0.4.3
I like GSM because it's easy to understand + it's a good proof point for users that it's working in a generative task
Do you need the reproduction of the paper results, then? I think we shouldn't push people towards our fork if possible. I also think the most realistic example shows evals for both the unquantized and quantized checkpoints, so matching this specific CoT setup shouldn't matter.
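For comparison, an evaluation against the upstream `lm-eval` release suggested above might look roughly like this; the checkpoint path, task, few-shot count, and batch size are placeholders rather than the settings used in this PR:

```bash
pip install vllm lm-eval==0.4.3

# Run the quantized checkpoint through upstream lm-eval via vLLM (no fork needed);
# gsm8k is used here as the generative task mentioned in the discussion.
lm_eval --model vllm \
  --model_args pretrained=./Meta-Llama-3-8B-Instruct-W8A8-FP8,add_bos_token=True \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto
```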
* reduce appropriate dim
* tests

SUMMARY:
Update the FP8 example to use the `FP8_DYNAMIC` scheme (dynamic per-token activation quantization).

TEST PLAN: