[ Docs ] Update FP8 example to use dynamic per token #75


Merged

robertgshaw2-redhat merged 9 commits into main on Aug 12, 2024

Conversation

@robertgshaw2-redhat (Collaborator) commented on Aug 11, 2024

SUMMARY:

  • convert example to use FP8_DYNAMIC

TEST PLAN:

  • manually running examples

```diff
-git clone https://github.com/vllm-project/llm-compressor.git
-cd llm-compressor
-pip install -e .
+pip install llmcompressor
```
Member

I think we should pin the version so it is clear when this was made/updated

Collaborator Author

good idea
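
For example (the version number here is hypothetical; pin whichever release is current when the example is updated):

```bash
# Hypothetical pinned version; substitute the actual release on PyPI.
pip install llmcompressor==0.1.0
```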


```diff
 from llmcompressor.transformers import oneshot
 from llmcompressor.modifiers.quantization import QuantizationModifier

 # Configure the quantization algorithm to run.
-recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])
+recipe = QuantizationModifier(targets="Linear",
+                              scheme="FP8_Dynamic",
+                              ignore=["lm_head"])
```
Member

Is it fine to not use all caps? I thought the scheme was FP8_DYNAMIC

Collaborator Author

it seems to be working with FP8_Dynamic, but I can adjust it
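
For context, a minimal end-to-end sketch of the dynamic per-token flow; the `SparseAutoModelForCausalLM` loader, the uppercase `FP8_DYNAMIC` spelling, and the model ID are assumptions based on the surrounding discussion rather than the exact diff:

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model choice

# Load the model at its original precision.
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# Dynamic per-token FP8: weights are quantized ahead of time, activation
# scales are computed per token at runtime, so no calibration data is needed.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# Apply the recipe in one shot; no dataset argument is required here.
oneshot(model=model, recipe=recipe)
```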


```python
# Save to disk compressed.
SAVE_DIR = MODEL_ID.split("/")[1] + "-W8A8-FP8"
model.save_pretrained(SAVE_DIR, save_compressed=True)
```
Member

save_compressed=True is the default now?

Collaborator Author

yes
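
If so, the flag in the snippet above is redundant but harmless; assuming the default really is `True`, a plain save would produce the same compressed checkpoint:

```python
# Equivalent under the stated default; the explicit flag just documents intent.
model.save_pretrained(SAVE_DIR)
```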

Comment on lines 82 to 87
Neural Magic's fork of `lm-evaluation-harness` implements the evaluation strategy used by Meta in the Llama3.1 launch. You can install this branch from source below:

```bash
pip install vllm
pip install git+https://github.com/neuralmagic/lm-evaluation-harness.git@a0e54e5f1a0a52abaedced474854ae2ce4e68ded
```
Member

It may be best to use a task that doesn't require a fork of lm-eval to reproduce results. AFAIK it is only ARC-C and GSM8k that require these custom changes. Winogrande is pretty fast, so maybe use that with lm-eval==0.4.3
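
For reference, a sketch of that suggestion using the stock harness with the vLLM backend (the checkpoint path is a placeholder for the SAVE_DIR produced above):

```bash
pip install vllm lm-eval==0.4.3

# Placeholder path; point at the compressed checkpoint saved earlier.
lm_eval --model vllm \
  --model_args pretrained=./Meta-Llama-3-8B-Instruct-W8A8-FP8 \
  --tasks winogrande \
  --batch_size auto
```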

Collaborator Author

I like GSM because it's easy to understand + it's a good proof point for users that it's working in a generative task

Member

Do you need the reproduction of paper results, then? I think we shouldn't push people towards our fork if possible. The most realistic example would also show evals for both the unquantized and quantized checkpoints, so matching this specific CoT setup shouldn't matter.

@robertgshaw2-redhat robertgshaw2-redhat changed the title Switch readme to fp8 dynamic [DOCS] Update FP8 Docs To Highlight Dynamic Per Token Aug 11, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [DOCS] Update FP8 Docs To Highlight Dynamic Per Token [DOCS] Update FP8 example to use dynamic per token Aug 11, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [DOCS] Update FP8 example to use dynamic per token [ Docs ] Update FP8 example to use dynamic per token Aug 11, 2024
@robertgshaw2-redhat robertgshaw2-redhat merged commit 23587db into main Aug 12, 2024
8 of 12 checks passed
markmc pushed a commit to markmc/llm-compressor that referenced this pull request on Nov 13, 2024