
Fix for issue 83 #85


Merged
merged 4 commits into main on Aug 20, 2024

Conversation

@horheynm (Contributor) commented Aug 14, 2024

SUMMARY:
Fixes issue 83. #83

The issue: when a Phi-3-medium-128k-instruct model is generated and lm_eval is run on it, lm_eval errors out because some expected files are missing from the output. These are Python files that can be found in the model's cache folder, and lm_eval fails if they are not provided.

The solution is to add the logic to the Trainer, a shared pathway common to oneshot and the other entrypoints. So far only Phi-3-medium-128k-instruct has Python files in its cache folder, so if such files exist, copy them to the output directory.
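
For illustration only (this is not the exact code in this PR; the directory arguments and the helper name are made up here), the copy step amounts to something like:

import os
import shutil


def copy_python_files_from_cache(model_cache_dir: str, output_dir: str) -> None:
    # Copy any .py files shipped with the model (e.g. the custom modeling code
    # bundled with Phi-3-medium-128k-instruct) into the output directory so
    # that tools such as lm_eval can load the saved model.
    os.makedirs(output_dir, exist_ok=True)
    for file_name in os.listdir(model_cache_dir):
        if file_name.endswith(".py"):
            shutil.copy(
                os.path.join(model_cache_dir, file_name),
                os.path.join(output_dir, file_name),
            )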

TEST PLAN:
Ran the following script
import torch
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import (  # noqa
    calculate_offload_device_map,
    custom_offload_device_map,
)

# define a llmcompressor recipe for FP8 quantization
# this recipe requires no calibration data since inputs are dynamically quantized
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: channel
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: token
                        dynamic: true
                        symmetric: true
                    targets: ["Linear"]
"""

model_stub = "meta-llama/Meta-Llama-3-70B-Instruct"

# determine which layers to offload to cpu based on available resources
device_map = calculate_offload_device_map(
    model_stub, reserve_for_hessians=False, num_gpus=1, torch_dtype=torch.float16
)

# alternatively, specify the maximum memory to allocate per GPU directly
# device_map = custom_offload_device_map(
#    model_stub, max_memory_per_gpu="10GB", num_gpus=2, torch_dtype=torch.float16
# )

model = SparseAutoModelForCausalLM.from_pretrained(
    model_stub, torch_dtype=torch.float16, device_map=device_map
)

output_dir = "./test_output_llama3b_70b_fp8"


oneshot(
    model=model,
    recipe=recipe,
    output_dir=output_dir,
    save_compressed=True,
    tokenizer=AutoTokenizer.from_pretrained(model_stub),
)

with and without this commit. The Python files should be included in the output directory, and the flow should otherwise be unchanged.

@horheynm requested review from Satrat and Lin-K76 on Aug 14, 2024 15:09
@horheynm self-assigned this on Aug 14, 2024
@Satrat (Contributor) left a comment

I don't love saving the python files on init rather than at save time. This seems like it should belong in src/llmcompressor/transformers/finetune/session_mixin.py:save_model instead, where we save the model itself and the tokenizer. Or I suppose it could also go in src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py:modify_save_pretrained.

@horheynm (Contributor, Author)

> I don't love saving the python files on init rather than at save time. This seems like it should belong in src/llmcompressor/transformers/finetune/session_mixin.py:save_model instead, where we save the model itself and the tokenizer.

I don't mind doing that, but with the current flow it doesn't go to save_model.

@horheynm (Contributor, Author)

> src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py

OK, it does go to modify_save_pretrained, so the files can be saved there.
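
As a rough, hypothetical sketch of that idea (this is not the PR's actual modify_save_pretrained implementation; copy_python_files_from_cache is the illustrative helper from above), hooking the copy into a wrapped save_pretrained could look like:

from functools import wraps


def add_python_file_copy_to_save(model, model_cache_dir: str) -> None:
    # Wrap the model's save_pretrained so that, after the usual save, any
    # python files from the model cache are copied into the save directory.
    original_save = model.save_pretrained

    @wraps(original_save)
    def save_pretrained_and_copy(save_directory, **kwargs):
        original_save(save_directory, **kwargs)
        copy_python_files_from_cache(model_cache_dir, save_directory)

    model.save_pretrained = save_pretrained_and_copy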

@horheynm merged commit a446c81 into main on Aug 20, 2024
7 of 12 checks passed
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024
* import is_release from version.py

* bug

* fix

* comments

* comment