
Fix for issue 83 #85


Merged
merged 4 commits into main on Aug 20, 2024

Conversation

@horheynm (Contributor) commented Aug 14, 2024

SUMMARY:
Fixes issue 83. #83

The issue: when a Phi-3-medium-128k-instruct model is generated and lm_eval is run on it, lm_eval errors out because some expected files are missing from the output. These are Python files that can be found in the model's cache folder, and lm_eval fails if they are not provided.

The solution is to add the logic to the Trainer, a shared pathway common to oneshot and the other entrypoints. So far only Phi-3-medium-128k-instruct has Python files in its cache folder, so if such files exist, copy them to the output directory.
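
For illustration only (this is not the exact code in this PR; the directory arguments and the helper name are made up here), the copy step amounts to something like:

import os
import shutil


def copy_python_files_from_cache(model_cache_dir: str, output_dir: str) -> None:
    # Copy any .py files shipped with the model (e.g. the custom modeling code
    # bundled with Phi-3-medium-128k-instruct) into the output directory so
    # that tools such as lm_eval can load the saved model.
    os.makedirs(output_dir, exist_ok=True)
    for file_name in os.listdir(model_cache_dir):
        if file_name.endswith(".py"):
            shutil.copy(
                os.path.join(model_cache_dir, file_name),
                os.path.join(output_dir, file_name),
            )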

TEST PLAN:
Ran the following script
import torch
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import (  # noqa
    calculate_offload_device_map,
    custom_offload_device_map,
)

# define a llmcompressor recipe for FP8 quantization
# this recipe requires no calibration data since inputs are dynamically quantized
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: channel
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: token
                        dynamic: true
                        symmetric: true
                    targets: ["Linear"]
"""

model_stub = "meta-llama/Meta-Llama-3-70B-Instruct"

# determine which layers to offload to cpu based on available resources
device_map = calculate_offload_device_map(
    model_stub, reserve_for_hessians=False, num_gpus=1, torch_dtype=torch.float16
)

# alternatively, specify the maximum memory to allocate per GPU directly
# device_map = custom_offload_device_map(
#    model_stub, max_memory_per_gpu="10GB", num_gpus=2, torch_dtype=torch.float16
# )

model = SparseAutoModelForCausalLM.from_pretrained(
    model_stub, torch_dtype=torch.float16, device_map=device_map
)

output_dir = "./test_output_llama3b_70b_fp8"


oneshot(
    model=model,
    recipe=recipe,
    output_dir=output_dir,
    save_compressed=True,
    tokenizer=AutoTokenizer.from_pretrained(model_stub),
)

with and without this commit. The Python files should be included in the output directory, and the flow should otherwise be unchanged.

@horheynm requested review from Satrat and Lin-K76 on Aug 14, 2024 15:09
@horheynm self-assigned this on Aug 14, 2024
@Satrat (Contributor) left a comment

I don't love saving the python files on init rather than at save time. This seems like it should belong in src/llmcompressor/transformers/finetune/session_mixin.py:save_model instead, where we save the model itself and the tokenizer. Or I suppose it could also go in src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py:modify_save_pretrained.

@horheynm (Contributor, Author)

> I don't love saving the python files on init rather than at save time. This seems like it should belong in src/llmcompressor/transformers/finetune/session_mixin.py:save_model instead, where we save the model itself and the tokenizer.

I don't mind doing that, but with the current flow it doesn't go to save_model.

@horheynm (Contributor, Author)

> src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py

OK, it does go to modify_save_pretrained, so the files can be saved there.
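
As a rough, hypothetical sketch of that idea (this is not the PR's actual modify_save_pretrained implementation; copy_python_files_from_cache is the illustrative helper from above), hooking the copy into a wrapped save_pretrained could look like:

from functools import wraps


def add_python_file_copy_to_save(model, model_cache_dir: str) -> None:
    # Wrap the model's save_pretrained so that, after the usual save, any
    # python files from the model cache are copied into the save directory.
    original_save = model.save_pretrained

    @wraps(original_save)
    def save_pretrained_and_copy(save_directory, **kwargs):
        original_save(save_directory, **kwargs)
        copy_python_files_from_cache(model_cache_dir, save_directory)

    model.save_pretrained = save_pretrained_and_copy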

@horheynm merged commit a446c81 into main on Aug 20, 2024
7 of 12 checks passed
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024
* import is_release from version.py

* bug

* fix

* comments

* comment