[StageRunner] Stage Runner entrypoint and pipeline #1202

Merged
merged 27 commits into main from stage-run on Apr 1, 2025

Conversation

@horheynm (Contributor) commented Feb 26, 2025

SUMMARY:

  • Remove from_args and reorganize logic in Oneshot. from_args was previously used for compatibility with the StageRunner class, which is removed.

  • Remove StageRunType and its logic from StageRunner and related files.

  • Remove the StageRunner class and its file.

  • Remove stage runner logic from transformers/text_generation.py.

  • Remove tests/llmcompressor/entrypoints/test_oneshot.py, which tested Oneshot.from_args, now removed.

  • Remove tests/llmcompressor/recipe/test_stage.py, which tested stage selection and run_type.

  • Add logic in train to return a PreTrainedModel, and change output_dir when a stage is passed to oneshot or train. If a stage is passed, the output directory changes from ./out to ./out/{stage} (see the sketch after this list).

  • Add stage to RecipeArguments.

  • Modify the saving logic in the trainer: use self.trainer.save instead of saving in post_process. post_process is still called, but only resets the session.

  • Modify post_process to save and reset, or only reset if no model_args or output_dir is passed in (model_args is needed for the model, output_dir for the save directory).
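
Below is a minimal sketch of the stage-aware output directory behavior described above. The helper name resolve_output_dir is hypothetical and for illustration only, not the PR's actual implementation:

import os
from typing import Optional


def resolve_output_dir(output_dir: Optional[str], stage: Optional[str]) -> Optional[str]:
    # Hypothetical sketch: when a stage is passed to oneshot or train, artifacts
    # are saved under output_dir/{stage}; otherwise output_dir is used as-is.
    if output_dir is None or stage is None:
        return output_dir
    return os.path.join(output_dir, stage)


print(resolve_output_dir("./out", "sparsity_stage"))  # ./out/sparsity_stage
print(resolve_output_dir("./out", None))              # ./out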

TEST PLAN:
Pass tests
Check that examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py runs and generates the same output as main.

SCRIPT:

import torch
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot, train

recipe = r"""
sparsity_stage:
    sparsity_modifiers:
        SparseGPTModifier:
            sparsity: 0.5
            mask_structure: "2:4"
            targets: ["Linear"]
            ignore: ["re:.*lm_head"]
finetuning_stage:
    finetuning_modifiers:
        ConstantPruningModifier:
            targets: [
                're:.*q_proj.weight',
                're:.*k_proj.weight', 
                're:.*v_proj.weight',
                're:.*o_proj.weight',
                're:.*gate_proj.weight',
                're:.*up_proj.weight',
                're:.*down_proj.weight',
            ]
            start: 0
quantization_stage:
    quantization_modifiers:
        GPTQModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 4
                        type: "int"
                        symmetric: true
                        strategy: "channel"
                    targets: ["Linear"]
            
"""


# load the model in as bfloat16 to save on memory and compute
model_stub = "neuralmagic/Llama-2-7b-ultrachat200k"
model = AutoModelForCausalLM.from_pretrained(
    model_stub, torch_dtype=torch.bfloat16, device_map="auto"
)

# uses LLM Compressor's built-in preprocessing for ultra chat
dataset = "ultrachat-200k"

# save location of quantized model
output_dir = "output_llama7b_2of4_w4a16_channel"

# set dataset config parameters
splits = {"calibration": "train_gen[:5%]", "train": "train_gen"}
max_seq_length = 512
num_calibration_samples = 512

# set training parameters for finetuning
num_train_epochs = 0.01
logging_steps = 500
save_steps = 5000
gradient_checkpointing = True  # saves memory during training
learning_rate = 0.0001
bf16 = False  # using full precision for training
lr_scheduler_type = "cosine"
warmup_ratio = 0.1
preprocessing_num_workers = 64 * 6

# this will run the recipe stage by stage:
# oneshot sparsification -> finetuning -> oneshot quantization

oneshot_kwargs = dict(
    dataset=dataset,
    recipe=recipe,
    num_calibration_samples=num_calibration_samples,
    preprocessing_num_workers=preprocessing_num_workers,
    splits=splits,
    output_dir=output_dir
)

training_kwargs = dict(
    bf16=bf16,
    max_seq_length=max_seq_length,
    num_train_epochs=num_train_epochs,
    logging_steps=logging_steps,
    save_steps=save_steps,
    gradient_checkpointing=gradient_checkpointing,
    learning_rate=learning_rate,
    lr_scheduler_type=lr_scheduler_type,
    warmup_ratio=warmup_ratio,
)

oneshot_applied_model = oneshot(
    model=model,
    **oneshot_kwargs,
    stage="sparsity_stage",
)

finetune_applied_model = train(
    model=oneshot_applied_model,
    **oneshot_kwargs,
    **training_kwargs,
    stage="finetuning_stage",
)

oneshot_applied_model = oneshot(
    model=finetune_applied_model,
    **oneshot_kwargs,
    stage="quantization_stage",
)
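
Continuing the script above: given the stage-aware output_dir behavior described in the summary, each call should write its artifacts under a stage-named subdirectory of output_dir. The check below only prints the expected paths; the exact layout is an assumption based on the PR description, not verified output:

from pathlib import Path

# Expected per-stage output directories (assumed layout: output_dir/{stage}).
for stage in ("sparsity_stage", "finetuning_stage", "quantization_stage"):
    stage_dir = Path(output_dir) / stage
    print(stage_dir, "exists:", stage_dir.exists())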

@horheynm changed the title from "Stage run" to "[StageRunner] Stage Runner entrypoint and pipeline" on Feb 28, 2025
@horheynm marked this pull request as ready for review on March 14, 2025 11:40
@horheynm added the ready label on Mar 14, 2025
@horheynm removed the ready label on Mar 14, 2025
@horheynm added the ready label on Mar 15, 2025
@kylesayrs (Collaborator) previously approved these changes Mar 18, 2025 and left a comment:

Looks great, I much prefer this interface

@kylesayrs previously approved these changes Mar 26, 2025

@brian-dellabetta (Collaborator) left a comment:

A few clarifying questions, but I think this is ready to roll!

@dsikka enabled auto-merge (squash) March 30, 2025 21:24

@dsikka (Collaborator) left a comment:

This is currently failing the multi-stage example llama7b_sparse_w4a16.py under llm-compressor/examples/quantization_2of4_sparse_w4a16: it seems to get past the sparsity stage and then fails during finetuning.

@kylesayrs can you take a look?

@dsikka merged commit 1acf393 into main Apr 1, 2025
8 checks passed
@dsikka deleted the stage-run branch April 1, 2025 15:13