Fix: GRPO with Mistral and importing #1831
Merged
This pull request solves two issues:

**Mistral GRPO**

The `MistralForCausalLM_fast_forward` method does not account for the `UNSLOTH_RETURN_HIDDEN_STATES` env variable, which breaks the GRPO trainer: the current implementation always returns logits, which is incompatible with `GRPOTrainer`, so running training raises an exception. I've implemented the necessary functionality following the example of `CausalLM_fast_forward` in llama.py. With that fix, training runs as expected.
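For context, the fix mirrors the branch that `CausalLM_fast_forward` in llama.py already has: when the `UNSLOTH_RETURN_HIDDEN_STATES` env variable is set, the forward pass skips the LM head and hands the hidden states back to the trainer. Below is a minimal sketch of that pattern, not the actual patch; the function name, the `"1"` sentinel value, and the bare `lm_head` call are assumptions, and the real method also handles caching, attention masks, and the rest of the forward signature.

```python
import os
import torch

def forward_tail_sketch(lm_head: torch.nn.Linear, hidden_states: torch.Tensor) -> torch.Tensor:
    """Sketch of the final step of a fast forward pass (names are illustrative)."""
    # GRPO computes per-token log-probs from the hidden states itself, so
    # when this flag is set the forward pass must return them unprojected.
    if os.environ.get("UNSLOTH_RETURN_HIDDEN_STATES", "0") == "1":
        return hidden_states
    # Default behaviour: project hidden states to vocabulary logits.
    return lm_head(hidden_states)
```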
**Importing order**

Importing trl (or any other dependency that itself imports trl) before unsloth makes the trainer patching lose its effect. I spent a whole day trying to understand why my code either broke or OOMed mid-training (while Colab notebooks with essentially the same functionality worked just fine) until I had an aha moment and realized the import order was the issue. To avoid such cases, I've added a simple check to the top-level `__init__.py` that warns users if unsloth was imported after trl.
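A sketch of the kind of guard this adds, assuming it simply inspects `sys.modules` at import time; the module list and the warning text here are illustrative, not the exact wording in the PR.

```python
import sys
import warnings

# Modules whose trainers unsloth patches; if any of them is already
# imported, the patches were applied too late to take effect.
_MODULES_TO_PATCH = ("trl", "transformers", "peft")  # assumed list

_imported_early = [name for name in _MODULES_TO_PATCH if name in sys.modules]
if _imported_early:
    warnings.warn(
        "unsloth was imported after "
        + ", ".join(_imported_early)
        + "; import unsloth first so its trainer patches take effect.",
        stacklevel=2,
    )
```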