
Conversation

@jameslovespancakes

Fixes #41898

When drop_last=False (the default), the last batch may contain fewer samples than per_device_eval_batch_size. Repeating the scalar loss a fixed batch_size number of times therefore over-represents the last batch in the final average loss.
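
A quick way to see the skew, as a standalone illustration with made-up numbers (not code from this PR):

```python
import torch

# Toy numbers: three eval batches with per-batch mean losses; the last
# batch holds only 3 of the configured 8 samples (drop_last=False).
batch_sizes = [8, 8, 3]
batch_losses = torch.tensor([0.50, 0.50, 2.00])

# Buggy: every scalar loss is repeated `fixed` times, so the 3-sample
# batch is weighted as if it contained 8 samples.
fixed = 8
buggy = torch.cat([loss.repeat(fixed) for loss in batch_losses]).mean()

# Fixed: repeat by the observed batch size, weighting each batch by the
# number of samples it actually contains.
exact = torch.cat(
    [loss.repeat(n) for loss, n in zip(batch_losses, batch_sizes)]
).mean()

print(f"fixed-size repeat: {buggy:.4f}")  # 1.0000 (short batch over-weighted)
print(f"observed repeat:   {exact:.4f}")  # 0.7368 = (8*0.5 + 8*0.5 + 3*2.0) / 19
```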

Changes:

  • Trainer: Use observed_batch_size instead of fixed batch_size when repeating eval loss for gather_for_metrics
  • no_trainer examples: Use actual batch size from input_ids.shape[0] for both eval and train loss computation
  • Train loss: Weight by actual batch size and divide by total samples instead of number of batches

This yields an accurate loss regardless of batch-size variability while remaining backward compatible: behavior is identical when all batches have the same size.
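
For reference, the corrected eval accumulation pattern looks roughly like this. This is a sketch only: `model`, `eval_dataloader`, and `accelerator` stand in for the objects the actual scripts already define, and the exact diff may differ.

```python
import torch

total_loss, total_samples = 0.0, 0
for batch in eval_dataloader:
    with torch.no_grad():
        outputs = model(**batch)
    # Observed size of this batch; the last one may be smaller than
    # per_device_eval_batch_size when drop_last=False.
    observed_batch_size = batch["input_ids"].shape[0]
    # Repeat by the observed size so gather_for_metrics collects one
    # loss value per sample rather than per configured batch slot.
    losses = accelerator.gather_for_metrics(
        outputs.loss.repeat(observed_batch_size)
    )
    total_loss += losses.sum().item()
    total_samples += losses.numel()

eval_loss = total_loss / total_samples  # mean over samples, not batches
```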

@Rocketknight1
Member

The updates to the no_trainer examples look okay, but I'd like @SunMarc's confirmation about the change in trainer.py!
