📚 Accumulate completions for logging #3217
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
@@ -136,6 +136,9 @@ class GRPOConfig(TrainingArguments):
         installed, it prints the sample. If `wandb` logging is enabled, it logs it to `wandb`.
     num_completions_to_print (`int` or `None`, *optional*, defaults to `None`):
         Number of completions to print with `rich`. If `None`, all completions are logged.
+    wandb_log_unique_prompts (`bool`, *optional*, defaults to `False`):
```
Missed from #3191
```python
# Log completions when we complete a full gradient accumulation cycle
# For logging across the gradient accumulation steps, we need to accumulate the data
is_last_step_in_grad_accum = (
    self._step % self.args.gradient_accumulation_steps == self.args.gradient_accumulation_steps - 1
)
should_log_completions = (
    self.log_completions
    and self.state.global_step % self.args.logging_steps == 0
    and is_last_step_in_grad_accum
)

# Collect data for logging throughout the accumulation steps
```
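The core of this check is the modulo test on the micro-step counter. A standalone sketch (hypothetical, outside the trainer) of the same condition, with `gradient_accumulation_steps=4`:

```python
# Minimal sketch of the last-micro-step check used in the PR: micro-steps
# are 0-indexed, so a cycle ends when the remainder equals
# gradient_accumulation_steps - 1 (i.e. steps 3, 7, 11, ...).
gradient_accumulation_steps = 4

def is_last_step_in_grad_accum(step: int) -> bool:
    return step % gradient_accumulation_steps == gradient_accumulation_steps - 1

print([s for s in range(8) if is_last_step_in_grad_accum(s)])  # [3, 7]
```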
I think `self.control.should_log` achieves the same goal. Let me check.
So yes, it achieves the same goal, but this flag is only updated after `loss.backward`, so we can't use it here. I'm starting to think that it would make more sense to have all this code in `self.log`. I'm checking if it's possible.
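The idea of moving the table logic into `self.log` could look roughly like the following. This is a hedged, standalone sketch: the names (`CompletionLoggerMixin`, `_textual_logs`, `buffer`) are hypothetical illustrations, not the actual trl implementation.

```python
# Sketch: buffer per-micro-step rows, then flush them all at once from a
# single log call at the end of the accumulation cycle.
from collections import defaultdict

class CompletionLoggerMixin:
    def __init__(self):
        self._textual_logs = defaultdict(list)  # column name -> accumulated values

    def buffer(self, prompts, completions):
        # Called once per micro-step to accumulate rows.
        self._textual_logs["prompt"].extend(prompts)
        self._textual_logs["completion"].extend(completions)

    def log(self):
        # Flush everything accumulated since the last optimizer step.
        table = {k: list(v) for k, v in self._textual_logs.items()}
        self._textual_logs.clear()
        return table

logger = CompletionLoggerMixin()
for step in range(2):  # two micro-steps in one accumulation cycle
    logger.buffer([f"p{step}"], [f"c{step}"])
print(logger.log())  # {'prompt': ['p0', 'p1'], 'completion': ['c0', 'c1']}
```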
So yes, it's possible. I'll do it in a follow-up PR.
What does this PR do?
This PR fixes a bug where the `rich` and `wandb` tables were not logging the complete set of unique prompts when `gradient_accumulation_steps > 1`. Previously, the tables were logged for each gradient accumulation step, which is confusing when inspecting the logs for unique prompts. Here are two runs with/without the fix:
- `main`: https://wandb.ai/huggingface/huggingface/runs/j3vwwb5t/workspace?nw=nwuserlewtun
- `this_pr`: https://wandb.ai/huggingface/huggingface/runs/78q63hev/overview

These runs have:
- `gradient_accumulation_steps=4`
- `num_generations=16`
- `per_device_train_batch_size=4`
- `num_gpus=8`
so we expect `num_unique_prompts=8`:
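As a sanity check, the expected number of unique prompts per optimizer step can be derived from these settings; the sketch below just reproduces that arithmetic (variable names mirror the config keys above):

```python
# Expected table size per optimizer step, from the run configuration above.
gradient_accumulation_steps = 4
num_generations = 16            # completions sampled per unique prompt
per_device_train_batch_size = 4
num_gpus = 8

# Completions gathered across all GPUs and all accumulation micro-steps:
total_completions = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
num_unique_prompts = total_completions // num_generations
print(num_unique_prompts)  # 8
```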
We see that without the fix, `wandb` only displays 2 unique prompts in the table (i.e. one group per gradient accumulation step). With the fix, the prompts are grouped together per step, which incidentally makes scrolling the table faster in the UI, as we have fewer steps to query.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.