🏗️ Add test for training with multiple dataloader workers and update worker initialization for compatibility with transformers 4.52.0 #3568
…worker initialization for compatibility with transformers 4.52.0
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The spawn start method (the default on Windows and macOS) requires that all objects (including functions) passed between processes be serialized (pickled).
Pickle serializes functions by reference, so it can only handle functions that are defined at the top level of a module.
So we need to move the local function out to module level to support MultiProcessingDataLoader under the spawn context.
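The difference can be reproduced with the standard library alone. The sketch below is illustrative and not taken from the PR: `make_local_collator` and `can_pickle` are hypothetical helper names, and `identity` here only mirrors the shape of the function discussed in this thread.

```python
import pickle


def identity(x):
    """Module-level function: pickle can refer to it by its qualified name."""
    return x


def make_local_collator():
    """Return a function defined inside another function (a local object)."""

    def data_collator(features):
        return features

    return data_collator


def can_pickle(obj) -> bool:
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Local objects cannot be looked up by name when unpickling,
        # so pickle refuses to serialize them.
        return False


print("module-level function picklable:", can_pickle(identity))
print("local function picklable:", can_pickle(make_local_collator()))
```

Under spawn, worker processes receive their arguments through pickle, which is why the local variant breaks only when `num_workers > 0` on platforms that use spawn.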
Thanks for the pointers @Tavish9! Added to the list of solved issues.
@@ -275,6 +278,11 @@ def nanmax(tensor: torch.Tensor) -> torch.Tensor:
    return torch.max(tensor[~torch.isnan(tensor)])


def identity(x):
    """Do we really need docs for this?"""
Suggested change:
-    """Do we really need docs for this?"""
+    """GRPO does not need data_collator, to avoid crash, this simple function will be used as data_collator when initializing the GRPOTrainer"""
We already document it below, so I think it's not necessary.
The function
    def data_collator(features):  # No data collation is needed in GRPO
        return features
is now defined in global scope, so I'm curious to see whether this solves issue #3567.
I prefer to keep the name
@Tavish9 gave some clues on this. Basically, we need the class to be picklable, which is not the case when functions are defined within methods.
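A minimal sketch of that failure mode, assuming a toy class rather than the real GRPOTrainer: `TrainerWithLocalCollator` and `picklable` are hypothetical names used only for illustration.

```python
import pickle


class TrainerWithLocalCollator:
    """Hypothetical stand-in for a trainer that builds its collator inside a method."""

    def __init__(self):
        def data_collator(features):  # local object, not reachable by name
            return features

        # The instance now carries a reference to a function pickle cannot serialize.
        self.data_collator = data_collator


def picklable(obj) -> bool:
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


trainer = TrainerWithLocalCollator()
print("instance picklable:", picklable(trainer))
```

Pickling an instance also pickles its attributes, so a single locally defined function stored on the object is enough to make the whole instance fail.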
@qgallouedec Since this PR fixes two things (pickling and the transformers update), it should also be linked to the following issues:
…worker initialization for compatibility with transformers 4.52.0 (#3568)
What does this PR do?
This PR fixes two issues related to the distributed setting:
- seed_worker now requires two new arguments: num_workers and rank, see "update seed_worker to set seed based on worker_id and rank" (transformers#37980)
- GRPOTrainer: see "Data collator not found during pickling with trl 0.18.1 and pytorch 2.7" (#3567)

Fixes #2779 #2979 #3567
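Because a DataLoader calls `worker_init_fn` with the worker id only, the extra arguments have to be bound ahead of time. The sketch below shows one way to do that with `functools.partial`. It is an assumption-laden illustration, not the transformers implementation: `seed_worker_sketch`, its `base_seed` parameter, and the seed formula are all hypothetical.

```python
import random
from functools import partial


def seed_worker_sketch(worker_id: int, num_workers: int, rank: int, base_seed: int = 0) -> int:
    """Hypothetical stand-in: derive a distinct seed per (rank, worker) pair."""
    worker_seed = base_seed + rank * num_workers + worker_id
    random.seed(worker_seed)
    return worker_seed


# Bind the two extra arguments so the callable accepts worker_id alone,
# which is the signature a DataLoader expects for worker_init_fn.
worker_init_fn = partial(seed_worker_sketch, num_workers=4, rank=1)

seeds = [worker_init_fn(worker_id) for worker_id in range(4)]
print(seeds)
```

A `partial` over a module-level function remains picklable, so this pattern is also compatible with the spawn start method discussed in the review thread.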
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.