Skip to content

Conversation

sgugger
Copy link
Collaborator

@sgugger sgugger commented Apr 26, 2022

What does this PR do?

Checkpoint sharding introduced a bug in save_pretrained for distributed setups, where the function is called on every process (TPU training for instance, or the scripts with Accelerate (see this issue).
This changes the logic to only remove files when:

  • they are different from existing ones (which should handle almost all cases since we can expect the save in a folder with existing weights to use the same model)
  • we are on process 0.

Except save_pretrained does not know if we hare on process zero or not, so I use save_config to detect that. It looks like that argument was not aptly named and should be is_main_process instead. I can go ahead and deprecate/rename in this PR if you agree.

@sgugger sgugger requested a review from LysandreJik April 26, 2022 14:34
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 26, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_main_process sounds good to me, as long as there is a deprecation cycle :)

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@sgugger sgugger merged commit c79bbc3 into main Apr 27, 2022
@sgugger sgugger deleted the delete_files_save_pretrained branch April 27, 2022 16:28
chamidullinr pushed a commit to chamidullinr/transformers that referenced this pull request Apr 28, 2022
…ace#16947)

* Fix multiple deletions of the same files in save_pretrained

* Add is_main_process argument
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
…ace#16947)

* Fix multiple deletions of the same files in save_pretrained

* Add is_main_process argument
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants