
Conversation

@allanj (Contributor) commented Apr 14, 2022

What does this PR do?

1st issue: OOM when saving the optimizer, #14542
This issue happens while consolidating the optimizer state, so we add a `save_optimizer_state` argument that gives the user the option of whether to save it.
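
For illustration only (this option was later dropped from the PR, and the helper below is a hypothetical sketch rather than the actual `Trainer` internals), the idea was to gate optimizer consolidation and saving behind the new flag:

import os
import torch

OPTIMIZER_NAME = "optimizer.pt"

def save_optimizer_if_enabled(trainer, output_dir, save_optimizer_state=True):
    # Hypothetical helper sketching the proposed `save_optimizer_state` flag.
    if not save_optimizer_state:
        return
    if trainer.sharded_ddp is not None:
        # Consolidating the sharded (fairscale OSS) state onto one rank is
        # where the OOM reported in #14542 occurs.
        trainer.optimizer.consolidate_state_dict()
    torch.save(trainer.optimizer.state_dict(), os.path.join(output_dir, OPTIMIZER_NAME))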

2nd issue: Using a custom optimizer does not work with Fairscale or Deepspeed, #15784

We simply raise an error and tell the user to proceed with a different solution.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

trainer: @sgugger

@sgugger (Collaborator) commented Apr 14, 2022

I'm not really in favor of adding an argument to skip saving the optimizer, as there is no point in checkpointing if the optimizer is not saved. For the fairscale OOM problem, there is an option detailed in #14542: use `force_broadcast_object=True` with the newest version of fairscale.
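
For reference, a minimal sketch of that fairscale-side workaround, assuming a fairscale version where `OSS` accepts `force_broadcast_object` (as discussed in #14542); the model, learning rate, and single-process group are placeholders for illustration:

import os
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS

# Single-process gloo group just so OSS can be constructed for illustration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 10)

# force_broadcast_object=True broadcasts the consolidated optimizer state as a
# pickled object during consolidate_state_dict(), which is the workaround for
# the OOM discussed in #14542.
optimizer = OSS(params=model.parameters(), optim=torch.optim.AdamW, force_broadcast_object=True, lr=1e-4)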

@allanj (Contributor, Author) commented Apr 14, 2022

Oh, I just saw that solution. Maybe the second part, where we raise the error, is enough for this PR? Do you think it is necessary?

if (self.sharded_ddp is not None or args.deepspeed) and (self.optimizer is not None or self.lr_scheduler is not None):
    raise RuntimeError(
        "Passing `optimizers` is not allowed if Fairscale or Deepspeed is enabled. "
        "You should subclass `Trainer` and override the `create_optimizer_and_scheduler` method."
    )
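
(For readers hitting this error: a minimal, hypothetical sketch of the override the message suggests. With Fairscale sharded DDP the optimizer would additionally need to be wrapped in fairscale's `OSS`, so the optimizer and scheduler choices below are purely illustrative.)

import torch
from transformers import Trainer

class CustomOptimizerTrainer(Trainer):
    def create_optimizer_and_scheduler(self, num_training_steps: int):
        # Build the optimizer/scheduler here instead of passing `optimizers=`
        # to Trainer, as the error message above suggests.
        self.optimizer = torch.optim.AdamW(self.model.parameters(), lr=self.args.learning_rate)
        self.lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
            self.optimizer,
            lambda step: max(0.0, 1.0 - step / max(1, num_training_steps)),  # linear decay
        )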

@sgugger (Collaborator) commented Apr 14, 2022

This change I agree with :-) If you want to remove the others, we can merge the PR.
Make sure to run `make style` afterwards so that the quality check passes.

@HuggingFaceDocBuilderDev commented Apr 14, 2022

The documentation is not available anymore as the PR was closed or merged.

@allanj (Contributor, Author) commented Apr 14, 2022

Yeah, I will remove the others and run the check.

@sgugger (Collaborator) commented Apr 15, 2022

There is still one error from the quality script, so you'll need to run `make style` on your branch :-)
Could you also rename the PR so that the title reflects the changes you actually made?

@allanj allanj changed the title Options to save optimizer and raise error when using custom optimizer Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed Apr 15, 2022
@sgugger (Collaborator) commented Apr 16, 2022

Thanks! You have to mark the PR as ready for review, as GitHub won't let me merge a draft PR :-)

@allanj allanj marked this pull request as ready for review April 16, 2022 03:07
@allanj (Contributor, Author) commented Apr 16, 2022

Thanks. It is ready for review now.

@sgugger sgugger merged commit 4381448 into huggingface:main Apr 18, 2022
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
… or Deepspeed (huggingface#16786)

* optimizer issues related to saving

* remove the "optimizer saving" option

* reformat using make style