[Unified Checkpoint] update async save logic #9274
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is 
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9274      +/-   ##
===========================================
+ Coverage    52.95%   53.08%   +0.13%
===========================================
  Files          657      657
  Lines       106478   106521      +43
===========================================
+ Hits         56383    56547     +164
+ Misses       50095    49974     -121
LGTM
* update async save signal * fix async save hang
* [Unified Checkpoint] Support expert parallel (#9055)
  * update code
  * [Unified Checkpoint] Fix generation config save (#9223)
  * [Unified Checkpoint] update async_save_info in develop (#9173)
  * [Unified Checkpoint] update async save logic (#9274)
  * update async save signal
  * fix async save hang
  * bug fix

  Co-authored-by: Weiguo Zhu <[email protected]>
* [Unified Checkpoint] update async save logic (#9274)
  * update async save signal
  * fix async save hang
  * bug fix
* [Unified Checkpoint] Support expert parallel (#9055)
  * update code
  * [Unified Checkpoint] Fix generation config save (#9223)
  * [Unified Checkpoint] update async_save_info in develop (#9173)
  * [Unified Checkpoint] update async save logic (#9274)
  * update async save signal
  * fix async save hang
  * bug fix
  * bug fix
  * [Trainer] fix save_model (#9286)
  * bug fix
  * bug fix

  Co-authored-by: Weiguo Zhu <[email protected]>
PR types
Others
PR changes
Others
Description
`output_signal_dir`: the directory used to save the asynchronous-saving signal.
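To illustrate what a separate signal directory is for, here is a minimal Python sketch. It assumes that an asynchronous save runs in a background worker and announces completion by writing a marker file into the signal directory, which the training loop can poll before starting the next save. The helpers `async_save` and `wait_for_signal`, the marker name `.async_save_done.rank{N}`, and the use of a thread as the worker are all hypothetical; this is not PaddleNLP's implementation.

```python
import os
import pickle
import tempfile
import threading
import time


def _signal_path(signal_dir: str, rank: int) -> str:
    # Hypothetical marker-file name; one marker per rank.
    return os.path.join(signal_dir, f".async_save_done.rank{rank}")


def async_save(state: dict, ckpt_path: str, signal_dir: str, rank: int = 0) -> threading.Thread:
    """Save `state` in a background thread, then write a signal file into `signal_dir`."""
    os.makedirs(signal_dir, exist_ok=True)
    sig = _signal_path(signal_dir, rank)
    # Clear a stale signal from a previous save so waiters do not see it early.
    if os.path.exists(sig):
        os.remove(sig)

    def _worker() -> None:
        tmp = ckpt_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        # Atomic rename: readers never observe a partially written checkpoint.
        os.replace(tmp, ckpt_path)
        # Write the signal last, so its existence implies the checkpoint is complete.
        with open(sig, "w") as f:
            f.write("done")

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    return t


def wait_for_signal(signal_dir: str, rank: int = 0, timeout: float = 60.0, poll: float = 0.05) -> bool:
    """Poll for the signal file; return False on timeout instead of blocking forever."""
    sig = _signal_path(signal_dir, rank)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(sig):
            return True
        time.sleep(poll)
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        worker = async_save({"step": 10}, os.path.join(d, "ckpt.pkl"), os.path.join(d, "signals"))
        print("save finished:", wait_for_signal(os.path.join(d, "signals"), timeout=10))
        worker.join()
```

Writing the marker only after the atomic rename means a waiter that sees it can safely treat the checkpoint as complete. Bounding the wait with a timeout is one generic way to keep a missing signal from hanging training; it is shown for illustration and is not necessarily how this PR fixes the async-save hang.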