Contributor

@zhangbo9674 zhangbo9674 commented Jul 31, 2024

PR Category

Auto Parallel

PR Types

New features

Description

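Extends the auto-parallel save_state_dict interface with support for asynchronous saving.
Performance test:

  • Test setup: Llama2-7b (num_hidden_layers=10), single machine with 8 cards, pp2-mp2-sharding_stage1, 100 training steps, saving a ckpt every 20 steps
  • Test results:
    • Time per save: 13s synchronous, 3s asynchronous
      [image]
    • End-to-end time for 100 steps: 137s with synchronous saving, 102s with asynchronous saving
      [image]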
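
For readers unfamiliar with the technique, here is a minimal sketch of the general idea behind asynchronous checkpoint saving. It is not the implementation in this PR and does not use Paddle. The names `save_sync` and `save_async` are hypothetical, and a plain dict pickled to disk stands in for a real state dict. The sketch assumes the usual pattern: take a snapshot of the state on the calling thread, then do the slow write in the background so training can continue.

```python
import copy
import os
import pickle
import tempfile
import threading


def save_sync(state_dict, path):
    """Blocking save: the caller waits until the file is fully written."""
    with open(path, "wb") as f:
        pickle.dump(state_dict, f)


def save_async(state_dict, path):
    """Hypothetical async save: snapshot now, write in a background thread.

    Returns the worker thread; call join() on it before relying on the file.
    """
    # Copy first, so later updates to state_dict do not leak into the checkpoint.
    snapshot = copy.deepcopy(state_dict)
    worker = threading.Thread(target=save_sync, args=(snapshot, path))
    worker.start()
    return worker


if __name__ == "__main__":
    state = {"weights": [0.1, 0.2], "step": 20}
    ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")

    worker = save_async(state, ckpt_path)
    state["step"] = 21  # training continues while the write is in flight
    worker.join()       # wait before reading the checkpoint back

    with open(ckpt_path, "rb") as f:
        print(pickle.load(f)["step"])
```

In this sketch the caller pays only for the in-memory copy, which is the kind of saving the per-save numbers above reflect. A real implementation would also have to handle device-to-host copies, errors in the worker, and overlapping saves.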

Pcard-73145

paddle-bot bot commented Jul 31, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zhiqiu previously approved these changes Aug 5, 2024
Contributor

@zhiqiu zhiqiu left a comment

LGTM

Contributor

@zhiqiu zhiqiu left a comment

LGTM

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

path: str,
process_group: Group | None = None,
coordinator_rank: int = 0,
async_save: bool = False,
Contributor

Please also add a description of this parameter to the docstring below, and update the Chinese documentation to match.

Contributor Author

OK, I will complete this in a separate PR.

Contributor

@sunzhongkai588 sunzhongkai588 left a comment

To be submitted as a separate PR.

@zhangbo9674 zhangbo9674 merged commit 7311ff7 into PaddlePaddle:develop Aug 7, 2024