@xingmingyyj
Contributor

PR Category

Auto Parallel

PR Types

Improvements

Description

Modify load_state_dict:

  1. Support multiple nodes
  2. Support offload

@paddle-bot

paddle-bot bot commented Aug 14, 2024

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Aug 14, 2024
import copy
import os
from dataclasses import dataclass
from typing import TYPE_CHECKING
Contributor

Why was TYPE_CHECKING removed?

]

if offload:
    storage_local_tensor = storage_local_tensor.cuda()
Contributor

This needs to account for different platforms; I suggest checking via _current_expected_place instead of hard-coding CUDA.

Contributor Author

done

path,
process_group=None,
coordinator_rank=0,
offload=True,
Contributor

Let's make the default False.

Contributor Author

done

return rank_to_files, missing_keys


def get_local_load_files_for_multiple_node(
Contributor

I suggest renaming this: use rank_to_file for the files visible to a rank, and rank_to_read_files for the files a rank actually has to read.

Contributor Author

done
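
To make the reviewer's distinction concrete, here is a minimal sketch. It is not the code from this PR. It assumes a mapping `rank_to_files` from each rank to the checkpoint files visible on its node and derives `rank_to_read_files` by assigning every file to exactly one rank that can see it. The helper name `assign_read_files`, the least-loaded tie-breaking rule, and the file names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assign_read_files(rank_to_files):
    """Hypothetical helper: pick, for every file, one rank that will read it."""
    # Invert the mapping: file name -> ranks that can see it (ascending order).
    file_to_ranks = defaultdict(list)
    for rank in sorted(rank_to_files):
        for name in rank_to_files[rank]:
            file_to_ranks[name].append(rank)

    rank_to_read_files = {rank: [] for rank in rank_to_files}
    # Walk the files in sorted order and give each one to the visible rank
    # that currently has the fewest files; ties go to the lowest rank.
    for name in sorted(file_to_ranks):
        reader = min(
            file_to_ranks[name],
            key=lambda r: (len(rank_to_read_files[r]), r),
        )
        rank_to_read_files[reader].append(name)
    return rank_to_read_files


# Illustrative layout: two nodes with two ranks each; a rank sees only
# the files stored on its own node.
rank_to_files = {
    0: ["0_0.distcp", "1_0.distcp"],
    1: ["0_0.distcp", "1_0.distcp"],
    2: ["2_0.distcp", "3_0.distcp"],
    3: ["2_0.distcp", "3_0.distcp"],
}
rank_to_read_files = assign_read_files(rank_to_files)
```

In this sketch every rank can see two files but reads only one, which is why the two mappings deserve separate names.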

Contributor

@jeff41404 jeff41404 left a comment

LGTM for the API: adding a parameter with a default value is a backward-compatible upgrade.
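
The compatibility point can be sketched as follows. `load_stub` is a hypothetical stand-in, not Paddle's `load_state_dict`; it only mirrors the parameter names visible in the diff above (`path`, `process_group`, `coordinator_rank`, `offload`) and echoes the resolved arguments, assuming the default of `False` agreed on in the review.

```python
def load_stub(path, process_group=None, coordinator_rank=0, offload=False):
    """Hypothetical stand-in that only echoes its resolved arguments."""
    return {
        "path": path,
        "process_group": process_group,
        "coordinator_rank": coordinator_rank,
        "offload": offload,
    }


# A call written before the parameter existed still works and keeps
# the previous behavior, because offload falls back to False.
old_style = load_stub("./checkpoint")

# New callers opt in explicitly.
new_style = load_stub("./checkpoint", offload=True)
```

Because existing call sites never pass `offload`, they are unaffected; had the default stayed `True`, those same calls would have silently changed behavior.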

@zhangbo9674 zhangbo9674 merged commit 5a965a5 into PaddlePaddle:develop Aug 20, 2024
@xingmingyyj xingmingyyj deleted the upgrad_load_state_dict branch August 22, 2024 08:53
Labels

contributor External developers

3 participants