[Fleet Executor] Construct runtime graph #37158
Thanks for your contribution!
Please add some VLOG(3) output at the key points for debugging, for example where the dependencies are derived and where interceptor_id is mapped to task_id, rank, and so on.
python/paddle/fluid/tests/unittests/test_fleet_executor_multi_devices.py (outdated, resolved)
Wouldn't it be better to reuse distributed_strategy later on? There may also be a sharding_degree.
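distributed_strategy.proto lives under the framework directory, not in the same folder as this proto. The CMakeLists in the current folder calls the proto_library function defined in generic.cmake, which sets the protobuf search path to the current folder, and protobuf's import does not support relative paths. So for now I have not figured out how to reference the definitions in distributed_strategy.proto directly.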
The order of dp, pp, and mp may change in the future.
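I suggest wrapping dp_degree, pp_degree, and mp_degree in a struct and treating it as a Cartesian coordinate system, then adding conversions in both directions between the process rank and the Cartesian coordinates. That might be a little more concise. The ordering issue is then also easy to solve by adding a mapping.
{x, y, z} = rank2coord(pid);
left_x = (x - 1 + xranks) % xranks;
left_rank = coord2rank({left_x, y, z});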
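The suggestion above can be sketched as follows. This is only an illustration of the idea, not code from this PR or from Paddle: the class name `ParallelTopology`, its `order` field, and the `neighbor` helper are hypothetical, and the layout assumes the last axis in `order` varies fastest across ranks.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ParallelTopology:
    """Hypothetical helper: treats the parallel degrees as a Cartesian grid."""

    degrees: Dict[str, int]                       # e.g. {"dp": 2, "pp": 3, "mp": 4}
    order: Tuple[str, ...] = ("dp", "pp", "mp")   # assumed layout, slowest axis first

    def world_size(self) -> int:
        size = 1
        for axis in self.order:
            size *= self.degrees[axis]
        return size

    def rank2coord(self, rank: int) -> Dict[str, int]:
        """Convert a flat process rank into per-axis coordinates."""
        if not 0 <= rank < self.world_size():
            raise ValueError(f"rank {rank} is outside [0, {self.world_size()})")
        coord = {}
        remaining = rank
        # Peel off the fastest-varying axis first.
        for axis in reversed(self.order):
            remaining, coord[axis] = divmod(remaining, self.degrees[axis])
        return coord

    def coord2rank(self, coord: Dict[str, int]) -> int:
        """Convert per-axis coordinates back into a flat process rank."""
        rank = 0
        for axis in self.order:
            rank = rank * self.degrees[axis] + coord[axis]
        return rank

    def neighbor(self, rank: int, axis: str, offset: int) -> int:
        """Rank reached by moving `offset` steps along `axis`, with wrap-around."""
        coord = self.rank2coord(rank)
        coord[axis] = (coord[axis] + offset) % self.degrees[axis]
        return self.coord2rank(coord)


topo = ParallelTopology({"dp": 2, "pp": 3, "mp": 4})
coord = topo.rank2coord(17)                 # {"mp": 1, "pp": 1, "dp": 1}
prev_stage = topo.neighbor(17, "pp", -1)    # 13: same dp and mp, previous pp stage

# Changing the axis order only changes the mapping, not the calling code.
reordered = ParallelTopology({"dp": 2, "pp": 3, "mp": 4}, order=("pp", "dp", "mp"))
```

Because callers address axes by name, a later change in the order of dp, pp, and mp would only require a different `order` tuple, and an extra axis such as sharding_degree could be added as another entry in `degrees` and `order`.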
LGTM
PR types
New Features
PR changes
Others
Describe
Create the runtime graph.