Some dist train hacks to address soon #11593

@panyx0718

bcast_params should be properly implemented and called. Currently, a hack tests whether ParallelExecutor is running distributed training and, if so, calls it.

The num_trainers argument in ParallelExecutor is only valid for nccl-based distributed training. It is meaningless for pserver-based distributed training.
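
One possible direction for both points, as a rough sketch only: make the training mode explicit, reject num_trainers outside nccl mode, and derive the broadcast decision from the mode rather than from an indirect check. `DistMode`, `ExecutorConfig`, and `needs_param_broadcast` are hypothetical names for illustration and are not part of Paddle's API. The rule that broadcasting is wanted only in nccl mode with more than one trainer is also an assumption.

```python
from enum import Enum


class DistMode(Enum):
    """Hypothetical training modes; not part of Paddle's API."""
    LOCAL = "local"
    NCCL = "nccl"
    PSERVER = "pserver"


class ExecutorConfig:
    """Hypothetical stand-in for the arguments ParallelExecutor receives."""

    def __init__(self, mode=DistMode.LOCAL, num_trainers=1):
        # num_trainers only has meaning for nccl-based training, so reject
        # a non-default value elsewhere instead of silently ignoring it.
        if mode is not DistMode.NCCL and num_trainers != 1:
            raise ValueError("num_trainers is only valid in nccl mode")
        if num_trainers < 1:
            raise ValueError("num_trainers must be at least 1")
        self.mode = mode
        self.num_trainers = num_trainers

    def needs_param_broadcast(self):
        # Assumption: parameters are broadcast only for nccl-based
        # training with more than one trainer.
        return self.mode is DistMode.NCCL and self.num_trainers > 1
```

With something like this, the executor could call bcast_params only when `needs_param_broadcast()` returns True, and a pserver-based job that passes num_trainers would fail early with a clear error.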
