@glenn-jocher glenn-jocher commented Nov 12, 2021

Improves DDP reporting of the total dataloader worker count and safely limits that total across all processes. Partially addresses #5628.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Optimized data loading in distributed training environments for YOLOv5.

📊 Key Changes

  • Adjusted the dataloader worker count in train.py to account for the WORLD_SIZE variable.
  • Modified the --workers command-line argument description to state how it applies in distributed training (DDP mode).
  • Updated utils/datasets.py to factor in WORLD_SIZE when calculating the number of dataloader worker processes.
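
The worker calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names `workers_per_process` and `total_workers` are hypothetical, and the exact capping rule (CPU cores divided by WORLD_SIZE, then limited by batch size and the requested count) is an assumption based on the change description.

```python
import os


def workers_per_process(requested, batch_size, world_size=None, cpu_count=None):
    """Hypothetical helper: cap dataloader workers for one DDP process.

    Assumption: each process gets an equal share of the CPU cores, and the
    result never exceeds the requested --workers value or the batch size.
    """
    if world_size is None:
        # WORLD_SIZE is set by the distributed launcher; default to 1 otherwise.
        world_size = int(os.getenv("WORLD_SIZE", 1))
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    cores_share = cpu_count // max(world_size, 1)
    # Assumption: a batch size of 1 falls back to loading in the main process.
    batch_cap = batch_size if batch_size > 1 else 0
    return min(cores_share, batch_cap, requested)


def total_workers(per_process, world_size):
    """Hypothetical helper: total workers across all DDP processes."""
    return per_process * world_size


if __name__ == "__main__":
    nw = workers_per_process(requested=8, batch_size=16, world_size=4, cpu_count=16)
    print(f"{nw} workers per process, {total_workers(nw, 4)} in total")
```

With 16 cores and 4 processes, a request for 8 workers per process is reduced to 4, so the total stays at the core count instead of growing to 32.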

🎯 Purpose & Impact

  • The changes make better use of system resources during distributed training by limiting the number of dataloader workers according to the number of DDP processes (WORLD_SIZE), so the total across all processes stays within a safe bound.
  • Users will likely experience more efficient data loading and potentially faster training times in multi-GPU (DDP) environments.
  • The update clarifies how worker counts are affected when training in a distributed fashion.

@glenn-jocher glenn-jocher self-assigned this Nov 12, 2021
@glenn-jocher glenn-jocher linked an issue Nov 12, 2021 that may be closed by this pull request
2 tasks
@glenn-jocher glenn-jocher changed the title from "WORLD_SIZE-safe dataloader workers" to "DDP WORLD_SIZE-safe dataloader workers" Nov 12, 2021
@glenn-jocher glenn-jocher merged commit 7473f0f into master Nov 12, 2021
@glenn-jocher glenn-jocher deleted the update/workers branch November 12, 2021 13:48
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* WORLD_SIZE-safe workers

* Update with DDP comment
Development

Successfully merging this pull request may close these issues.

Unexpected behavior in DDP mode with dataloader workers
