let `_get_default_process_group_backend_for_device` support more hardware platforms #21057
looks good, just please add a test with mock if needed :)
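A mock-based test of the kind requested here could look like the sketch below. It is not the test that was added to the PR: `BACKEND_BY_DEVICE`, `default_backend`, `fake_accel` and `fake_ccl` are hypothetical names standing in for whatever registry and function the real code uses. The idea is only that a new hardware platform can be simulated by temporarily patching the lookup table, so no special hardware is needed in CI.

```python
from unittest import mock

# Hypothetical module-level registry (an assumption for illustration,
# not Lightning's or PyTorch's actual data structure).
BACKEND_BY_DEVICE = {"cpu": "gloo", "cuda": "nccl"}


def default_backend(device_type: str) -> str:
    """Stand-in for the function under test: look up the backend, fall back to gloo."""
    return BACKEND_BY_DEVICE.get(device_type, "gloo")


def test_default_backend_with_mocked_platform() -> None:
    # patch.dict adds the fake entry only inside the `with` block
    # and restores the original table afterwards.
    with mock.patch.dict(BACKEND_BY_DEVICE, {"fake_accel": "fake_ccl"}):
        assert default_backend("fake_accel") == "fake_ccl"
    # Outside the patch the fake platform is unknown again.
    assert default_backend("fake_accel") == "gloo"
    # Existing platforms are unaffected.
    assert default_backend("cuda") == "nccl"
```

Under pytest the function is collected automatically; it can also be called directly.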
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #21057   +/-   ##
=======================================
- Coverage      87%      87%   -0%
=======================================
  Files         268      268
  Lines       23367    23370    +3
=======================================
+ Hits        20379    20381    +2
- Misses       2988     2989    +1
I have added the test. The last CI failure seems to have been caused by the running environment. Could you please help trigger it again? Thank you
…l _get_default_process_group_backend_for_device Signed-off-by: taozhiwei <[email protected]>
Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
It seems this CI failure is not related to this change. Can you help trigger another CI? Thank you
https://github.com/Lightning-AI/pytorch-lightning/pull/21057/checks?check_run_id=48055959528
@taozhiwei yes it seems unrelated to this issue. When we get it fixed the PR will be merged.
Thank you very much!
needs to be reverted and added again as this breaks DDP fork...
…1057 (#21092)
* debug failing tests for Fabric with `ddp_fork` on PT 2.8
* Revert "let `_get_default_process_group_backend_for_device` support more hardware platforms (#21057)"
This reverts commit 119a640.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ware platforms (#21057)
* support more hardware platforms and no longer hard code cuda when call _get_default_process_group_backend_for_device
* Apply suggestions from code review
---------
Signed-off-by: taozhiwei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit 119a640)
…1057 (#21092)
* debug failing tests for Fabric with `ddp_fork` on PT 2.8
* Revert "let `_get_default_process_group_backend_for_device` support more hardware platforms (#21057)"
This reverts commit 119a640.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 9ca360b)
* let `_get_default_process_group_backend_for_device` support more hardware platforms (#21057)
* support more hardware platforms and no longer hard code cuda when call _get_default_process_group_backend_for_device
* Apply suggestions from code review
---------
* try it
* chlog
---------
Signed-off-by: taozhiwei <[email protected]>
Co-authored-by: taozhiwei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
* let `_get_default_process_group_backend_for_device` support more hardware platforms (#21057)
* support more hardware platforms and no longer hard code cuda when call _get_default_process_group_backend_for_device
* Apply suggestions from code review
---------
* try it
* chlog
---------
Signed-off-by: taozhiwei <[email protected]>
Co-authored-by: taozhiwei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
(cherry picked from commit 3c81316)
This change enables `_get_default_process_group_backend_for_device` to support more hardware platforms instead of hard-coding the choice for CUDA.

📚 Documentation preview 📚: https://pytorch-lightning--21057.org.readthedocs.build/en/21057/
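To illustrate the difference between hard-coding CUDA and supporting other platforms, here is a minimal sketch. It is not the code from this PR: the function names, the `DEVICE_BACKEND_MAP` table and the `my_accel`/`my_ccl` entries are hypothetical, and the functions take a plain device-type string so the example runs without PyTorch. The table stands in for a device-to-backend registry; PyTorch's `torch.distributed.Backend.default_device_backend_map` is assumed to play a comparable role.

```python
from typing import Mapping

# Hypothetical device-type -> backend table (assumption for illustration only).
DEVICE_BACKEND_MAP = {"cpu": "gloo", "cuda": "nccl"}


def backend_hard_coded(device_type: str) -> str:
    """Hard-coded style: only CUDA is special-cased, everything else gets gloo."""
    return "nccl" if device_type == "cuda" else "gloo"


def backend_from_map(
    device_type: str,
    backend_map: Mapping[str, str] = DEVICE_BACKEND_MAP,
) -> str:
    """Table-driven style: any platform registered in the map gets its own backend."""
    # Fall back to gloo for device types the map does not know about.
    return backend_map.get(device_type, "gloo")


# A made-up accelerator with a made-up backend name.
custom_map = {**DEVICE_BACKEND_MAP, "my_accel": "my_ccl"}
print(backend_hard_coded("my_accel"))            # hard-coded version ignores the new platform
print(backend_from_map("my_accel", custom_map))  # table-driven version picks it up
```

With the table-driven version, adding a platform means adding a map entry rather than editing the function, which is the kind of extensibility the description above is asking for.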