@liulehui (Contributor) commented Oct 30, 2025

Description

  1. This PR adds multi-host GPU support for the Ray Train JaxTrainer.
  2. Following the JAX GPU distributed docs, if ScalingConfig.use_gpu == True, we set JAX_PLATFORMS to "cuda".
  3. If CUDA is the JAX platform, we set CUDA_VISIBLE_DEVICES and initialize JAX distributed with jax.distributed.initialize: https://docs.jax.dev/en/latest/_autosummary/jax.distributed.initialize.html#jax.distributed.initialize
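The three steps above can be sketched per worker roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual implementation: jax_env_vars and setup_jax_worker are hypothetical helpers, and the coordinator address, number of processes, process ID, and local GPU IDs are assumed to be supplied by the trainer. The environment variables are set before jax is imported, and jax.distributed.initialize is called before any JAX computation, as the JAX docs require.

```python
import os
from typing import Dict, List, Optional


def jax_env_vars(use_gpu: bool, local_gpu_ids: Optional[List[int]] = None) -> Dict[str, str]:
    """Hypothetical helper: env vars a worker sets before importing jax."""
    env: Dict[str, str] = {}
    if use_gpu:
        # Mirror ScalingConfig.use_gpu == True by pinning JAX to the CUDA backend.
        env["JAX_PLATFORMS"] = "cuda"
        if local_gpu_ids:
            # Assumption: expose only the GPUs assigned to this worker.
            env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in local_gpu_ids)
    return env


def setup_jax_worker(
    use_gpu: bool,
    coordinator_address: str,
    num_processes: int,
    process_id: int,
    local_gpu_ids: Optional[List[int]] = None,
) -> None:
    """Hypothetical per-worker setup; must run before any JAX computation."""
    os.environ.update(jax_env_vars(use_gpu, local_gpu_ids))
    if os.environ.get("JAX_PLATFORMS") == "cuda":
        # Import jax only after the environment is configured.
        import jax

        jax.distributed.initialize(
            coordinator_address=coordinator_address,  # e.g. "host-of-rank-0:1234" (assumed)
            num_processes=num_processes,
            process_id=process_id,
        )
```

Deferring the jax import until after os.environ is updated matters because JAX reads JAX_PLATFORMS from the environment; if jax was already imported and initialized earlier in the process, the new values may not take effect.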

Related issues

Additional information


  1. Tested with this script: https://gist.github.com/liulehui/b0b25065d48b730f2898b712aa92e06e

@liulehui added the go label (add ONLY when ready to merge, run all tests) Oct 30, 2025

@liulehui (Contributor, Author) commented:

JAX GPU image build on the Anyscale platform: https://gist.github.com/liulehui/bda2419e1b3245d40d8027053a8dd26c

Signed-off-by: Lehui Liu <[email protected]>
@liulehui marked this pull request as ready for review November 22, 2025 01:58
@liulehui requested review from a team, matthewdeng, and richardliaw as code owners November 22, 2025 01:58
@ray-gardener (bot) added the train label (Ray Train Related Issue) Nov 22, 2025

Labels

go (add ONLY when ready to merge, run all tests), train (Ray Train Related Issue)