Commit b539ee9
[Dependencies] Upgrade to torch 2.7, CUDA to 12.8 (#73)
# What does this PR do?

Upgrades to torch 2.7. This PR also makes the torch versions used explicit for the different inference backends (vllm uses torch 2.7.0 and sglang uses torch 2.7.1). DeepSpeed performs JIT compilation, so it is not pinned to a specific torch version. This PR also upgrades CUDA to 12.8.

TODO:
- [x] Test sglang after upgrade
- [x] Publish new docker image to dockerhub

Signed-off-by: SumanthRH <[email protected]>
1 parent: 3e0efc0 · commit: b539ee9
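Since each backend extra now pins a different torch version, a quick sanity check (a sketch, not part of this commit; it reuses the `uv run --isolated --extra` pattern from the example scripts) is to print the torch version each extra resolves to:

```bash
# Sketch: verify the per-backend torch pins described in the commit message.
uv run --isolated --extra vllm python -c "import torch; print(torch.__version__)"    # expect 2.7.0
uv run --isolated --extra sglang python -c "import torch; print(torch.__version__)"  # expect 2.7.1
```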

File tree: 25 files changed (+552 / -477 lines)

docker/Dockerfile

Lines changed: 14 additions & 7 deletions
@@ -1,17 +1,24 @@
-FROM anyscale/ray:2.44.0-slim-py312-cu124
+FROM anyscale/ray:2.44.0-slim-py312-cu128

 RUN sudo apt-get update -y && sudo apt-get install -y wget kmod libxml2 build-essential libnuma-dev

 # the cuda compiler here is needed for deepspeed
-RUN wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
-RUN sudo sh cuda_12.4.0_550.54.14_linux.run --silent --toolkit
+RUN wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run \
+    && sudo sh cuda_12.8.0_570.86.10_linux.run --silent --toolkit && rm -rf cuda_12.8.0_570.86.10_linux.run

 RUN curl -LsSf https://astral.sh/uv/install.sh | sh
 RUN echo "export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook" >> /home/ray/.bashrc
+
 RUN sudo apt-get update \
     && sudo apt-get install -y openssh-server iputils-ping net-tools iproute2 traceroute netcat \
-    libopenexr-dev libxi-dev libglfw3-dev libglew-dev libomp-dev libxinerama-dev libxcursor-dev tzdata
-RUN sudo apt update && sudo apt install --fix-broken && sudo apt install -y default-jre-headless openjdk-8-jdk
+    libopenexr-dev libxi-dev libglfw3-dev libglew-dev libomp-dev libxinerama-dev libxcursor-dev tzdata \
+    && sudo apt-get clean && sudo rm -rf /var/lib/apt/lists/*
+
+RUN sudo apt update && sudo apt install --fix-broken && sudo apt install -y default-jre-headless openjdk-8-jdk \
+    && sudo apt-get clean \
+    && sudo rm -rf /var/lib/apt/lists/*
+
 # NOTE: vllm installation in base environment is needed for uv + vLLM to work
-RUN pip install vllm==0.8.5
-RUN pip install ray==2.44.0
+RUN pip install vllm==0.9.2 \
+    && pip install ray==2.44.0 \
+    && rm -rf ~/.cache/pip
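To validate the updated image locally, a hypothetical build-and-smoke-test sequence (the `skyrl-train:cu128` tag is illustrative, not the published Docker Hub name):

```bash
# Build the image from the repo root with an arbitrary local tag.
docker build -t skyrl-train:cu128 -f docker/Dockerfile .
# The runfile installer places the toolkit under /usr/local/cuda; this should report release 12.8.
docker run --rm --runtime=nvidia --gpus all skyrl-train:cu128 /usr/local/cuda/bin/nvcc --version
```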

skyrl-train/README.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ A quick start guide for installation and your first training run is provided bel

 The only requirements are:

-- CUDA version >=12.4
+- CUDA version 12.8
 - [uv](https://docs.astral.sh/uv/)

 If you're running on an existing Ray cluster, make sure to use Ray 2.44.0 and Python 3.12. If not, proceed with the installation instructions below.
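For readers checking their own machine against the stricter CUDA requirement, two standard checks (the driver-reported version should be at least 12.8):

```bash
# CUDA version supported by the installed driver:
nvidia-smi | head -n 3
# Version of the locally installed CUDA toolkit, if any:
nvcc --version
```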

skyrl-train/docs/configuration/config.rst

Lines changed: 1 addition & 1 deletion
@@ -388,7 +388,7 @@ Generator Configuration
     min_p: 0.0
     top_k: -1

-  use_conversation_multi_turn: false
+  use_conversation_multi_turn: true

   # sampling params for evaluation
   eval_sampling_params:
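Because the default flips from false to true, runs that rely on the single-assistant-response format must now opt out explicitly. A minimal sketch of the override, using the dotted-override syntax the example scripts in this commit use (other options elided):

```bash
# Sketch: opt out of the new default at launch; add your remaining options as usual.
uv run --isolated --extra vllm -m skyrl_train.entrypoints.main_base \
  generator.use_conversation_multi_turn=false \
  data.train_data="['$HOME/data/gsm8k/train.parquet']"
```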

skyrl-train/docs/examples/multi_turn_text2sql.rst

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ Now that we have our dataset and database files, let's walk through some of
 - Chat templating and loss masking for multi-turn conversations are handled by the ``SkyRLGymGenerator`` class.

 - In the above example, we set ``use_conversation_multi_turn=false`` to enforce that the multi-turn conversation is formatted as a single assistant response.
-- If you want to use a conversation-based format, you can set ``use_conversation_multi_turn=true`` and the model will generate a separate assistant response for each turn.
+- If you want to use a conversation-based format, you can set ``use_conversation_multi_turn=true`` and the model will generate a separate assistant response for each turn. This is currently supported only with ``backend="vllm"``.
 - See :code_link:`skyrl_train/generators/skyrl_gym_generator.py` for more details on both options!

 Launching Your Training Run

skyrl-train/docs/getting-started/installation.rst

Lines changed: 4 additions & 4 deletions
@@ -3,7 +3,7 @@ Installation

 Requirements
 ------------
-- CUDA version >=12.4
+- CUDA version 12.8
 - `uv <https://docs.astral.sh/uv/>`_

 We use `uv <https://docs.astral.sh/uv/>`_ to manage dependencies. We also make use of the `uv` and `ray` integration to manage dependencies for ray workers.
@@ -14,15 +14,15 @@ If you're running on an existing Ray cluster, make sure to use Ray 2.44.0 and Python 3.12.
 Docker (recommended)
 ---------------------

-We provide a docker image with the base dependencies ``sumanthrh/skyrl-train-ray-2.44.0-py3.12-cu12.4`` for quick setup.
+We provide a docker image with the base dependencies ``sumanthrh/skyrl-train-ray-2.44.0-py3.12-cu12.8`` for quick setup.

 1. Make sure to have `NVIDIA Container Runtime <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>`_ installed.

 2. You can launch the container using the following command:

 .. code-block:: bash

-    docker run -it --runtime=nvidia --gpus all --name skyrl-train sumanthrh/skyrl-train-ray-2.44.0-py3.12-cu12.4 /bin/bash
+    docker run -it --runtime=nvidia --gpus all --name skyrl-train sumanthrh/skyrl-train-ray-2.44.0-py3.12-cu12.8 /bin/bash

 3. Inside the launched container, setup the latest version of the project:

@@ -39,7 +39,7 @@ Install without Dockerfile

 For installation without the Dockerfile, make sure you meet the prerequisites:

-- CUDA 12.4
+- CUDA 12.8
 - `uv <https://docs.astral.sh/uv/>`_
 - `ray <https://docs.ray.io/en/latest/>`_ 2.44.0

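Step 3 of the Docker instructions is elided in this hunk; a hedged sketch of what the setup typically looks like (the repository URL and extra name are assumptions, check the project README for the canonical commands):

```bash
# Assumed repo URL and extra; consult the project README for the exact steps.
git clone https://github.com/NovaSky-AI/SkyRL.git
cd SkyRL/skyrl-train
uv sync --extra vllm
```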
skyrl-train/examples/remote_inference_engine/run_remote.sh

Lines changed: 5 additions & 1 deletion
@@ -9,13 +9,17 @@ set -x

 DATA_DIR="$HOME/data/gsm8k"

+BACKEND="vllm" # or "sglang"
+TP=4
+
 uv run --isolated --extra vllm -m skyrl_train.entrypoints.main_base \
   data.train_data="['$DATA_DIR/train.parquet']" \
   data.val_data="['$DATA_DIR/validation.parquet']" \
   trainer.policy.model.path="Qwen/Qwen2.5-1.5B-Instruct" \
   generator.run_engines_locally=False \
   generator.remote_inference_engine_urls="['127.0.0.1:8001']" \
-  generator.override_existing_update_group=True \
+  generator.inference_engine_tensor_parallel_size="$TP" \
+  generator.backend="$BACKEND" \
   generator.sampling_params.temperature=0.6 \
   generator.sampling_params.top_p=0.95 \
   trainer.algorithm.advantage_estimator="grpo" \
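`run_remote.sh` expects an inference server already listening at 127.0.0.1:8001 with the matching backend; the implied workflow with the server scripts in this directory is roughly:

```bash
# Start a server first (pick the one matching BACKEND), then launch training against it.
bash examples/remote_inference_engine/run_vllm_server.sh    # or run_sglang_server.sh with BACKEND="sglang"
bash examples/remote_inference_engine/run_remote.sh
```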
skyrl-train/examples/remote_inference_engine/run_sglang_server.sh

Lines changed: 12 additions & 0 deletions

@@ -0,0 +1,12 @@
+# Launches sglang server for Qwen2.5-1.5B-Instruct on 4 GPUs.
+# bash examples/remote_inference_engine/run_sglang_server.sh
+set -x
+
+CUDA_VISIBLE_DEVICES=4,5,6,7 uv run --isolated --extra sglang -m \
+    skyrl_train.inference_engines.sglang.sglang_server \
+    --model-path Qwen/Qwen2.5-1.5B-Instruct \
+    --tp 4 \
+    --host 127.0.0.1 \
+    --port 8001 \
+    --context-length 4096 \
+    --dtype bfloat16
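Once the server is up, a quick liveness check (assuming SGLang's standard HTTP endpoints; host and port match the script above):

```bash
# Assumes SGLang's built-in health endpoints; not part of this commit.
curl http://127.0.0.1:8001/health
curl http://127.0.0.1:8001/get_model_info
```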

skyrl-train/examples/remote_inference_engine/run_vllm_server.sh

Lines changed: 4 additions & 2 deletions
@@ -2,7 +2,9 @@
 # bash examples/remote_inference_engine/run_vllm_server.sh
 set -x

-uv run --isolated --extra vllm -m skyrl_train.inference_engines.vllm.vllm_server \
+# NOTE (sumanthrh): Currently, there's an issue with the ray distributed executor backend for vllm 0.9.2.
+# For the standalone server, we use mp for now.
+CUDA_VISIBLE_DEVICES=4,5,6,7 uv run --isolated --extra vllm -m skyrl_train.inference_engines.vllm.vllm_server \
   --model Qwen/Qwen2.5-1.5B-Instruct \
   --tensor-parallel-size 4 \
   --host 127.0.0.1 \
@@ -17,5 +19,5 @@ uv run --isolated --extra vllm -m skyrl_train.inference_engines.vllm.vllm_server
   --max-num_batched_tokens 8192 \
   --max-num-seqs 1024 \
   --trust-remote-code \
-  --distributed-executor-backend ray \
+  --distributed-executor-backend mp \
   --worker-extension-cls skyrl_train.inference_engines.vllm.vllm_engine.WorkerWrap
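A similar hedged check for the vLLM server, assuming it reuses vLLM's standard OpenAI-compatible routes on the port that `run_remote.sh` targets:

```bash
# Assumed endpoints; this custom server may expose a different surface.
curl http://127.0.0.1:8001/health
curl http://127.0.0.1:8001/v1/models
```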

skyrl-train/examples/search/run_search.sh

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ uv run --isolated --frozen --extra vllm -m skyrl_train.entrypoints.main_base \
   generator.sampling_params.max_generate_length=500 \
   generator.async_engine=true \
   generator.batched=false \
+  generator.use_conversation_multi_turn=false \
   generator.n_samples_per_prompt=5 \
   generator.max_turns=4 \
   generator.use_conversation_multi_turn=false \

skyrl-train/examples/text_to_sql/run_skyrl_sql.sh

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ uv run --isolated --extra vllm -m skyrl_train.entrypoints.main_base \
   generator.async_engine=true \
   generator.batched=false \
   environment.env_class=text2sql \
+  generator.use_conversation_multi_turn=false \
   generator.n_samples_per_prompt=5 \
   generator.gpu_memory_utilization=0.7 \
   generator.max_turns=6 \
