
Commit b375451

[Feature] vLLM revamp (#3158)
1 parent cc8abe7, commit b375451


65 files changed (+6365, -1214 lines)

.github/unittest/linux/scripts/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ dependencies:
   - imageio==2.26.0
   - wandb
   - dm_control
-  - mujoco
+  - mujoco<3.3.6
   - mlflow
   - av
   - coverage

.github/unittest/linux/scripts/run_all.sh

Lines changed: 0 additions & 1 deletion
@@ -107,7 +107,6 @@ if [[ "$PYTHON_VERSION" == "3.12" ]]; then
 else
   pip3 install "gymnasium[atari,mujoco]>=1.1" mo-gymnasium[mujoco]
 fi
-pip3 install "mujoco" -U
 
 # sanity check: remove?
 python3 -c """

.github/unittest/linux_distributed/scripts/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ dependencies:
   - imageio==2.26.0
   - wandb
   - dm_control
-  - mujoco
+  - mujoco<3.3.6
   - mlflow
   - av
   - coverage

.github/unittest/linux_libs/scripts_envpool/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -21,5 +21,5 @@ dependencies:
   - pyyaml
   - scipy
   - dm_control
-  - mujoco
+  - mujoco<3.3.6
   - coverage

.github/unittest/linux_libs/scripts_minari/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -25,6 +25,6 @@ dependencies:
   - gymnasium-robotics
   - minari[create]
   - jax>=0.7.0
-  - mujoco
+  - mujoco<3.3.6
   - mujoco-py<2.2,>=2.1
   - minigrid

.github/unittest/linux_olddeps/scripts_gym_0_13/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ dependencies:
   - pyyaml
   - scipy
   - hydra-core
-  - mujoco
+  - mujoco<3.3.6
   - patchelf
   - pyopengl==3.1.4
   - ray

.github/unittest/linux_sota/scripts/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ dependencies:
   - hydra-core
   - imageio==2.26.0
   - dm_control
-  - mujoco
+  - mujoco<3.3.6
   - mlflow
   - av
   - coverage

README.md

Lines changed: 72 additions & 40 deletions
@@ -17,7 +17,7 @@
   <img src="docs/source/_static/img/icon.png" width="200" >
 </p>
 
-[**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |
+[**What's New**](#-whats-new) | [**LLM API**](#llm-api---complete-framework-for-language-model-fine-tuning) | [**Getting Started**](#getting-started) | [**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |
 [**Features**](#features) | [**Examples, tutorials and demos**](#examples-tutorials-and-demos) | [**Citation**](#citation) | [**Installation**](#installation) |
 [**Asking a question**](#asking-a-question) | [**Contributing**](#contributing)
 
@@ -49,54 +49,37 @@ pip install hydra-core omegaconf
 
 Check out the [complete CLI documentation](https://github.com/pytorch/rl/tree/main/sota-implementations/ppo_trainer) to get started!
 
-### LLM API - Complete Framework for Language Model Fine-tuning
+### 🚀 **vLLM Revamp** - Major Enhancement to LLM Infrastructure (v0.10)
 
-TorchRL also includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This new framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
+This release introduces a comprehensive revamp of TorchRL's vLLM integration, delivering significant improvements in performance, scalability, and usability for large language model inference and training workflows:
 
-- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines - more to come!
-- 💬 **Conversation Management**: Advanced [`History`](torchrl/data/llm/history.py) class for multi-turn dialogue with automatic chat template detection
-- 🛠️ **Tool Integration**: [Built-in support](torchrl/envs/llm/transforms/) for Python code execution, function calling, and custom tool transforms
-- 🎯 **Specialized Objectives**: [GRPO](torchrl/objectives/llm/grpo.py) (Group Relative Policy Optimization) and [SFT](torchrl/objectives/llm/sft.py) loss functions optimized for language models
-- ⚡ **High-Performance Collectors**: [Async data collection](torchrl/collectors/llm/) with distributed training support
-- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
-
-The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
-
-<details>
-  <summary>Quick LLM API Example</summary>
+- 🔥 **AsyncVLLM Service**: Production-ready distributed vLLM inference with multi-replica scaling and automatic Ray actor management
+- ⚖️ **Multiple Load Balancing Strategies**: Routing strategies including prefix-aware, request-based, and KV-cache load balancing for optimal performance
+- 🏗️ **Unified vLLM Architecture**: New `RLvLLMEngine` interface standardizing all vLLM backends with simplified `vLLMUpdaterV2` for seamless weight updates
+- 🌐 **Distributed Data Loading**: New `RayDataLoadingPrimer` for shared, distributed data loading across multiple environments
+- 📈 **Enhanced Performance**: Native vLLM batching, concurrent request processing, and optimized resource allocation via Ray placement groups
 
 ```python
-from torchrl.envs.llm import ChatEnv
-from torchrl.modules.llm import TransformersWrapper
-from torchrl.objectives.llm import GRPOLoss
-from torchrl.collectors.llm import LLMCollector
-
-# Create environment with Python tool execution
-env = ChatEnv(
-    tokenizer=tokenizer,
-    system_prompt="You are an assistant that can execute Python code.",
-    batch_size=[1]
-).append_transform(PythonInterpreter())
-
-# Wrap your language model
-llm = TransformersWrapper(
-    model=model,
-    tokenizer=tokenizer,
-    input_mode="history"
+# Simple AsyncVLLM usage - production ready!
+from torchrl.modules.llm import AsyncVLLM, vLLMWrapper
+
+# Create distributed vLLM service with load balancing
+service = AsyncVLLM.from_pretrained(
+    "Qwen/Qwen2.5-7B",
+    num_devices=2,    # Tensor parallel across 2 GPUs
+    num_replicas=4,   # 4 replicas for high throughput
+    max_model_len=4096
 )
 
-# Set up GRPO training
-loss_fn = GRPOLoss(llm, critic, gamma=0.99)
-collector = LLMCollector(env, llm, frames_per_batch=100)
+# Use with TorchRL's LLM wrappers
+wrapper = vLLMWrapper(service, input_mode="history")
 
-# Training loop
-for data in collector:
-    loss = loss_fn(data)
-    loss.backward()
-    optimizer.step()
+# Simplified weight updates
+from torchrl.collectors.llm import vLLMUpdaterV2
+updater = vLLMUpdaterV2(service)  # Auto-configures from engine
 ```
 
-</details>
+This revamp positions TorchRL as the leading platform for scalable LLM inference and training, providing production-ready tools for both research and deployment scenarios.
 
 ### 🧪 PPOTrainer (Experimental) - High-Level Training Interface
 
@@ -159,6 +142,55 @@ python sota-implementations/ppo_trainer/train.py --help
 
 **Future Plans**: Additional algorithm trainers (SAC, TD3, DQN) and full integration of all TorchRL components within the configuration system are planned for upcoming releases.
 
+## LLM API - Complete Framework for Language Model Fine-tuning
+
+TorchRL includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
+
+- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines
+- 💬 **Conversation Management**: Advanced [`History`](torchrl/data/llm/history.py) class for multi-turn dialogue with automatic chat template detection
+- 🛠️ **Tool Integration**: [Built-in support](torchrl/envs/llm/transforms/) for Python code execution, function calling, and custom tool transforms
+- 🎯 **Specialized Objectives**: [GRPO](torchrl/objectives/llm/grpo.py) (Group Relative Policy Optimization) and [SFT](torchrl/objectives/llm/sft.py) loss functions optimized for language models
+- ⚡ **High-Performance Collectors**: [Async data collection](torchrl/collectors/llm/) with distributed training support
+- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
+
+The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
+
+<details>
+  <summary>Quick LLM API Example</summary>
+
+```python
+from torchrl.envs.llm import ChatEnv
+from torchrl.modules.llm import TransformersWrapper
+from torchrl.objectives.llm import GRPOLoss
+from torchrl.collectors.llm import LLMCollector
+
+# Create environment with Python tool execution
+env = ChatEnv(
+    tokenizer=tokenizer,
+    system_prompt="You are an assistant that can execute Python code.",
+    batch_size=[1]
+).append_transform(PythonInterpreter())
+
+# Wrap your language model
+llm = TransformersWrapper(
+    model=model,
+    tokenizer=tokenizer,
+    input_mode="history"
+)
+
+# Set up GRPO training
+loss_fn = GRPOLoss(llm, critic, gamma=0.99)
+collector = LLMCollector(env, llm, frames_per_batch=100)
+
+# Training loop
+for data in collector:
+    loss = loss_fn(data)
+    loss.backward()
+    optimizer.step()
+```
+
+</details>
+
 ## Key features
 
 - 🐍 **Python-first**: Designed with Python as the primary language for ease of use and flexibility
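Taken together, the new `AsyncVLLM` service, `vLLMWrapper`, and `vLLMUpdaterV2` introduced above are meant to slot into the existing LLM API (`ChatEnv`, `LLMCollector`) described later in the README. The sketch below illustrates one way the pieces could fit together; it reuses only names that appear in this commit's README diff, and the exact wiring (a shared service, the collector arguments, the update cadence) is an assumption, not a documented end-to-end example.

```python
# Hedged sketch: combining the new vLLM pieces with the existing LLM API.
# Only the imported TorchRL names appear in this commit; the wiring is assumed.
from transformers import AutoTokenizer

from torchrl.collectors.llm import LLMCollector, vLLMUpdaterV2
from torchrl.envs.llm import ChatEnv
from torchrl.modules.llm import AsyncVLLM, vLLMWrapper

# One shared inference service: tensor-parallel over 2 GPUs, 4 load-balanced replicas
service = AsyncVLLM.from_pretrained(
    "Qwen/Qwen2.5-7B", num_devices=2, num_replicas=4, max_model_len=4096
)
policy = vLLMWrapper(service, input_mode="history")

# Environment and collector taken from the LLM API section further down
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
env = ChatEnv(tokenizer=tokenizer, batch_size=[1])
collector = LLMCollector(env, policy, frames_per_batch=100)

# Weight updates to the inference replicas are meant to go through the new updater
updater = vLLMUpdaterV2(service)  # auto-configures from the engine

for data in collector:
    ...  # compute a GRPO/SFT loss on `data`, step the optimizer, sync weights via `updater`
```

Because `AsyncVLLM` manages its replicas as Ray actors, a single service instance can presumably be shared by several collectors or environments, with `vLLMUpdaterV2` propagating refreshed trainer weights to all replicas.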

docs/requirements.txt

Lines changed: 4 additions & 1 deletion
@@ -15,7 +15,7 @@ sphinx_design
 
 torchvision
 dm_control
-mujoco
+mujoco<3.3.6
 gym[classic_control,accept-rom-license,ale-py,atari]
 pygame
 tqdm
@@ -29,3 +29,6 @@ onnxscript
 onnxruntime
 onnx
 psutil
+hydra-core>=1.1
+omegaconf
+hydra-submitit-launcher

docs/source/reference/envs.rst

Lines changed: 0 additions & 1 deletion
@@ -1115,7 +1115,6 @@ to be able to create this other composition:
     ConditionalPolicySwitch
     ConditionalSkip
     Crop
-    DataLoadingPrimer
     DTypeCastTransform
     DeviceCastTransform
     DiscreteActionProjection
