 <img src="docs/source/_static/img/icon.png" width="200" >
 </p>

-[**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |
+[**What's New**](#-whats-new) | [**LLM API**](#llm-api---complete-framework-for-language-model-fine-tuning) | [**Getting Started**](#getting-started) | [**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |
 [**Features**](#features) | [**Examples, tutorials and demos**](#examples-tutorials-and-demos) | [**Citation**](#citation) | [**Installation**](#installation) |
 [**Asking a question**](#asking-a-question) | [**Contributing**](#contributing)

@@ -49,54 +49,37 @@ pip install hydra-core omegaconf

 Check out the [complete CLI documentation](https://github.com/pytorch/rl/tree/main/sota-implementations/ppo_trainer) to get started!

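+Assuming the trainer script follows standard Hydra conventions (the `pip install hydra-core omegaconf` prerequisite in this hunk's context suggests it does), you can inspect every configurable option directly from the command line:
+
+```bash
+# Standard Hydra behavior: print the config tree and the available overrides
+python sota-implementations/ppo_trainer/train.py --help
+```
+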
-### LLM API - Complete Framework for Language Model Fine-tuning
+### 🚀 **vLLM Revamp** - Major Enhancement to LLM Infrastructure (v0.10)

-TorchRL also includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This new framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
+This release comprehensively revamps TorchRL's vLLM integration, improving performance, scalability, and usability for large language model inference and training workflows:

-- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines - more to come!
-- 💬 **Conversation Management**: Advanced [`History`](torchrl/data/llm/history.py) class for multi-turn dialogue with automatic chat template detection
-- 🛠️ **Tool Integration**: [Built-in support](torchrl/envs/llm/transforms/) for Python code execution, function calling, and custom tool transforms
-- 🎯 **Specialized Objectives**: [GRPO](torchrl/objectives/llm/grpo.py) (Group Relative Policy Optimization) and [SFT](torchrl/objectives/llm/sft.py) loss functions optimized for language models
-- ⚡ **High-Performance Collectors**: [Async data collection](torchrl/collectors/llm/) with distributed training support
-- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
-
-The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
-
-<details>
-  <summary>Quick LLM API Example</summary>
+- 🔥 **AsyncVLLM Service**: Production-ready distributed vLLM inference with multi-replica scaling and automatic Ray actor management
+- ⚖️ **Multiple Load Balancing Strategies**: Prefix-aware, request-based, and KV-cache-aware routing for optimal performance
+- 🏗️ **Unified vLLM Architecture**: A new `RLvLLMEngine` interface that standardizes all vLLM backends, plus a simplified `vLLMUpdaterV2` for seamless weight updates
+- 🌐 **Distributed Data Loading**: A new `RayDataLoadingPrimer` for shared, distributed data loading across multiple environments
+- 📈 **Enhanced Performance**: Native vLLM batching, concurrent request processing, and optimized resource allocation via Ray placement groups

 ```python
-from torchrl.envs.llm import ChatEnv
-from torchrl.modules.llm import TransformersWrapper
-from torchrl.objectives.llm import GRPOLoss
-from torchrl.collectors.llm import LLMCollector
-
-# Create environment with Python tool execution
-env = ChatEnv(
-    tokenizer=tokenizer,
-    system_prompt="You are an assistant that can execute Python code.",
-    batch_size=[1]
-).append_transform(PythonInterpreter())
-
-# Wrap your language model
-llm = TransformersWrapper(
-    model=model,
-    tokenizer=tokenizer,
-    input_mode="history"
+# Simple AsyncVLLM usage - production ready!
+from torchrl.modules.llm import AsyncVLLM, vLLMWrapper
+
+# Create distributed vLLM service with load balancing
+service = AsyncVLLM.from_pretrained(
+    "Qwen/Qwen2.5-7B",
+    num_devices=2,   # Tensor parallel across 2 GPUs
+    num_replicas=4,  # 4 replicas for high throughput
+    max_model_len=4096
 )

-# Set up GRPO training
-loss_fn = GRPOLoss(llm, critic, gamma=0.99)
-collector = LLMCollector(env, llm, frames_per_batch=100)
+# Use with TorchRL's LLM wrappers
+wrapper = vLLMWrapper(service, input_mode="history")

-# Training loop
-for data in collector:
-    loss = loss_fn(data)
-    loss.backward()
-    optimizer.step()
+# Simplified weight updates
+from torchrl.collectors.llm import vLLMUpdaterV2
+updater = vLLMUpdaterV2(service)  # Auto-configures from engine
 ```

-</details>
+This revamp makes TorchRL a production-ready platform for scalable LLM inference and training, in both research and deployment settings.

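+<details>
+  <summary>Sketch: querying the service directly</summary>
+
+Beyond the wrapper shown above, `AsyncVLLM` is meant to stand in for a vLLM engine, so a direct generation call should look roughly like the sketch below. The `generate` call and `SamplingParams` usage are assumptions carried over from vLLM's own API; check the `AsyncVLLM` docstrings for the exact signature.
+
+```python
+# A minimal sketch, assuming AsyncVLLM mirrors vLLM's LLM.generate() API.
+from vllm import SamplingParams
+
+params = SamplingParams(temperature=0.7, max_tokens=256)
+# The service routes the request to one of its replicas
+# according to the configured load-balancing strategy.
+outputs = service.generate("Explain KV-cache-aware routing.", params)
+print(outputs[0].outputs[0].text)
+```
+
+</details>
+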
 ### 🧪 PPOTrainer (Experimental) - High-Level Training Interface

@@ -159,6 +142,55 @@ python sota-implementations/ppo_trainer/train.py --help

 **Future Plans**: Additional algorithm trainers (SAC, TD3, DQN) and full integration of all TorchRL components within the configuration system are planned for upcoming releases.

+## LLM API - Complete Framework for Language Model Fine-tuning
+
+TorchRL includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
+
+- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines
+- 💬 **Conversation Management**: Advanced [`History`](torchrl/data/llm/history.py) class for multi-turn dialogue with automatic chat template detection
+- 🛠️ **Tool Integration**: [Built-in support](torchrl/envs/llm/transforms/) for Python code execution, function calling, and custom tool transforms
+- 🎯 **Specialized Objectives**: [GRPO](torchrl/objectives/llm/grpo.py) (Group Relative Policy Optimization) and [SFT](torchrl/objectives/llm/sft.py) loss functions optimized for language models
+- ⚡ **High-Performance Collectors**: [Async data collection](torchrl/collectors/llm/) with distributed training support
+- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
+
+The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and the [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
+
+<details>
+  <summary>Quick LLM API Example</summary>
+
+```python
+from torchrl.envs.llm import ChatEnv
+from torchrl.envs.llm.transforms import PythonInterpreter
+from torchrl.modules.llm import TransformersWrapper
+from torchrl.objectives.llm import GRPOLoss
+from torchrl.collectors.llm import LLMCollector
+
+# (model, tokenizer, critic and optimizer are assumed to be defined)
+# Create environment with Python tool execution
+env = ChatEnv(
+    tokenizer=tokenizer,
+    system_prompt="You are an assistant that can execute Python code.",
+    batch_size=[1]
+).append_transform(PythonInterpreter())
+
+# Wrap your language model
+llm = TransformersWrapper(
+    model=model,
+    tokenizer=tokenizer,
+    input_mode="history"
+)
+
+# Set up GRPO training
+loss_fn = GRPOLoss(llm, critic, gamma=0.99)
+collector = LLMCollector(env, llm, frames_per_batch=100)
+
+# Training loop
+for data in collector:
+    loss = loss_fn(data)
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+```
+
+</details>
+
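+<details>
+  <summary>Sketch: multi-turn dialogue with `History`</summary>
+
+The `History` class highlighted above is the backbone of conversation management. Below is a minimal sketch of building and templating a chat; it assumes `History` is importable from `torchrl.data.llm`, that histories stack with `torch.stack`, and that `apply_chat_template` accepts a Hugging Face tokenizer. Check the [`History`](torchrl/data/llm/history.py) source for the exact API.
+
+```python
+# A minimal sketch, assuming the torchrl.data.llm.History API.
+import torch
+from torchrl.data.llm import History
+
+# Build a conversation turn by turn; stacking yields a batched History.
+chat = torch.stack([
+    History(role="system", content="You are a helpful assistant."),
+    History(role="user", content="What is TorchRL?"),
+])
+
+# Render the conversation with the model's chat template
+# (templates are auto-detected from the tokenizer).
+prompt = chat.apply_chat_template(tokenizer=tokenizer, add_generation_prompt=True)
+```
+
+</details>
+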
 ## Key features

 - 🐍 **Python-first**: Designed with Python as the primary language for ease of use and flexibility