 <img src="docs/source/_static/img/icon.png" width="200" >
 </p>

-[**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |
+[**What's New**](#-whats-new) | [**LLM API**](#llm-api---complete-framework-for-language-model-fine-tuning) | [**Getting Started**](#getting-started) | [**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |
 [**Features**](#features) | [**Examples, tutorials and demos**](#examples-tutorials-and-demos) | [**Citation**](#citation) | [**Installation**](#installation) |
 [**Asking a question**](#asking-a-question) | [**Contributing**](#contributing)

@@ -49,54 +49,37 @@ pip install hydra-core omegaconf

 Check out the [complete CLI documentation](https://github.com/pytorch/rl/tree/main/sota-implementations/ppo_trainer) to get started!

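+Assuming the trainer script follows standard Hydra conventions (the `pip install hydra-core omegaconf` prerequisite in this hunk's context suggests it does), you can inspect every configurable option directly from the command line:
+
+```bash
+# Standard Hydra behavior: print the config tree and the available overrides
+python sota-implementations/ppo_trainer/train.py --help
+```
+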
-### LLM API - Complete Framework for Language Model Fine-tuning
+### 🚀 **vLLM Revamp** - Major Enhancement to LLM Infrastructure (v0.10)

-TorchRL also includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This new framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
+This release comprehensively revamps TorchRL's vLLM integration, improving performance, scalability, and usability for large language model inference and training workflows:

-- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines - more to come!
-- 💬 **Conversation Management**: Advanced [`History`](torchrl/data/llm/history.py) class for multi-turn dialogue with automatic chat template detection
-- 🛠️ **Tool Integration**: [Built-in support](torchrl/envs/llm/transforms/) for Python code execution, function calling, and custom tool transforms
-- 🎯 **Specialized Objectives**: [GRPO](torchrl/objectives/llm/grpo.py) (Group Relative Policy Optimization) and [SFT](torchrl/objectives/llm/sft.py) loss functions optimized for language models
-- ⚡ **High-Performance Collectors**: [Async data collection](torchrl/collectors/llm/) with distributed training support
-- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
-
-The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
-
-<details>
-  <summary>Quick LLM API Example</summary>
+- 🔥 **AsyncVLLM Service**: Production-ready distributed vLLM inference with multi-replica scaling and automatic Ray actor management
+- ⚖️ **Multiple Load Balancing Strategies**: Prefix-aware, request-based, and KV-cache-aware routing for optimal performance
+- 🏗️ **Unified vLLM Architecture**: A new `RLvLLMEngine` interface that standardizes all vLLM backends, plus a simplified `vLLMUpdaterV2` for seamless weight updates
+- 🌐 **Distributed Data Loading**: A new `RayDataLoadingPrimer` for shared, distributed data loading across multiple environments
+- 📈 **Enhanced Performance**: Native vLLM batching, concurrent request processing, and optimized resource allocation via Ray placement groups

 ```python
-from torchrl.envs.llm import ChatEnv
-from torchrl.modules.llm import TransformersWrapper
-from torchrl.objectives.llm import GRPOLoss
-from torchrl.collectors.llm import LLMCollector
-
-# Create environment with Python tool execution
-env = ChatEnv(
-    tokenizer=tokenizer,
-    system_prompt="You are an assistant that can execute Python code.",
-    batch_size=[1]
-).append_transform(PythonInterpreter())
-
-# Wrap your language model
-llm = TransformersWrapper(
-    model=model,
-    tokenizer=tokenizer,
-    input_mode="history"
+# Simple AsyncVLLM usage - production ready!
+from torchrl.modules.llm import AsyncVLLM, vLLMWrapper
+
+# Create distributed vLLM service with load balancing
+service = AsyncVLLM.from_pretrained(
+    "Qwen/Qwen2.5-7B",
+    num_devices=2,   # Tensor parallel across 2 GPUs
+    num_replicas=4,  # 4 replicas for high throughput
+    max_model_len=4096
 )

-# Set up GRPO training
-loss_fn = GRPOLoss(llm, critic, gamma=0.99)
-collector = LLMCollector(env, llm, frames_per_batch=100)
+# Use with TorchRL's LLM wrappers
+wrapper = vLLMWrapper(service, input_mode="history")

-# Training loop
-for data in collector:
-    loss = loss_fn(data)
-    loss.backward()
-    optimizer.step()
+# Simplified weight updates
+from torchrl.collectors.llm import vLLMUpdaterV2
+updater = vLLMUpdaterV2(service)  # Auto-configures from engine
 ```

-</details>
+This revamp makes TorchRL a production-ready platform for scalable LLM inference and training, in both research and deployment settings.

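+<details>
+  <summary>Sketch: querying the service directly</summary>
+
+Beyond the wrapper shown above, `AsyncVLLM` is meant to stand in for a vLLM engine, so a direct generation call should look roughly like the sketch below. The `generate` call and `SamplingParams` usage are assumptions carried over from vLLM's own API; check the `AsyncVLLM` docstrings for the exact signature.
+
+```python
+# A minimal sketch, assuming AsyncVLLM mirrors vLLM's LLM.generate() API.
+from vllm import SamplingParams
+
+params = SamplingParams(temperature=0.7, max_tokens=256)
+# The service routes the request to one of its replicas
+# according to the configured load-balancing strategy.
+outputs = service.generate("Explain KV-cache-aware routing.", params)
+print(outputs[0].outputs[0].text)
+```
+
+</details>
+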
 ### 🧪 PPOTrainer (Experimental) - High-Level Training Interface

@@ -159,6 +142,55 @@ python sota-implementations/ppo_trainer/train.py --help

 **Future Plans**: Additional algorithm trainers (SAC, TD3, DQN) and full integration of all TorchRL components within the configuration system are planned for upcoming releases.

+## LLM API - Complete Framework for Language Model Fine-tuning
+
+TorchRL includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
+
+- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines
+- 💬 **Conversation Management**: Advanced [`History`](torchrl/data/llm/history.py) class for multi-turn dialogue with automatic chat template detection
+- 🛠️ **Tool Integration**: [Built-in support](torchrl/envs/llm/transforms/) for Python code execution, function calling, and custom tool transforms
+- 🎯 **Specialized Objectives**: [GRPO](torchrl/objectives/llm/grpo.py) (Group Relative Policy Optimization) and [SFT](torchrl/objectives/llm/sft.py) loss functions optimized for language models
+- ⚡ **High-Performance Collectors**: [Async data collection](torchrl/collectors/llm/) with distributed training support
+- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
+
+The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and the [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
+
+<details>
+  <summary>Quick LLM API Example</summary>
+
+```python
+from torchrl.envs.llm import ChatEnv
+from torchrl.envs.llm.transforms import PythonInterpreter
+from torchrl.modules.llm import TransformersWrapper
+from torchrl.objectives.llm import GRPOLoss
+from torchrl.collectors.llm import LLMCollector
+
+# (model, tokenizer, critic and optimizer are assumed to be defined)
+# Create environment with Python tool execution
+env = ChatEnv(
+    tokenizer=tokenizer,
+    system_prompt="You are an assistant that can execute Python code.",
+    batch_size=[1]
+).append_transform(PythonInterpreter())
+
+# Wrap your language model
+llm = TransformersWrapper(
+    model=model,
+    tokenizer=tokenizer,
+    input_mode="history"
+)
+
+# Set up GRPO training
+loss_fn = GRPOLoss(llm, critic, gamma=0.99)
+collector = LLMCollector(env, llm, frames_per_batch=100)
+
+# Training loop
+for data in collector:
+    loss = loss_fn(data)
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+```
+
+</details>
+
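+<details>
+  <summary>Sketch: multi-turn dialogue with `History`</summary>
+
+The `History` class highlighted above is the backbone of conversation management. Below is a minimal sketch of building and templating a chat; it assumes `History` is importable from `torchrl.data.llm`, that histories stack with `torch.stack`, and that `apply_chat_template` accepts a Hugging Face tokenizer. Check the [`History`](torchrl/data/llm/history.py) source for the exact API.
+
+```python
+# A minimal sketch, assuming the torchrl.data.llm.History API.
+import torch
+from torchrl.data.llm import History
+
+# Build a conversation turn by turn; stacking yields a batched History.
+chat = torch.stack([
+    History(role="system", content="You are a helpful assistant."),
+    History(role="user", content="What is TorchRL?"),
+])
+
+# Render the conversation with the model's chat template
+# (templates are auto-detected from the tokenizer).
+prompt = chat.apply_chat_template(tokenizer=tokenizer, add_generation_prompt=True)
+```
+
+</details>
+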
 ## Key features

 - 🐍 **Python-first**: Designed with Python as the primary language for ease of use and flexibility