Note to Community
This roadmap is a living document, and we welcome community input and feedback to guide the SkyRL project. We will update this roadmap to link to specific Issues and PRs for each sub-task.
Overview
We plan to continue making SkyRL the easiest high-performance RL framework to modify and extend. We will focus primarily on improving the ease-of-use and the performance of RL for agentic tasks.
Objectives Breakdown
Agents
Overview: Improve the performance and ease-of-use of agentic RL by making it easy to train with existing agentic harnesses on SkyRL, implement new agentic loops and harnesses, and scale the RL environments.
- Provide integrated support for popular agentic tasks (e.g., SWE-Bench, Terminal Bench, KernelBench)
- Provide infra for scaling environments across widely distributed compute clusters
- Provide examples and guides for integrating agent harnesses on top of SkyRL
- Provide an OpenAI API HTTP endpoint to the agent stack, instead of only the InferenceEngineInterface (Provide HTTP Endpoint for inference engine client #96)
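To illustrate why an OpenAI-compatible endpoint matters for harness integration: any agent harness that already speaks the OpenAI chat-completions protocol could point at the trainer's inference engines without code changes. The sketch below builds such a request body; the base URL and model name are hypothetical, not SkyRL defaults.

```python
import json

# Hypothetical endpoint a SkyRL inference engine might expose; any
# OpenAI-compatible client (e.g. an SDK with a custom base_url) could target it.
BASE_URL = "http://localhost:8000/v1"  # assumption for illustration only

def build_chat_request(model, messages, temperature=1.0, max_tokens=512):
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    model="my-policy-model",
    messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
)
body = json.dumps(payload)  # what the harness would POST to BASE_URL
```

Because the wire format is the standard one, swapping a hosted model for the in-training policy becomes a one-line base-URL change in the harness.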
Asynchrony and Disaggregation
Overview: SkyRL provides a trainer for one-step off-policy async training (link). We will continue improving its performance and ease-of-use, and will introduce support for fully asynchronous training and generation.
- Improved distributed weight sync, potentially via a more generic transfer engine
- Introduce a fully asynchronous trainer
- Introduce support for streaming trajectories
- Improve interface for launching widely distributed data-parallel inference engines
- Improve load balancing across widely distributed inference engines
- Improve prefix caching, especially for multi-step tasks across widely distributed inference engines
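The one-step off-policy pattern mentioned above can be pictured as a producer/consumer pipeline where generation runs at most one batch ahead of training. This is a toy asyncio sketch of that overlap, not SkyRL's actual trainer code; the bounded queue is what limits staleness to one step.

```python
import asyncio

async def generate(queue, num_batches):
    # Rollout side: produce the next batch while the trainer is still
    # consuming the previous one (one-step off-policy overlap).
    for step in range(num_batches):
        await asyncio.sleep(0)  # stand-in for trajectory generation
        await queue.put(f"batch-{step}")  # blocks once one batch is buffered
    await queue.put(None)  # sentinel: generation finished

async def train(queue, log):
    # Trainer side: consume batches as they become available.
    while (batch := await queue.get()) is not None:
        await asyncio.sleep(0)  # stand-in for a training step
        log.append(batch)

async def run(num_batches=4):
    # maxsize=1 bounds off-policyness: generation can run at most
    # one step ahead of training before it has to wait.
    queue = asyncio.Queue(maxsize=1)
    log = []
    await asyncio.gather(generate(queue, num_batches), train(queue, log))
    return log

print(asyncio.run(run()))  # ['batch-0', 'batch-1', 'batch-2', 'batch-3']
```

A fully asynchronous trainer would relax the `maxsize=1` constraint and instead track per-trajectory policy versions, which is why streaming trajectories and better weight sync appear alongside it on this list.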
Training Stack
Overview: Broadening feature and recipe support (with a focus on agentic tasks) and improving ease-of-use.
- Support efficient large-scale MoE training
- Implement important recipes and algorithms: Dr. GRPO, DAPO, and others (Add DAPO Recipe #88)
- Support token-level rewards
- Support dynamic micro batch sizing (feat: dynamic bsz #170)
- Support LoRA training ([Trainer] Add support for LoRA training #110)
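As a concrete illustration of dynamic micro batch sizing: instead of a fixed number of sequences per micro-batch, sequences are grouped so each micro-batch stays under a total-token budget, which keeps activation memory roughly constant when sequence lengths vary. This greedy packer is a simplified sketch, not SkyRL's actual algorithm.

```python
def pack_microbatches(seq_lens, token_budget):
    """Greedily pack sequence indices into micro-batches so that the
    summed token count of each micro-batch stays within token_budget.
    A single sequence longer than the budget still gets its own batch."""
    batches, current, current_tokens = [], [], 0
    for i, n in enumerate(seq_lens):
        if current and current_tokens + n > token_budget:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Uneven sequences are grouped by tokens, not by a fixed count:
print(pack_microbatches([900, 200, 300, 800, 100], token_budget=1000))
# [[0], [1, 2], [3, 4]]
```

The payoff for agentic workloads is large: multi-turn trajectories have highly variable lengths, so token-based packing avoids both OOMs on long batches and wasted compute on short ones.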
Ease-of-use
- Expand our documentation and out-of-the-box examples for faster ramp-up
- Expand performance tuning guide
- Improve trajectory generation logging and visualization (Add generic trajectory logger #122)
- Improve flexibility of metric logging (Support exporting environment-specific metrics #139)
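To make the trajectory-logging goal concrete, here is a minimal sketch of what a generic trajectory logger could look like: per-step records accumulated in memory and exported as JSON lines for later inspection or visualization. The interface is hypothetical, not the design of #122.

```python
import json

class TrajectoryLogger:
    """Hypothetical generic trajectory logger: record per-step data
    (prompt, response, reward, plus arbitrary environment-specific
    metrics) and serialize it as JSON lines."""

    def __init__(self):
        self.steps = []

    def log_step(self, prompt, response, reward, **metrics):
        # **metrics leaves room for environment-specific fields,
        # in the spirit of the flexible metric logging item above.
        self.steps.append({"prompt": prompt, "response": response,
                           "reward": reward, **metrics})

    def to_jsonl(self):
        return "\n".join(json.dumps(s) for s in self.steps)

logger = TrajectoryLogger()
logger.log_step("2+2?", "4", reward=1.0, env="math")
logger.log_step("3*3?", "8", reward=0.0, env="math")
print(logger.to_jsonl())
```

JSONL is a natural export format here because each trajectory step stays independently parseable, which most visualization tools can stream directly.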