Note to Community
This roadmap is a living document, and we welcome community input and feedback to guide the SkyRL project. We will update this roadmap to link to specific Issues and PRs for each sub-task.
Overview
We plan to continue making SkyRL the easiest high-performance RL framework to modify and extend. We will focus primarily on improving the ease-of-use and the performance of RL for agentic tasks.
Objectives Breakdown
Agents
Overview: Improve the performance and ease-of-use of agentic RL by making it easy to train with existing agentic harnesses on SkyRL, implement new agentic loops and harnesses, and scale the RL environments.
- Provide integrated support for popular agentic tasks (e.g., SWE-Bench, Terminal Bench, KernelBench)
- Provide infra for scaling environments across widely distributed compute clusters
- Provide examples and guides for integrating agent harnesses on top of SkyRL
- Provide an OpenAI API HTTP endpoint to the agent stack, instead of only the InferenceEngineInterface (Provide HTTP Endpoint for inference engine client #96)
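To illustrate why an OpenAI-compatible endpoint matters for harness integration: any agent harness that already speaks the OpenAI chat-completions protocol could point at the trainer's inference engines without code changes. The sketch below builds such a request body; the base URL and model name are hypothetical, not SkyRL defaults.

```python
import json

# Hypothetical endpoint a SkyRL inference engine might expose; any
# OpenAI-compatible client (e.g. an SDK with a custom base_url) could target it.
BASE_URL = "http://localhost:8000/v1"  # assumption for illustration only

def build_chat_request(model, messages, temperature=1.0, max_tokens=512):
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    model="my-policy-model",
    messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
)
body = json.dumps(payload)  # what the harness would POST to BASE_URL
```

Because the wire format is the standard one, swapping a hosted model for the in-training policy becomes a one-line base-URL change in the harness.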
Asynchrony and Disaggregation
Overview: SkyRL provides a trainer for one-step off-policy async training (link). We will continue improving its performance and ease-of-use, and will introduce support for fully asynchronous training and generation.
- Improved distributed weight sync, potentially via a more generic transfer engine
- Introduce a fully asynchronous trainer
- Introduce support for streaming trajectories
- Improve interface for launching widely distributed data-parallel inference engines
- Improve load balancing across widely distributed inference engines
- Improve prefix caching, especially for multi-step tasks across widely distributed inference engines
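The one-step off-policy pattern mentioned above can be pictured as a producer/consumer pipeline where generation runs at most one batch ahead of training. This is a toy asyncio sketch of that overlap, not SkyRL's actual trainer code; the bounded queue is what limits staleness to one step.

```python
import asyncio

async def generate(queue, num_batches):
    # Rollout side: produce the next batch while the trainer is still
    # consuming the previous one (one-step off-policy overlap).
    for step in range(num_batches):
        await asyncio.sleep(0)  # stand-in for trajectory generation
        await queue.put(f"batch-{step}")  # blocks once one batch is buffered
    await queue.put(None)  # sentinel: generation finished

async def train(queue, log):
    # Trainer side: consume batches as they become available.
    while (batch := await queue.get()) is not None:
        await asyncio.sleep(0)  # stand-in for a training step
        log.append(batch)

async def run(num_batches=4):
    # maxsize=1 bounds off-policyness: generation can run at most
    # one step ahead of training before it has to wait.
    queue = asyncio.Queue(maxsize=1)
    log = []
    await asyncio.gather(generate(queue, num_batches), train(queue, log))
    return log

print(asyncio.run(run()))  # ['batch-0', 'batch-1', 'batch-2', 'batch-3']
```

A fully asynchronous trainer would relax the `maxsize=1` constraint and instead track per-trajectory policy versions, which is why streaming trajectories and better weight sync appear alongside it on this list.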
Training Stack
Overview: Broadening feature and recipe support (with a focus on agentic tasks) and improving ease-of-use.
- Support efficient large-scale MoE training
- Implement important recipes and algorithms: Dr. GRPO, DAPO, and others (Add DAPO Recipe #88)
- Support token-level rewards
- Support dynamic micro batch sizing (feat: dynamic bsz #170)
- Support LoRA training ([Trainer] Add support for LoRA training #110)
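As a concrete illustration of dynamic micro batch sizing: instead of a fixed number of sequences per micro-batch, sequences are grouped so each micro-batch stays under a total-token budget, which keeps activation memory roughly constant when sequence lengths vary. This greedy packer is a simplified sketch, not SkyRL's actual algorithm.

```python
def pack_microbatches(seq_lens, token_budget):
    """Greedily pack sequence indices into micro-batches so that the
    summed token count of each micro-batch stays within token_budget.
    A single sequence longer than the budget still gets its own batch."""
    batches, current, current_tokens = [], [], 0
    for i, n in enumerate(seq_lens):
        if current and current_tokens + n > token_budget:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Uneven sequences are grouped by tokens, not by a fixed count:
print(pack_microbatches([900, 200, 300, 800, 100], token_budget=1000))
# [[0], [1, 2], [3, 4]]
```

The payoff for agentic workloads is large: multi-turn trajectories have highly variable lengths, so token-based packing avoids both OOMs on long batches and wasted compute on short ones.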
Ease-of-use
- Expand our documentation and out-of-the-box examples for faster ramp-up
- Expand performance tuning guide
- Improve trajectory generation logging and visualization (Add generic trajectory logger #122)
- Improve flexibility of metric logging (Support exporting environment-specific metrics #139)
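To make the trajectory-logging goal concrete, here is a minimal sketch of what a generic trajectory logger could look like: per-step records accumulated in memory and exported as JSON lines for later inspection or visualization. The interface is hypothetical, not the design of #122.

```python
import json

class TrajectoryLogger:
    """Hypothetical generic trajectory logger: record per-step data
    (prompt, response, reward, plus arbitrary environment-specific
    metrics) and serialize it as JSON lines."""

    def __init__(self):
        self.steps = []

    def log_step(self, prompt, response, reward, **metrics):
        # **metrics leaves room for environment-specific fields,
        # in the spirit of the flexible metric logging item above.
        self.steps.append({"prompt": prompt, "response": response,
                           "reward": reward, **metrics})

    def to_jsonl(self):
        return "\n".join(json.dumps(s) for s in self.steps)

logger = TrajectoryLogger()
logger.log_step("2+2?", "4", reward=1.0, env="math")
logger.log_step("3*3?", "8", reward=0.0, env="math")
print(logger.to_jsonl())
```

JSONL is a natural export format here because each trajectory step stays independently parseable, which most visualization tools can stream directly.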