AgentFly is an extensible framework for building LLM agents with reinforcement learning. It supports multi-turn training by adapting traditional RL methods with token-level masking. It features a decorator-based interface for defining tools and reward functions, enabling seamless extension and ease of use. To support high-throughput training, it implements asynchronous execution of tool calls and reward computations and provides a centralized resource management system for scalable environment coordination. A suite of prebuilt tools and environments is included.
Multi-Modal (Vision) Agent Training Support - Thanks to the powerful template system, AgentFly now supports training vision-language agents!
Train agents that can see and understand visual content, including GUI automation and image-based QA. See our predefined training examples for ready-to-use scripts.
New: Chat Template System - A flexible framework for creating conversation templates with multi-model support, vision capabilities, and tool integration. Learn more →
Clone and initialize the project:
git clone https://github.com/Agent-One-Lab/AgentFly
cd AgentFly
git submodule init
git submodule update
Install the basic Python packages:
pip install -e .
pip install -e '.[verl]' --no-build-isolation
Optionally, some tools require additional dependencies:
Some of our tools and environments are managed by the enroot backend. To use them, please install enroot (sudo required). Such tools include code_interpreter, retrieval, webshop, alfworld, and scienceworld.
The search tool requires Redis to cache results. One way to install it is with conda:
conda install conda-forge::redis-server==7.4.0
To support algorithms like GRPO and Reinforce++, we design multi-chain inference, enabling agents to solve one task via multiple paths at the same time. We build the RL computation and update LLMs in a multi-turn manner by applying token masks. Training is based on verl.
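As a rough, framework-agnostic illustration of token-level masking for a multi-turn trajectory (a sketch, not AgentFly's actual implementation), the loss mask keeps only tokens generated by the agent itself, while user prompts and tool observations are excluded from the RL loss:

from typing import List, Tuple

def build_loss_mask(turns: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    # turns: list of (role, token_ids), where role is "user", "assistant", or "tool".
    token_ids, mask = [], []
    for role, ids in turns:
        token_ids.extend(ids)
        # Only assistant tokens contribute to the policy loss; prompts and
        # tool observations are masked out.
        mask.extend([1 if role == "assistant" else 0] * len(ids))
    return token_ids, mask

# A two-turn trajectory with a tool observation in between (token ids are made up).
trajectory = [
    ("user", [101, 102, 103]),
    ("assistant", [201, 202]),
    ("tool", [301, 302, 303, 304]),
    ("assistant", [401, 402, 403]),
]
ids, mask = build_loss_mask(trajectory)
assert mask == [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1]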
Define tools and rewards, which can be used directly by agents.
@tool(name=...)
def customized_tool(...):
    ...

@reward(name=...)
def customized_reward(...):
    ...

agent = ReactAgent(
    model_name,
    tools=[customized_tool],
    reward=customized_reward
)
The agent and training modules are decoupled: simply customize your own agent, and it can be applied directly to training.
Suppose you are on a compute node with 8 GPUs. We have prepared training scripts for different tasks and tools in verl/examples/run_agents/. The scripts will try to download our prepared datasets and run training.
Run RL training of code_interpreter:
cd verl
bash examples/run_agents/run_code_agent.sh
To customize your own training, you need to:
1. Prepare datasets.
2. Define or use existing tools.
3. Define or use existing rewards.
4. Define your own agent or use an existing type of agent.
Data should be a JSON file containing a list of dicts with the following keys:
[
    {
        "question": ...,
        "optional_field1": ...,
        "optional_field2": ...,
        ...
    }
]
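For example, a hypothetical QA dataset with a gold answer field (which the reward function can consume, as described below) might look like:

[
    {
        "question": "What is 17 * 24?",
        "answer": "408"
    }
]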
During training, question will be used to format the input messages, while the other fields can be used in the reward function. An example message passed to the agent looks like this:
{
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": question}]}
    ],
    "optional_field1": ...,
    "optional_field2": ...,
    ...
}
You can use any existing tool listed in the documentation, or define a tool by decorating it with @tool. The output should either be a string or a dictionary containing observation as a key.
@reward(name="customized_tool")
def customized_tool(arg1, arg2):
# tool logic here
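For instance, a minimal concrete tool (a hypothetical word_count example, assuming the @tool decorator above is in scope) could return a dictionary with an observation key:

@tool(name="word_count")
def word_count(text: str):
    # Return the result as an observation dictionary, as described above.
    return {"observation": f"The text contains {len(text.split())} words."}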
Define your reward function or use an existing one. The reward function can accept prediction and trajectory as arguments, which are the agent's final response and the whole trajectory, respectively. Other fields will also be passed in if you defined them in the dataset. To use them, simply add these fields as arguments of the reward function.
@reward(name="customized_reward")
def customized_reward(prediction, trajectory, optional_field1, optional_field2):
# calculate reward
...
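As a concrete sketch, a hypothetical exact-match reward that consumes an answer field from the dataset (assuming a scalar score is returned; check the documentation for the exact expected return type) could look like:

@reward(name="qa_exact_match")
def qa_exact_match(prediction, trajectory, answer):
    # 1.0 if the agent's final response matches the gold answer, else 0.0.
    return 1.0 if prediction.strip() == str(answer).strip() else 0.0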
For stateful tools and rewards that hold environment instances, please refer to the documentation.
You can use the existing code agent or ReAct agent, or customize your own agent. To customize an agent, the agent class must inherit from BaseAgent, which handles tool calling and chain rollout. You can customize the generate and parse functions. Refer to the documentation for more details.
class CustomizedAgent(BaseAgent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    async def generate_async(self, messages_list: List[List[Dict]], **args):
        return await self.llm_engine.generate_async(messages_list, **args)

    def parse(self, responses: List[str], tools):
        # parse responses into tool calls
        ...
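Because the agent and training modules are decoupled, a customized agent can be constructed the same way as the ReactAgent shown earlier and then plugged into training. The model name, tool, and reward below are placeholders:

agent = CustomizedAgent(
    model_name,
    tools=[customized_tool],
    reward=customized_reward
)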
- The following shows an example of the WebShop agent.
- What training looks like: during training, the resource system dynamically allocates environments.
- Monitoring training on WANDB: logged items include the number of turns per step, the number of tool calls, and the allocated environments.
Reward curves on Qwen2.5-Instruct 3B and 7B models.
If you use our code or find it helpful, please cite:
@misc{wang2025agentfly,
title={AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents},
author={Renxi Wang and Rifo Ahmad Genadi and Bilal El Bouardi and Yongxin Wang and Fajri Koto and Zhengzhong Liu and Timothy Baldwin and Haonan Li},
year={2025},
eprint={2507.14897},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.14897},
}