Create run_rl.py with ART RL loop #161

saum7800 · 2025-06-24T07:25:18Z

A new file, dev/tau-bench/run_rl.py, was created, mirroring run.py but implementing an ART RL training loop.

Key changes and additions include:

ART RL Training Loop: The core train function orchestrates the RL process, utilizing art.TrainableModel, art.gather_trajectory_groups, and model.train().
Configuration: New Pydantic models, TauBenchTrainingConfig and TauBenchPolicyConfig, were introduced to manage RL-specific hyperparameters and integrate tau-bench's RunConfig.
Trajectory Generation: The rollout_tau_bench_task function adapts tau-bench's task evaluation to generate art.Trajectory objects, converting agent interactions and rewards into a format suitable for ART.
Agent Integration: The agent_factory from tau-bench.run is used, with the agent's internal model dynamically overridden to point to the art.TrainableModel's inference endpoint during rollouts.
Argument Parsing: parse_args was extended to include RL-specific command-line arguments while retaining most run.py arguments for tau-bench compatibility.
Evaluation: Periodic evaluation is performed using evaluate_model on a validation set, leveraging the same rollout_tau_bench_task mechanism.

The implementation reuses tau-bench abstractions like get_env and agent_factory to maintain consistency and minimize changes to the existing codebase.

Add RL training script for tau-bench with ART framework integration

3728680

saum7800 merged commit b889e7f into tau_bench Jun 24, 2025
1 check passed

saum7800 deleted the cursor/create-run-rl-py-with-art-rl-loop-78ca branch June 24, 2025 07:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Create run_rl.py with ART RL loop #161

Create run_rl.py with ART RL loop #161

Uh oh!

saum7800 commented Jun 24, 2025

Uh oh!

Uh oh!

Uh oh!

Create run_rl.py with ART RL loop #161

Create run_rl.py with ART RL loop #161

Uh oh!

Conversation

saum7800 commented Jun 24, 2025

Uh oh!

Uh oh!

Uh oh!