Skip to content

Create run_rl.py with ART RL loop #161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 24, 2025

Conversation

saum7800
Copy link
Collaborator

A new file, dev/tau-bench/run_rl.py, was created, mirroring run.py but implementing an ART RL training loop.

Key changes and additions include:

  • ART RL Training Loop: The core train function orchestrates the RL process, utilizing art.TrainableModel, art.gather_trajectory_groups, and model.train().
  • Configuration: New Pydantic models, TauBenchTrainingConfig and TauBenchPolicyConfig, were introduced to manage RL-specific hyperparameters and integrate tau-bench's RunConfig.
  • Trajectory Generation: The rollout_tau_bench_task function adapts tau-bench's task evaluation to generate art.Trajectory objects, converting agent interactions and rewards into a format suitable for ART.
  • Agent Integration: The agent_factory from tau-bench.run is used, with the agent's internal model dynamically overridden to point to the art.TrainableModel's inference endpoint during rollouts.
  • Argument Parsing: parse_args was extended to include RL-specific command-line arguments while retaining most run.py arguments for tau-bench compatibility.
  • Evaluation: Periodic evaluation is performed using evaluate_model on a validation set, leveraging the same rollout_tau_bench_task mechanism.

The implementation reuses tau-bench abstractions like get_env and agent_factory to maintain consistency and minimize changes to the existing codebase.

@saum7800 saum7800 merged commit b889e7f into tau_bench Jun 24, 2025
1 check passed
@saum7800 saum7800 deleted the cursor/create-run-rl-py-with-art-rl-loop-78ca branch June 24, 2025 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants