v0.4.0

@github-actions github-actions released this 11 Jul 06:34
· 95 commits to main since this release

🚀 Introducing RULER: Relative Universal LLM-Elicited Rewards

We're excited to announce ART v0.4.0, featuring RULER, a general-purpose reward function that makes agent training dramatically easier and faster!

📏 What is RULER?

RULER (Relative Universal LLM-Elicited Rewards) uses an LLM-as-judge to rank agent trajectories, eliminating the need for:

  • ❌ Labeled training data
  • ❌ Expert feedback
  • ❌ Hand-crafted reward functions

Yet it often matches or exceeds the performance of carefully designed reward functions!

🎯 Key Benefits

  • 2-3x faster development: Skip the tedious reward engineering phase
  • Universal application: Works across diverse RL tasks without modification
  • Production-ready: Battle-tested on real tasks with impressive results
  • Simple integration: Just a few lines of code to get started

📖 Learn More

Check out the RULER documentation to see how easy it is to use:

from art.rewards import ruler_score_group

# Score your trajectories with one line
judged_group = await ruler_score_group(group, "openai/gpt-4o-mini")
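
Because the call above uses `await`, it has to run inside a coroutine (or a notebook with top-level await). The sketch below shows that calling pattern without needing ART or an API key. `ToyTrajectory` and `toy_score_group` are hypothetical stand-ins invented for this illustration, not part of ART, and the length-based score is only a placeholder for the LLM judge's relative ranking.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyTrajectory:
    """Hypothetical stand-in for an agent trajectory: an answer plus a reward."""
    answer: str
    reward: float = 0.0


async def toy_score_group(group, judge_model):
    """Hypothetical stand-in for ruler_score_group; makes no LLM call.

    Scores each trajectory relative to the others in the same group.
    Answer length is a placeholder for the judge's ranking, and
    judge_model is accepted only to mirror the call shape above.
    """
    longest = max((len(t.answer) for t in group), default=0) or 1
    return [replace(t, reward=len(t.answer) / longest) for t in group]


async def main():
    group = [ToyTrajectory("ok"), ToyTrajectory("a longer answer")]
    # Same shape as the real call: await a scorer inside a coroutine.
    return await toy_score_group(group, "openai/gpt-4o-mini")


judged_group = asyncio.run(main())
for trajectory in judged_group:
    print(trajectory.answer, round(trajectory.reward, 3))
```

With the real library, the body of `main` would instead build the group from actual rollouts and await `ruler_score_group`; see the RULER documentation for the exact types it expects and returns.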

Read the full launch announcement for detailed performance comparisons and insights.

What's Changed

Major Features

  • Add RULER reward function (#218) 🎉
  • RULER documentation (#221)

Other Improvements

  • Update README (#223)
  • fix python version in art-e (#222)
  • Add setproctitle as dep in colab notebooks (#220)
  • Move plotting dependencies to optional group (#217)
  • feat: tau-bench brad 003 (#216)
  • Allow validation_loader argument to train method (#215)
  • Update swe-bench example docs (#214)
  • chore: Remove workaround for torch-compile and use --torch-compile flag (#212)
  • Adds option to use padding with --torch-compile (#211)
  • Fix tau-bench example (#210)
  • art-2048: update qwen model identifier (#209)
  • Allow Unsloth to use --pad_token when tokenizer has no pad token (#208)
  • Allow using specific wandb projects in the CLI (#207)
  • feat: Allow using get_peft_model to re-initialize trainer state (#206)
  • chore: Add art_trainer module with ART's TRL Trainer (#205)
  • Art tau bench example (#204)
  • 🔊 Improve noisy startup (#203)
  • feat: SWE-Bench Example (#201)
  • Update to 0.3.13, pin accelerate (#197)

Full Changelog: v0.3.13...v0.4.0