v0.4.0
🚀 Introducing RULER: Relative Universal LLM-Elicited Rewards
We're excited to announce ART v0.4.0, featuring RULER - a groundbreaking general-purpose reward function that makes agent training dramatically easier and faster!
📏 What is RULER?
RULER (Relative Universal LLM-Elicited Rewards) uses an LLM-as-judge to rank agent trajectories, eliminating the need for:
- ❌ Labeled training data
- ❌ Expert feedback
- ❌ Hand-crafted reward functions
Yet it often matches or exceeds the performance of carefully designed reward functions!
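To make the idea of relative, judge-assigned rewards concrete, here is a minimal conceptual sketch. It is not RULER's implementation: the `Trajectory` class, `stub_judge`, and `score_group` are hypothetical names invented for illustration, and the judge is a deterministic placeholder standing in for a real LLM call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical stand-in for an agent trajectory (not ART's own class)."""
    transcript: str
    reward: float = 0.0


async def stub_judge(transcripts: list[str]) -> list[float]:
    """Placeholder for an LLM judge (assumption, for illustration only).

    A real judge would read every transcript in the group side by side and
    return one score per transcript. Here we fake that by scoring shorter
    transcripts higher, relative to the longest one in the group.
    """
    longest = max(len(t) for t in transcripts)
    return [1.0 - len(t) / (2 * longest) for t in transcripts]


async def score_group(group: list[Trajectory], judge) -> list[Trajectory]:
    """Assign each trajectory a reward based on the judge's group-wide scores."""
    scores = await judge([t.transcript for t in group])
    if len(scores) != len(group):
        raise ValueError("judge must return exactly one score per trajectory")
    for trajectory, score in zip(group, scores):
        trajectory.reward = score
    return group
```

Calling `asyncio.run(score_group(group, stub_judge))` on a list of trajectories fills in each `reward` field. The point of the sketch is that scores come from comparing trajectories within one group, so no labels or hand-written reward function are involved.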
🎯 Key Benefits
- 2-3x faster development: Skip the tedious reward engineering phase
- Universal application: Works across diverse RL tasks without modification
- Production-ready: Battle-tested on real tasks with impressive results
- Simple integration: Just a few lines of code to get started
📖 Learn More
Check out the RULER documentation to see how easy it is to use:
from art.rewards import ruler_score_group
# Score your trajectories with one line
judged_group = await ruler_score_group(group, "openai/gpt-4o-mini")
Read the full launch announcement for detailed performance comparisons and insights.
What's Changed
Major Features
Other Improvements
- Update README (#223)
- fix python version in art-e (#222)
- Add setproctitle as dep in colab notebooks (#220)
- Move plotting dependencies to optional group (#217)
- feat: tau-bench brad 003 (#216)
- Allow validation_loader argument to train method (#215)
- Update swe-bench example docs (#214)
- chore: Remove workaround for torch-compile and use --torch-compile flag (#212)
- Adds option to use padding with --torch-compile (#211)
- Fix tau-bench example (#210)
- art-2048: update qwen model identifier (#209)
- Allow Unsloth to use --pad_token when tokenizer has no pad token (#208)
- Allow using specific wandb projects in the CLI (#207)
- feat: Allow using get_peft_model to re-initialize trainer state (#206)
- chore: Add art_trainer module with ART's TRL Trainer (#205)
- Art tau bench example (#204)
- 🔊 Improve noisy startup (#203)
- feat: SWE-Bench Example (#201)
- Update to 0.3.13, pin accelerate (#197)
Full Changelog: v0.3.13...v0.4.0