Release v0.4.0 · OpenPipe/ART

🚀 Introducing RULER: Relative Universal LLM-Elicited Rewards

We're excited to announce ART v0.4.0, featuring RULER - a groundbreaking general-purpose reward function that makes agent training dramatically easier and faster!

📏 What is RULER?

RULER (Relative Universal LLM-Elicited Rewards) uses an LLM-as-judge to rank agent trajectories, eliminating the need for:

❌ Labeled training data
❌ Expert feedback
❌ Hand-crafted reward functions

Yet it often matches or exceeds the performance of carefully designed reward functions!
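
To see why ranking trajectories against each other can be enough, here is a minimal sketch. It is not ART's implementation: it assumes the trainer normalizes rewards within each group of trajectories, as GRPO-style methods do. The function name `group_relative_advantages` and the judge scores are hypothetical and for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-8):
    """Center and scale judge scores within one group of trajectories.

    Hypothetical helper for illustration; not part of the ART API.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical judge scores for four trajectories attempting the same task.
judge_scores = [0.9, 0.4, 0.4, 0.1]
advantages = group_relative_advantages(judge_scores)

# Shifting every score by the same constant leaves the advantages unchanged,
# so only the relative standing within the group matters.
shifted = group_relative_advantages([s + 0.05 for s in judge_scores])
```

Under this assumption the judge does not need a calibrated absolute scale. It only needs to order the trajectories in a group consistently.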

🎯 Key Benefits

2-3x faster development: Skip the tedious reward engineering phase
Universal application: Works across diverse RL tasks without modification
Production-ready: Battle-tested on real tasks with impressive results
Simple integration: Just a few lines of code to get started

📖 Learn More

Check out the RULER documentation to see how easy it is to use:

from art.rewards import ruler_score_group

# Score your trajectories with one line (call this from inside an async function)
judged_group = await ruler_score_group(group, "openai/gpt-4o-mini")

Read the full launch announcement for detailed performance comparisons and insights.

What's Changed

Major Features

Add RULER reward function (#218) 🎉
RULER documentation (#221)

Other Improvements

Update README (#223)
fix python version in art-e (#222)
Add setproctitle as dep in colab notebooks (#220)
Move plotting dependencies to optional group (#217)
feat: tau-bench brad 003 (#216)
Allow validation_loader argument to train method (#215)
Update swe-bench example docs (#214)
chore: Remove workaround for torch-compile and use --torch-compile flag (#212)
Adds option to use padding with --torch-compile (#211)
Fix tau-bench example (#210)
art-2048: update qwen model identifier (#209)
Allow Unsloth to use --pad_token when tokenizer has no pad token (#208)
Allow using specific wandb projects in the CLI (#207)
feat: Allow using get_peft_model to re-initialize trainer state (#206)
chore: Add art_trainer module with ART's TRL Trainer (#205)
Art tau bench example (#204)
🔊 Improve noisy startup (#203)
feat: SWE-Bench Example (#201)
Update to 0.3.13, pin accelerate (#197)

Full Changelog: v0.3.13...v0.4.0