Pinned
- terminal-bench-rl (Public): GRPO training code that scales to 32x H100s for long-horizon terminal/coding tasks. The base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
- calculator_agent_rl (Public): Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
- tbench-agentic-data-pipeline (Public): Multi-agent synthetic data generation pipeline that generates and validates long-horizon terminal/coding tasks for RL training.
- agentic_environments (Public): A mini-framework for building agentic LLM environments.
Python 2