Danau5tin

Pinned

  1. terminal-bench-rl Public

    GRPO training code that scales to 32 H100s for long-horizon terminal/coding tasks. The base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.

    Python, 230 stars, 14 forks

  2. calculator_agent_rl Public

    Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.

    Python, 45 stars, 3 forks

  3. tbench-agentic-data-pipeline Public

    Multi-agent synthetic data generation pipeline that generates and validates long-horizon terminal/coding tasks for RL training.

    Python, 13 stars, 2 forks

  4. agentic_environments Public

    A mini-framework to build agentic LLM environments.

    Python, 2 stars