We are an open-source initiative to democratize reinforcement learning (RL) techniques and develop scalable systems for large language models (LLMs) and agents.
+ We are an open-source initiative originating from the Berkeley Sky Computing Lab to democratize reinforcement learning (RL) techniques and develop scalable systems for large language models (LLMs) and agents.
<h3>rLLM: Reinforcement Learning for Language Agents</h3>
+ <p class="main-paragraph">
+ We release rLLM, an open-source framework for post-training language agents via reinforcement learning. With rLLM, you can easily build your own custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads.
+ </p>
+ <div class="post-meta">Date: July 1, 2025 | Estimated Reading Time: 10 min | Author: Sijun Tan, Michael Luo, Colin Cai </div>
<h3>DeepSWE: Training a Fully Open-sourced, State-of-the-Art Coding Agent by Scaling RL</h3>
+ <p class="main-paragraph">
+ We release DeepSWE-Preview, a 32B software engineering (SWE) agent trained purely with RL that achieves 59% on SWEBench-Verified with test-time scaling (42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. </p>
+ <div class="post-meta">Date: July 1, 2025 | Estimated Reading Time: 20 min | Author: Agentica x Together AI </div>
<h3>DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level</h3>
<p class="main-paragraph">
We release DeepCoder-14B-Preview, a code reasoning model finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL. It achieves 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters.
</p>
- <div class="post-meta">Date: April 8, 2025 | Estimated Reading Time: 15 min | Author: Agentica x Together AI </div>
+ <div class="post-meta">Date: July 1, 2025 | Estimated Reading Time: 15 min | Author: Agentica x Together AI </div>