Description
The Problem
There are many different ways to run OpenHands today:
- The UI
- The CLI
- Headless mode
- Python code (sorta)
- The evaluation pipeline
Worse, each of these has its own way of configuring OpenHands:
- config.toml (for headless, CLI, and some of the UI)
- Settings page (for UI)
- Python args (for python)
- Flags (for CLI)
- Environment variables (for all of the above)
Even worse, all of our specific implementations for things like runtimes and web browsing are tightly coupled to the existing project, so our dependency tree is out of control.
The Solution
We need a single, deterministic, lightweight way of running OpenHands. Something that doesn’t pull from global state (like env vars and external files). Something you can install in 2 minutes. Something that Does What It Says.
Claude Code is a good example here—it’s a tightly packaged, self-contained agent, which can be deployed and run in a wide variety of contexts.
To fix this, we’ll publish an official Python SDK—a set of APIs for working with OpenHands primitives: Agents, Runtimes, LLMs, Conversations, etc. Different clients can then manage configuration and state as they please, pulling in heavy-weight things like docker as necessary.
This will not only simplify our own development—it will empower end-users to build amazing new things.
Minimum Requirements
- Lightweight python package
- Say, < 1GB of dependencies
- No docker dependency
- No browser dependency
- No reliance on global state
- no config files
- no environment variables
- No `async`
  - We’ll put each conversation in its own thread
  - Callbacks will be made in threads as well
- MCP-first
- all tool calls are driven in MCP format
- we can add stronger types (e.g. CmdOutputObservation) after-the-fact
- Runtime-agnostic
- it takes in a pre-canned set of tools
- doesn’t try to edit files or run commands itself—everything goes through tools
- Accepts the following inputs:
- LLM configuration
- microagents (as strings? as a directory?)
- system prompt
- extensions to default system prompt
- list of tools
- MCP server config
- gets converted into tools
- Manages LLM interactions
- Manages the conversation state and control loop
- Most of what’s in agent_controller.py should get pulled in here
- Manages all features related to agent behavior
- Condensation
- Planning
- Security analyzer (?)
Example Code
Here’s a hello world example:
```python
from time import sleep

from openhands.core import LLM, Agent, Tool, ToolResult, Conversation, ConversationStatus

llm = LLM(
    model="claude-sonnet-4",
    api_key="your_api_key_here",
)

def hello_tool(name: str) -> ToolResult:
    print("Hello, " + name)
    return ToolResult()

tools = [
    Tool(
        name="hello",
        description="Says hello",
        callback=hello_tool,
        inputSchema={'type': 'string', 'description': 'name to greet'},
    ),
]

agent = Agent(llm=llm, tools=tools)

conversation = Conversation(agent)
conversation.start_thread()
conversation.send_message("Say hello to 'world'")

while conversation.state.status == ConversationStatus.RUNNING:
    sleep(1)

if conversation.state.status == ConversationStatus.ERROR:
    raise Exception("Error!")
else:
    print("Agent finished")
```
Similar example, but with a LocalRuntime:
```python
from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus
from openhands.runtime import LocalRuntime

llm = LLM(
    model="claude-sonnet-4",
    api_key="your_api_key_here",
)

runtime = LocalRuntime()
runtime.clone_repo("All-Hands-AI/OpenHands")
runtime.run_command("echo 'do a flip' > /tmp/Plan.md")
microagents = runtime.load_microagents()

def run_lint():
    return runtime.run_command("npm run lint:fix")

tools = runtime.tools + [
    Tool(
        name="lint",
        description="Lints files and makes automatic changes where possible",
        callback=run_lint,
    ),
]

# MCP servers get converted into tools as well (MCPClient and
# load_mcp_config come from wherever the SDK exposes MCP support)
mcp_tools = MCPClient(load_mcp_config())
tools += mcp_tools

agent = Agent(llm=llm, tools=tools, microagents=microagents)

def print_event(evt):
    print(evt)

conversation = Conversation(agent)
conversation.register_callback(print_event)
conversation.start_thread()
conversation.send_message("Follow /tmp/Plan.md")

while conversation.state.status == ConversationStatus.RUNNING:
    sleep(1)

if conversation.state.status == ConversationStatus.ERROR:
    raise Exception("Error!")
else:
    print("Agent finished")
```
With delegation
```python
from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus

llm = LLM(
    model="claude-sonnet-4",
    api_key="your_api_key_here",
)

delegate_agent = Agent(llm=llm, tools=[])
delegate_conversations = []

def create_conversation(msg):
    conversation = Conversation(delegate_agent)
    conversation.start_thread()
    conversation.send_message(msg)
    delegate_conversations.append(conversation)

tools = [
    Tool(
        name="create_conversation",
        description="Create a new OpenHands conversation to do a subtask",
        callback=create_conversation,
        inputSchema={'type': 'string', 'description': 'the task for the new conversation'},
    ),
]

base_agent = Agent(llm=llm, tools=tools)

base_conversation = Conversation(base_agent)
base_conversation.start_thread()
base_conversation.send_message("Create 3 OpenHands conversations that each flip a coin and report the results")

while base_conversation.state.status == ConversationStatus.RUNNING:
    sleep(1)

if base_conversation.state.status == ConversationStatus.ERROR:
    raise Exception("Error!")
else:
    print("Agent finished")
```
With message passing
```python
from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus

llm = LLM(
    model="claude-sonnet-4",
    api_key="your_api_key_here",
)

implementer_agent = Agent(llm=llm, tools=[...])

implementer_convo = Conversation(implementer_agent)
implementer_convo.start_thread()
implementer_convo.send_message("Build a hello world react app")

def pass_message_to_implementer(msg):
    implementer_convo.send_message(msg)

pass_message_tool = Tool(
    name="pass_message",
    description="Sends a message to the implementer",
    callback=pass_message_to_implementer,
    inputSchema={'type': 'string', 'description': 'the message for the implementer'},
)

testing_agent = Agent(llm=llm, tools=[pass_message_tool])  # plus any testing tools
testing_convo = Conversation(testing_agent)
testing_convo.start_thread()
testing_convo.send_message("Test the react app and tell the implementer about any issues.")

while implementer_convo.state.status == ConversationStatus.RUNNING or testing_convo.state.status == ConversationStatus.RUNNING:
    sleep(1)

if implementer_convo.state.status == ConversationStatus.ERROR or testing_convo.state.status == ConversationStatus.ERROR:
    raise Exception("Error!")
else:
    print("Agents finished")
```
Classes
LLM
Maps very closely to (exact copy of?) our current LLM class.
Handles completions based on configured LLM settings.
We will keep LLMRegistry in OpenHands Server and just pass one of the LLMs into the Agent.
Methods:
- `completion()`
Agent
Maps very closely to (exact copy of?) the current CodeActAgent.
Contains all prompt text, and manages prompt extensions/customizations (e.g. microagents, custom system prompt)
Manages the conversion of ConversationState → Prompt → ToolCall
Methods:
- `step()`
Conversation
This maps pretty closely to the current AgentController class.
Probably removes delegation support.
The loop should be more iterative than event-driven, i.e. we have a `while True:` instead of calling `step()` inside of `on_event()`. At each step, it calls `agent.step()`, then executes the corresponding tool call.
Methods:
- `start_thread()` - starts a new thread which will drive the agent loop forward
- `pause()` - prevents the agent from stepping forward further (but keeps the thread alive)
- `resume()` - allows the agent to step forward again
- `close()` - kills the thread
- `get_state()`
- `get_history()`
- `register_event_callback()`
  - callbacks are made in a new thread
  - events include:
    - ToolCall (fka Action)
    - ToolResult (fka Observation)
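The iterative loop could be sketched roughly as follows. This is an assumption-heavy sketch, not a settled API: the idea that `step()` returns `None` when the agent is done, the `observe()` feedback hook, and the exact status names are all placeholders.

```python
import threading
from enum import Enum

class ConversationStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"

class Conversation:
    """Sketch of the iterative control loop: a plain while loop calls
    agent.step() and executes each resulting tool call, instead of
    stepping inside an on_event() handler."""

    def __init__(self, agent):
        self.agent = agent
        self.status = ConversationStatus.RUNNING
        self._may_step = threading.Event()
        self._may_step.set()            # set = allowed to step
        self._thread = None

    def start_thread(self):
        # Each conversation drives its loop in its own thread (no async).
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def pause(self):
        self._may_step.clear()          # stop stepping, keep thread alive

    def resume(self):
        self._may_step.set()

    def _run(self):
        while self.status == ConversationStatus.RUNNING:
            self._may_step.wait()                 # block here while paused
            tool_call = self.agent.step()         # assumed: None means "done"
            if tool_call is None:
                self.status = ConversationStatus.FINISHED
                break
            result = tool_call.callback(tool_call.arguments)
            self.agent.observe(result)            # assumed feedback hook
```

The point of the design is that pausing, resuming, and closing become ordinary thread-coordination primitives instead of event-loop surgery.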
ConversationState
Exact copy of the current `State` class.
Fully serializable and deserializable. Can rehydrate an entire Conversation from this data (possibly with a different agent/LLM/toolset/etc).
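The round-trip requirement might look like this in practice. The field names here are pure placeholders; only the serialize/rehydrate contract is the point.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ConversationState:
    """Sketch of a fully serializable state object; the exact fields
    are assumptions, the JSON round trip is the requirement."""
    status: str = "running"
    history: list = field(default_factory=list)   # serialized events

def save_state(state: ConversationState) -> str:
    return json.dumps(asdict(state))

def load_state(blob: str) -> ConversationState:
    # Rehydrate a Conversation later, possibly with a different
    # agent/LLM/toolset, from nothing but this data.
    return ConversationState(**json.loads(blob))
```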
Tool
An abstraction that allows any tool to be plumbed into an agent. Tools are defined in MCP format with these attributes:
- `name`
- `description`
- `inputSchema` (JSON Schema)
- `callback`
The callback is a synchronous python function which must return a ToolResult.
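A minimal sketch of the `Tool` shape as a dataclass (the attribute names follow the MCP fields listed above; the `call()` helper and `ToolResult` fields are assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolResult:
    """MCP-style result: content blocks plus an extra meta field."""
    content: list = field(default_factory=list)
    meta: dict = field(default_factory=dict)
    isError: bool = False

@dataclass
class Tool:
    """A tool defined in MCP format, plus a synchronous callback."""
    name: str
    description: str
    inputSchema: dict                    # JSON Schema for the arguments
    callback: Callable[..., ToolResult]

    def call(self, **kwargs: Any) -> ToolResult:
        # Execute the synchronous callback; it must return a ToolResult.
        return self.callback(**kwargs)

# Example: wrap a trivial function as a tool
def say_hello(name: str) -> ToolResult:
    return ToolResult(content=[{"type": "text", "text": f"Hello, {name}"}])

hello = Tool(
    name="hello",
    description="Says hello",
    inputSchema={"type": "object",
                 "properties": {"name": {"type": "string"}}},
    callback=say_hello,
)
```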
ToolCall
ToolCall represents a particular instance of using a tool. It corresponds most closely to the generic `Action` class in OpenHands, or possibly `MCPAction`.
We’ll probably need to pull the existing Action class and its subclasses into this package for compatibility with e.g. existing Condenser logic.
ToolResult
ToolResult represents the output of a ToolCall. It corresponds most closely to the generic `Observation` class in OpenHands, or possibly `MCPObservation`.
We’ll probably need to pull the existing Observation class and its subclasses into this package for compatibility with e.g. existing Condenser logic.
ToolResults use the MCP output format, with an additional `meta` field for extra data.
Example:
```json
{
  "content": [
    {
      "type": "text",
      "text": "Hello world"
    }
  ],
  "meta": {
    "observationType": "CmdOutputObservation",
    "exitCode": 0
  },
  "isError": false
}
```
Shimming into OpenHands
All of this is for nought if we don’t actually start using it! But the transition will take some work.
Runtimes
Runtimes themselves stay exactly as they are. We’ll have a new method on `base.py` called `get_tools()`, which returns a list of `Tool`s that wrap methods like `run_command` and `read_file`.
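The wrapping could look something like this (method names come from the doc; the simplified `Tool` shape and the toy runtime are assumptions made for illustration):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str
    callback: Callable[..., Any]

class Runtime:
    """Sketch of a base-class get_tools() that exposes existing
    runtime methods as Tool objects."""

    def run_command(self, command: str) -> str:
        raise NotImplementedError

    def read_file(self, path: str) -> str:
        raise NotImplementedError

    def get_tools(self) -> list:
        return [
            Tool(name="run_command",
                 description="Run a shell command in the runtime",
                 callback=self.run_command),
            Tool(name="read_file",
                 description="Read a file from the runtime",
                 callback=self.read_file),
        ]

class EchoRuntime(Runtime):
    # Toy runtime used only to show the wiring.
    def run_command(self, command: str) -> str:
        return f"ran: {command}"

    def read_file(self, path: str) -> str:
        return f"contents of {path}"
```

Because the callbacks are bound methods, a `Tool` list from `get_tools()` carries its runtime with it, which is what lets the Agent stay runtime-agnostic.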
Actions and Observations
We’ll need a shim to convert Actions to ToolCalls and back.
We’ll need a shim to convert Observations to ToolResults and back.
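A sketch of the Observation-side shim, using the `meta` format from the ToolResult example above (the observation class here is a stand-in, not the real OpenHands class):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class CmdOutputObservation:
    # Stand-in for the existing OpenHands observation class.
    content: str
    exit_code: int

def observation_to_tool_result(obs: CmdOutputObservation) -> dict:
    """Convert an Observation into the MCP-style ToolResult dict,
    stashing the type name and extra fields in `meta` so the
    reverse shim can rehydrate the original class."""
    return {
        "content": [{"type": "text", "text": obs.content}],
        "meta": {
            "observationType": type(obs).__name__,
            "exitCode": obs.exit_code,
        },
        "isError": obs.exit_code != 0,
    }

def tool_result_to_observation(result: dict) -> CmdOutputObservation:
    # Reverse direction: rebuild the observation from the meta field.
    assert result["meta"]["observationType"] == "CmdOutputObservation"
    return CmdOutputObservation(
        content=result["content"][0]["text"],
        exit_code=result["meta"]["exitCode"],
    )
```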
EventStream
The EventStream goes away in the new world. Instead, the agent makes ToolCalls directly, and gets the ToolResults synchronously.
Clients can still snoop on the events by using `register_event_callback()` on the Conversation object.
We’ll need to continue serializing and saving the events to FileStore as the conversation progresses. But we may decide to break the old format in favor of the new ToolCall/ToolResult serialization.
Agent Loop
This will be the hardest part.
We can start with the CLI. We can probably rewrite a lot of the ugly logic we have for managing the event loop.
On the server, we’ll probably shim things in at the AgentSession level. This is where we create a Runtime and an AgentController; instead we’ll create a Runtime and a Conversation.