Proposal: Minimal Python SDK #10577

@rbren

Description

The Problem

There are many different ways to run OpenHands today:

  • The UI
  • The CLI
  • Headless mode
  • Python code (sorta)
  • The evaluation pipeline

Worse, each of these has its own way of configuring OpenHands:

  • config.toml (for headless, CLI, and some of the UI)
  • Settings page (for UI)
  • Python args (for python)
  • Flags (for CLI)
  • Environment variables (for all of the above)

Even worse, all of our specific implementations for things like runtimes and web browsing are tightly coupled to the existing project, so our dependency tree is out of control.

The Solution

We need a single, deterministic, lightweight way of running OpenHands. Something that doesn’t pull from global state (like env vars and external files). Something you can install in 2 minutes. Something that Does What It Says.

Claude Code is a good example here—it’s a tightly packaged, self-contained agent, which can be deployed and run in a wide variety of contexts.

To fix this, we’ll publish an official Python SDK—a set of APIs for working with OpenHands primitives: Agents, Runtimes, LLMs, Conversations, etc. Different clients can then manage configuration and state as they please, pulling in heavyweight dependencies like Docker as necessary.

This will not only simplify our own development—it will empower end-users to build amazing new things.

Minimum Requirements

  • Lightweight Python package
    • Say, < 1GB of dependencies
    • No docker dependency
    • No browser dependency
  • No reliance on global state
    • no config files
    • no environment variables
  • No async
    • We’ll put each conversation in its own thread
    • Callbacks will be made in threads as well
  • MCP-first
    • all tool calls are driven in MCP format
    • we can add stronger types (e.g. CmdOutputObservation) after-the-fact
  • Runtime-agnostic
    • it takes in a pre-canned set of tools
    • doesn’t try to edit files or run commands itself—everything goes through tools
  • Accepts the following inputs:
    • LLM configuration
    • microagents (as strings? as a directory?)
    • system prompt
    • extensions to default system prompt
    • list of tools
    • MCP server config
      • gets converted into tools
  • Manages LLM interactions
  • Manages the conversation state and control loop
    • Most of what’s in agent_controller.py should get pulled in here
  • Manages all features related to agent behavior
    • Condensation
    • Planning
    • Security analyzer (?)
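
To make the MCP-first requirement concrete: alongside the ToolResult format shown later in this proposal, a tool invocation would follow the MCP tools/call shape (the tool name and arguments here are illustrative, not part of the SDK):

```json
{
  "name": "hello",
  "arguments": { "name": "world" }
}
```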

Example Code

Here’s a hello world example:

from time import sleep

from openhands.core import LLM, Agent, Tool, ToolResult, Conversation, ConversationStatus

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

def hello_tool(name) -> ToolResult:
  print("Hello, " + name)
  return ToolResult()

tools = [
  Tool(
    name="hello",
    description="Says hello",
    callback=hello_tool,
    inputSchema={
      'type': 'object',
      'properties': {'name': {'type': 'string', 'description': 'name to greet'}},
      'required': ['name'],
    },
  ),
]

agent = Agent(llm=llm, tools=tools)

conversation = Conversation(agent)
conversation.start_thread()
conversation.send_message("Say hello to 'world'")

while conversation.state.status == ConversationStatus.RUNNING:
  sleep(1)

if conversation.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

Similar example, but with a LocalRuntime:

from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus
from openhands.runtime import LocalRuntime

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

runtime = LocalRuntime()
runtime.clone_repo("All-Hands-AI/OpenHands")
runtime.run_command("echo 'do a flip' > /tmp/Plan.md")
microagents = runtime.load_microagents()

def run_lint():
  return runtime.run_command("npm run lint:fix")

tools = runtime.tools + [
  Tool(
    name="lint",
    description="Lints files and makes automatic changes where possible",
    callback=run_lint,
  ),
]

# MCPClient and load_mcp_config are sketched here; the MCP server config
# gets converted into a list of tools
mcp_tools = MCPClient(load_mcp_config())
tools += mcp_tools

agent = Agent(llm=llm, tools=tools, microagents=microagents)

def print_event(evt):
  print(evt)

conversation = Conversation(agent)
conversation.register_callback(print_event)
conversation.start_thread()
conversation.send_message("Follow /tmp/Plan.md")

while conversation.state.status == ConversationStatus.RUNNING:
  sleep(1)

if conversation.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

With delegation

from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

delegate_agent = Agent(llm=llm, tools=[])
delegate_conversations = []

def create_conversation(msg):
  conversation = Conversation(delegate_agent)
  conversation.start_thread()
  conversation.send_message(msg)
  delegate_conversations.append(conversation)

tools = [
  Tool(
    name="create_conversation",
    description="Create a new OpenHands conversation to do a subtask",
    callback=create_conversation,
    inputSchema={
      'type': 'object',
      'properties': {'msg': {'type': 'string', 'description': 'initial message for the subtask'}},
      'required': ['msg'],
    },
  ),
]

base_agent = Agent(llm=llm, tools=tools)
base_conversation = Conversation(base_agent)
base_conversation.start_thread()
base_conversation.send_message("Create 3 OpenHands conversations that each flip a coin and report the results")

while base_conversation.state.status == ConversationStatus.RUNNING:
  sleep(1)

if base_conversation.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

With message passing

from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

implementer_agent = Agent(llm=llm, tools=[...])
implementer_convo = Conversation(implementer_agent)
implementer_convo.start_thread()
implementer_convo.send_message("Build a hello world react app")

def pass_message_to_implementer(msg):
  implementer_convo.send_message(msg)

pass_message_tool = Tool(
  name="pass_message",
  description="Sends a message to the implementer",
  callback=pass_message_to_implementer,
  inputSchema={
    'type': 'object',
    'properties': {'msg': {'type': 'string', 'description': 'message to send to the implementer'}},
    'required': ['msg'],
  },
)

testing_agent = Agent(llm=llm, tools=[..., pass_message_tool])
testing_convo = Conversation(testing_agent)
testing_convo.start_thread()
testing_convo.send_message("Test the react app and tell the implementer about any issues.")

while implementer_convo.state.status == ConversationStatus.RUNNING or testing_convo.state.status == ConversationStatus.RUNNING:
  sleep(1)

if implementer_convo.state.status == ConversationStatus.ERROR or testing_convo.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

Classes

LLM

Maps very closely to (exact copy of?) our current LLM class.

Handles completions based on configured LLM settings.

We will keep LLMRegistry in OpenHands Server and just pass one of the LLMs into the Agent.

Methods:

  • completion()

Agent

Maps very closely to (exact copy of?) the current CodeActAgent.

Contains all prompt text, and manages prompt extensions/customizations (e.g. microagents, custom system prompt)

Manages the conversion of ConversationState → Prompt → ToolCall

Methods:

  • step()

Conversation

This maps pretty closely to the current AgentController class.

Probably removes delegation support.

The loop should be iterative rather than event-driven: we run a while True: loop instead of calling step() inside of on_event()

At each step, calls agent.step(), then executes the corresponding tool call.

Methods:

  • start_thread() starts a new thread which will drive the agent loop forward
  • pause() prevents the agent from stepping forward further (but keeps thread alive)
  • resume() allows the agent to step forward again
  • close() kills the thread
  • get_state()
  • get_history()
  • register_event_callback()
    • callbacks are in a new thread
    • events include:
      • ToolCall (fka Action)
      • ToolResult (fka Observation)
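
As a sketch of this iterative loop, here is a toy Conversation whose start_thread() drives agent.step() and executes each tool call synchronously. Every name here (ScriptedAgent, the dict-based tool shape, the status field) is an illustrative stand-in, not the final SDK API:

```python
# Toy stand-ins for Agent/Conversation showing the iterative, thread-driven
# control loop: step(), execute tool, repeat until the agent returns None.
import threading
import time
from enum import Enum

class ConversationStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"

class ScriptedAgent:
    """Stub agent that returns a fixed sequence of tool calls, then None."""
    def __init__(self, tool_calls):
        self._calls = list(tool_calls)

    def step(self, history):
        return self._calls.pop(0) if self._calls else None

class Conversation:
    def __init__(self, agent, tools):
        self.agent = agent
        self.tools = {t["name"]: t["callback"] for t in tools}
        self.history = []
        self.status = ConversationStatus.RUNNING
        self._paused = threading.Event()
        self._paused.set()  # set = allowed to step

    def start_thread(self):
        threading.Thread(target=self._loop, daemon=True).start()

    def pause(self): self._paused.clear()
    def resume(self): self._paused.set()

    def _loop(self):
        while self.status == ConversationStatus.RUNNING:
            self._paused.wait()                   # honor pause()/resume()
            call = self.agent.step(self.history)  # a ToolCall, or None when done
            if call is None:
                self.status = ConversationStatus.FINISHED
                break
            result = self.tools[call["name"]](**call["arguments"])
            self.history.append((call, result))   # result available synchronously

greetings = []
tools = [{"name": "hello", "callback": lambda name: greetings.append(name)}]
agent = ScriptedAgent([{"name": "hello", "arguments": {"name": "world"}}])
convo = Conversation(agent, tools)
convo.start_thread()
while convo.status == ConversationStatus.RUNNING:
    time.sleep(0.01)
```

Note the design point from above: there is no event loop, just a plain thread stepping the agent and blocking on each tool call.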

ConversationState

Exact copy of current State class.

Fully serializable and deserializable. Can rehydrate an entire Conversation from this data (possibly with a different agent/LLM/toolset/etc)
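
A minimal sketch of that serializability requirement, assuming a simple status-plus-events shape (the field names are illustrative, not the real State schema):

```python
# ConversationState that round-trips through JSON, so a Conversation can be
# rehydrated later, possibly with a different agent/LLM/toolset.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ConversationState:
    status: str = "running"
    events: list = field(default_factory=list)  # serialized ToolCalls/ToolResults

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ConversationState":
        return cls(**json.loads(raw))

state = ConversationState(
    status="finished",
    events=[{"name": "hello", "arguments": {"name": "world"}}],
)
rehydrated = ConversationState.from_json(state.to_json())
```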

Tool

An abstraction that allows any tool to be plumbed into an agent. Tools are defined in MCP format with these attributes:

  • name
  • description
  • inputSchema (JSON Schema)
  • callback

The callback is a synchronous Python function which must return a ToolResult

ToolCall

ToolCall represents a particular instance of using a tool. It corresponds most closely to the generic Action class in OpenHands, or possibly MCPAction

We’ll probably need to pull in the existing Action class and subclasses into this package for compatibility w/ e.g. existing Condenser logic.

ToolResult

ToolResult represents the output of a ToolCall. It corresponds most closely to the generic Observation class in OpenHands, or possibly MCPObservation

We’ll probably need to pull in the existing Observation class and subclasses into this package for compatibility w/ e.g. existing Condenser logic.

ToolResults use the MCP output format, with an additional meta field for extra data.

Example:

{
  "content": [
    {
      "type": "text",
      "text": "Hello world"
    }
  ],
  "meta": {
    "observationType": "CmdOutputObservation",
    "exitCode": 0
  },
  "isError": false
}

Shimming into OpenHands

All of this is for nought if we don’t actually start using it! But the transition will take some work.

Runtimes

Runtimes themselves stay exactly as they are. We’ll add a new method on base.py called get_tools(), which returns a list of Tool objects that wrap methods like run_command and read_file
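
A sketch of what that get_tools() shim could look like, using a toy stand-in for the real Runtime (the MCP-style dict shape and method names mirror this proposal, not existing OpenHands code):

```python
# Wrap an existing runtime method (run_command) as an MCP-style Tool definition.
import subprocess

class ToyRuntime:
    def run_command(self, command: str) -> str:
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        return proc.stdout

    def get_tools(self):
        # One Tool per runtime capability; inputSchema is plain JSON Schema.
        return [{
            "name": "run_command",
            "description": "Run a shell command in the runtime",
            "inputSchema": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
            "callback": lambda command: self.run_command(command),
        }]

runtime = ToyRuntime()
tool = runtime.get_tools()[0]
output = tool["callback"](command="echo hello")
```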

Actions and Observations

We’ll need a shim to convert Actions to ToolCalls and back.

We’ll need a shim to convert Observations to ToolResults and back.
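
These shims are mostly field renames. A sketch, using a simplified dict-shaped Action rather than the real Action class hierarchy:

```python
# Convert a (simplified) Action dict to an MCP-style ToolCall and back.
def action_to_tool_call(action: dict) -> dict:
    """e.g. {'action': 'run', 'args': {'command': 'ls'}} -> MCP-style call."""
    return {"name": action["action"], "arguments": dict(action["args"])}

def tool_call_to_action(call: dict) -> dict:
    return {"action": call["name"], "args": dict(call["arguments"])}

action = {"action": "run", "args": {"command": "ls"}}
call = action_to_tool_call(action)
roundtrip = tool_call_to_action(call)
```

The Observation-to-ToolResult shim would be analogous, mapping observation content into the MCP content list and stashing type-specific fields in meta.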

EventStream

The EventStream goes away in the new world. Instead, the agent makes ToolCalls directly, and gets the ToolResults synchronously.

Clients can still snoop on the events by using register_event_callback() on the Conversation object.

We’ll need to continue serializing and saving the events to FileStore as the conversation progresses. But we may decide to break the old format in favor of the new ToolCall/ToolResult serialization.
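
One way this persistence could work without an EventStream is a registered callback that appends each event as a JSON line (a plain temp file stands in for FileStore here; the event shapes are illustrative):

```python
# Persist ToolCall/ToolResult events via a conversation callback, one JSON
# object per line.
import json
import os
import tempfile

events_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")

def persist_event(event: dict):
    with open(events_path, "a") as f:
        f.write(json.dumps(event) + "\n")

# A client would register this via conversation.register_event_callback(persist_event);
# here we invoke it directly to show the on-disk format.
persist_event({"kind": "ToolCall", "name": "hello", "arguments": {"name": "world"}})
persist_event({"kind": "ToolResult", "content": [{"type": "text", "text": "Hello, world"}]})

with open(events_path) as f:
    saved = [json.loads(line) for line in f]
```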

Agent Loop

This will be the hardest part.

We can start with the CLI. We can probably rewrite a lot of the ugly logic we have for managing the event loop.

On the server, we’ll probably shim things in at the AgentSession level. This is where we create a Runtime and an AgentController; instead we’ll create a Runtime and a Conversation.
