Proposal: Minimal Python SDK #10577

@rbren

Description

The Problem

There are many different ways to run OpenHands today:

  • The UI
  • The CLI
  • Headless mode
  • Python code (sorta)
  • The evaluation pipeline

Worse, each of these has its own way of configuring OpenHands:

  • config.toml (for headless, CLI, and some of the UI)
  • Settings page (for UI)
  • Python args (for python)
  • Flags (for CLI)
  • Environment variables (for all of the above)

Even worse, all of our specific implementations for things like runtimes and web browsing are tightly coupled to the existing project, so our dependency tree is out of control.

The Solution

We need a single, deterministic, lightweight way of running OpenHands. Something that doesn’t pull from global state (like env vars and external files). Something you can install in 2 minutes. Something that Does What It Says.

Claude Code is a good example here—it’s a tightly packaged, self-contained agent, which can be deployed and run in a wide variety of contexts.

To fix this, we’ll publish an official Python SDK—a set of APIs for working with OpenHands primitives: Agents, Runtimes, LLMs, Conversations, etc. Different clients can then manage configuration and state as they please, pulling in heavyweight dependencies like Docker as necessary.

This will not only simplify our own development—it will empower end-users to build amazing new things.

Minimum Requirements

  • Lightweight Python package
    • Say, < 1GB of dependencies
    • No docker dependency
    • No browser dependency
  • No reliance on global state
    • no config files
    • no environment variables
  • No async
    • We’ll put each conversation in its own thread
    • Callbacks will be made in threads as well
  • MCP-first
    • all tool calls are driven in MCP format
    • we can add stronger types (e.g. CmdOutputObservation) after-the-fact
  • Runtime-agnostic
    • it takes in a pre-canned set of tools
    • doesn’t try to edit files or run commands itself—everything goes through tools
  • Accepts the following inputs:
    • LLM configuration
    • microagents (as strings? as a directory?)
    • system prompt
    • extensions to default system prompt
    • list of tools
    • MCP server config
      • gets converted into tools
  • Manages LLM interactions
  • Manages the conversation state and control loop
    • Most of what’s in agent_controller.py should get pulled in here
  • Manages all features related to agent behavior
    • Condensation
    • Planning
    • Security analyzer (?)
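
To make the MCP-first requirement concrete: alongside the ToolResult format shown later in this proposal, a tool invocation would follow the MCP tools/call shape (the tool name and arguments here are illustrative, not part of the SDK):

```json
{
  "name": "hello",
  "arguments": { "name": "world" }
}
```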

Example Code

Here’s a hello world example:

from time import sleep

from openhands.core import LLM, Agent, Tool, ToolResult, Conversation, ConversationStatus

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

def hello_tool(name) -> ToolResult:
  print("Hello, " + name)
  return ToolResult()

tools = [
  Tool(
    name="hello",
    description="Says hello",
    callback=hello_tool,
    inputSchema={
      'type': 'object',
      'properties': {'name': {'type': 'string', 'description': 'name to greet'}},
      'required': ['name'],
    },
  ),
]

agent = Agent(llm=llm, tools=tools)

conversation = Conversation(agent)
conversation.start_thread()
conversation.send_message("Say hello to 'world'")

while conversation.state.status == ConversationStatus.RUNNING:
  sleep(1)

if conversation.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

Similar example, but with a LocalRuntime:

from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus
from openhands.runtime import LocalRuntime

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

runtime = LocalRuntime()
runtime.clone_repo("All-Hands-AI/OpenHands")
runtime.run_command("echo 'do a flip' > /tmp/Plan.md")
microagents = runtime.load_microagents()

def run_lint():
  return runtime.run_command("npm run lint:fix")

tools = runtime.tools + [
  Tool(
    name="lint",
    description="Lints files and makes automatic changes where possible",
    callback=run_lint,
  ),
]

# MCPClient and load_mcp_config are sketched here; the MCP server config
# gets converted into a list of tools
mcp_tools = MCPClient(load_mcp_config())
tools += mcp_tools

agent = Agent(llm=llm, tools=tools, microagents=microagents)

def print_event(evt):
  print(evt)

conversation = Conversation(agent)
conversation.register_callback(print_event)
conversation.start_thread()
conversation.send_message("Follow /tmp/Plan.md")

while conversation.state.status == ConversationStatus.RUNNING:
  sleep(1)

if conversation.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

With delegation

from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

delegate_agent = Agent(llm=llm, tools=[])
delegate_conversations = []

def create_conversation(msg):
  conversation = Conversation(delegate_agent)
  conversation.start_thread()
  conversation.send_message(msg)
  delegate_conversations.append(conversation)

tools = [
  Tool(
    name="create_conversation",
    description="Create a new OpenHands conversation to do a subtask",
    callback=create_conversation,
    inputSchema={
      'type': 'object',
      'properties': {'msg': {'type': 'string', 'description': 'initial message for the subtask'}},
      'required': ['msg'],
    },
  ),
]

base_agent = Agent(llm=llm, tools=tools)
base_conversation = Conversation(base_agent)
base_conversation.start_thread()
base_conversation.send_message("Create 3 OpenHands conversations that each flip a coin and report the results")

while base_conversation.state.status == ConversationStatus.RUNNING:
  sleep(1)

if base_conversation.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

With message passing

from time import sleep

from openhands.core import LLM, Agent, Tool, Conversation, ConversationStatus

llm = LLM(
  model="claude-sonnet-4",
  api_key="your_api_key_here",
)

implementer_agent = Agent(llm=llm, tools=[...])
implementer_convo = Conversation(implementer_agent)
implementer_convo.start_thread()
implementer_convo.send_message("Build a hello world react app")

def pass_message_to_implementer(msg):
  implementer_convo.send_message(msg)

pass_message_tool = Tool(
  name="pass_message",
  description="Sends a message to the implementer",
  callback=pass_message_to_implementer,
  inputSchema={
    'type': 'object',
    'properties': {'msg': {'type': 'string', 'description': 'message to send to the implementer'}},
    'required': ['msg'],
  },
)

testing_agent = Agent(llm=llm, tools=[..., pass_message_tool])
testing_convo = Conversation(testing_agent)
testing_convo.start_thread()
testing_convo.send_message("Test the react app and tell the implementer about any issues.")

while implementer_convo.state.status == ConversationStatus.RUNNING or testing_convo.state.status == ConversationStatus.RUNNING:
  sleep(1)

if implementer_convo.state.status == ConversationStatus.ERROR or testing_convo.state.status == ConversationStatus.ERROR:
  raise Exception("Error!")
else:
  print("Agent finished")

Classes

LLM

Maps very closely to (exact copy of?) our current LLM class.

Handles completions based on configured LLM settings.

We will keep LLMRegistry in OpenHands Server and just pass one of the LLMs into the Agent.

Methods:

  • completion()

Agent

Maps very closely to (exact copy of?) the current CodeActAgent.

Contains all prompt text, and manages prompt extensions/customizations (e.g. microagents, custom system prompt)

Manages the conversion of ConversationState → Prompt → ToolCall

Methods:

  • step()

Conversation

This maps pretty closely to the current AgentController class.

Probably removes delegation support.

The loop should be iterative rather than event-driven: we run a while True: loop instead of calling step() inside of on_event()

At each step, calls agent.step(), then executes the corresponding tool call.

Methods:

  • start_thread() starts a new thread which will drive the agent loop forward
  • pause() prevents the agent from stepping forward further (but keeps thread alive)
  • resume() allows the agent to step forward again
  • close() kills the thread
  • get_state()
  • get_history()
  • register_event_callback()
    • callbacks are in a new thread
    • events include:
      • ToolCall (fka Action)
      • ToolResult (fka Observation)
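
As a sketch of this iterative loop, here is a toy Conversation whose start_thread() drives agent.step() and executes each tool call synchronously. Every name here (ScriptedAgent, the dict-based tool shape, the status field) is an illustrative stand-in, not the final SDK API:

```python
# Toy stand-ins for Agent/Conversation showing the iterative, thread-driven
# control loop: step(), execute tool, repeat until the agent returns None.
import threading
import time
from enum import Enum

class ConversationStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"

class ScriptedAgent:
    """Stub agent that returns a fixed sequence of tool calls, then None."""
    def __init__(self, tool_calls):
        self._calls = list(tool_calls)

    def step(self, history):
        return self._calls.pop(0) if self._calls else None

class Conversation:
    def __init__(self, agent, tools):
        self.agent = agent
        self.tools = {t["name"]: t["callback"] for t in tools}
        self.history = []
        self.status = ConversationStatus.RUNNING
        self._paused = threading.Event()
        self._paused.set()  # set = allowed to step

    def start_thread(self):
        threading.Thread(target=self._loop, daemon=True).start()

    def pause(self): self._paused.clear()
    def resume(self): self._paused.set()

    def _loop(self):
        while self.status == ConversationStatus.RUNNING:
            self._paused.wait()                   # honor pause()/resume()
            call = self.agent.step(self.history)  # a ToolCall, or None when done
            if call is None:
                self.status = ConversationStatus.FINISHED
                break
            result = self.tools[call["name"]](**call["arguments"])
            self.history.append((call, result))   # result available synchronously

greetings = []
tools = [{"name": "hello", "callback": lambda name: greetings.append(name)}]
agent = ScriptedAgent([{"name": "hello", "arguments": {"name": "world"}}])
convo = Conversation(agent, tools)
convo.start_thread()
while convo.status == ConversationStatus.RUNNING:
    time.sleep(0.01)
```

Note the design point from above: there is no event loop, just a plain thread stepping the agent and blocking on each tool call.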

ConversationState

Exact copy of current State class.

Fully serializable and deserializable. Can rehydrate an entire Conversation from this data (possibly with a different agent/LLM/toolset/etc)
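
A minimal sketch of that serializability requirement, assuming a simple status-plus-events shape (the field names are illustrative, not the real State schema):

```python
# ConversationState that round-trips through JSON, so a Conversation can be
# rehydrated later, possibly with a different agent/LLM/toolset.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ConversationState:
    status: str = "running"
    events: list = field(default_factory=list)  # serialized ToolCalls/ToolResults

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ConversationState":
        return cls(**json.loads(raw))

state = ConversationState(
    status="finished",
    events=[{"name": "hello", "arguments": {"name": "world"}}],
)
rehydrated = ConversationState.from_json(state.to_json())
```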

Tool

An abstraction that allows any tool to be plumbed into an agent. Tools are defined in MCP format with these attributes:

  • name
  • description
  • inputSchema (JSON Schema)
  • callback

The callback is a synchronous Python function which must return a ToolResult

ToolCall

ToolCall represents a particular instance of using a tool. It corresponds most closely to the generic Action class in OpenHands, or possibly MCPAction

We’ll probably need to pull in the existing Action class and subclasses into this package for compatibility w/ e.g. existing Condenser logic.

ToolResult

ToolResult represents the output of a ToolCall. It corresponds most closely to the generic Observation class in OpenHands, or possibly MCPObservation

We’ll probably need to pull in the existing Observation class and subclasses into this package for compatibility w/ e.g. existing Condenser logic.

ToolResults use the MCP output format, with an additional meta field for extra data.

Example:

{
  "content": [
    {
      "type": "text",
      "text": "Hello world"
    }
  ],
  "meta": {
    "observationType": "CmdOutputObservation",
    "exitCode": 0
  },
  "isError": false
}

Shimming into OpenHands

All of this is for nought if we don’t actually start using it! But the transition will take some work.

Runtimes

Runtimes themselves stay exactly as they are. We’ll add a new method on base.py called get_tools(), which returns a list of Tool objects that wrap methods like run_command and read_file
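
A sketch of what that get_tools() shim could look like, using a toy stand-in for the real Runtime (the MCP-style dict shape and method names mirror this proposal, not existing OpenHands code):

```python
# Wrap an existing runtime method (run_command) as an MCP-style Tool definition.
import subprocess

class ToyRuntime:
    def run_command(self, command: str) -> str:
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        return proc.stdout

    def get_tools(self):
        # One Tool per runtime capability; inputSchema is plain JSON Schema.
        return [{
            "name": "run_command",
            "description": "Run a shell command in the runtime",
            "inputSchema": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
            "callback": lambda command: self.run_command(command),
        }]

runtime = ToyRuntime()
tool = runtime.get_tools()[0]
output = tool["callback"](command="echo hello")
```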

Actions and Observations

We’ll need a shim to convert Actions to ToolCalls and back.

We’ll need a shim to convert Observations to ToolResults and back.
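
These shims are mostly field renames. A sketch, using a simplified dict-shaped Action rather than the real Action class hierarchy:

```python
# Convert a (simplified) Action dict to an MCP-style ToolCall and back.
def action_to_tool_call(action: dict) -> dict:
    """e.g. {'action': 'run', 'args': {'command': 'ls'}} -> MCP-style call."""
    return {"name": action["action"], "arguments": dict(action["args"])}

def tool_call_to_action(call: dict) -> dict:
    return {"action": call["name"], "args": dict(call["arguments"])}

action = {"action": "run", "args": {"command": "ls"}}
call = action_to_tool_call(action)
roundtrip = tool_call_to_action(call)
```

The Observation-to-ToolResult shim would be analogous, mapping observation content into the MCP content list and stashing type-specific fields in meta.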

EventStream

The EventStream goes away in the new world. Instead, the agent makes ToolCalls directly, and gets the ToolResults synchronously.

Clients can still snoop on the events by using register_event_callback() on the Conversation object.

We’ll need to continue serializing and saving the events to FileStore as the conversation progresses. But we may decide to break the old format in favor of the new ToolCall/ToolResult serialization.
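
One way this persistence could work without an EventStream is a registered callback that appends each event as a JSON line (a plain temp file stands in for FileStore here; the event shapes are illustrative):

```python
# Persist ToolCall/ToolResult events via a conversation callback, one JSON
# object per line.
import json
import os
import tempfile

events_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")

def persist_event(event: dict):
    with open(events_path, "a") as f:
        f.write(json.dumps(event) + "\n")

# A client would register this via conversation.register_event_callback(persist_event);
# here we invoke it directly to show the on-disk format.
persist_event({"kind": "ToolCall", "name": "hello", "arguments": {"name": "world"}})
persist_event({"kind": "ToolResult", "content": [{"type": "text", "text": "Hello, world"}]})

with open(events_path) as f:
    saved = [json.loads(line) for line in f]
```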

Agent Loop

This will be the hardest part.

We can start with the CLI. We can probably rewrite a lot of the ugly logic we have for managing the event loop.

On the server, we’ll probably shim things in at the AgentSession level. This is where we create a Runtime and an AgentController; instead we’ll create a Runtime and a Conversation.
