Commit 3e0efc0

pcmoritz authored
Very simple local coding sandbox example (#80)
**This PR adds a simple sandbox example using the Guix package manager**

This change introduces a basic sandbox environment powered by Guix (https://guix.gnu.org/), a package manager that comes with thousands of pre-built software packages that you can easily customize and compile. Note that the integration with SkyRL-Train will land in a future PR.

**How it works**

Guix makes it easy to set up development environments with all the right dependencies. For example, if you want to work on a Linux package like Inkscape, you can run `guix shell --development inkscape` and it automatically gives you a shell with everything needed to build that software.

You can also create custom environments by running `guix shell -m manifest.scm`, where the `manifest.scm` file lists exactly which packages and dependencies you want. Think of this like Python's `uv run` command, but instead of just handling Python projects, it can set up complete Linux development environments for any type of software:

- `manifest.scm` (in Guix) = `pyproject.toml` (in Python)
- `guix shell` (for any Linux software) = `uv run` (for Python only)

This gives developers a consistent way to create isolated, reproducible development environments for any kind of project.

**How to use it**

The Docker image or environment you are working in needs to have Guix installed. You can install it, for example, by running

```shell
wget https://guix.gnu.org/install.sh
chmod +x install.sh
sudo ./install.sh
```

For the following commands, we assume that you are in the `sky-train/examples/simplecoder` directory. It is worth running the following command once to initialize the packages (this ensures the following commands won't time out):

```shell
guix shell -m manifest.scm -- sh
```

To run the example, first clone the test repository

```shell
git clone https://github.com/SWE-agent/test-repo
```

and then you can run the "agent" using

```shell
python simplecoder.py
```

---------

Co-authored-by: pcmoritz <[email protected]>
1 parent 6aa073e commit 3e0efc0

3 files changed: 354 additions & 0 deletions

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
(use-modules (guix packages)
             (guix gexp)
             (gnu packages bash)
             (gnu packages version-control)
             (gnu packages virtualization)
             (gnu packages certs)
             (gnu packages check)
             (gnu packages python))

(packages->manifest
 (list coreutils
       bubblewrap
       bash
       grep
       sed
       findutils
       git
       python
       python-pytest
       nss-certs))
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# SimpleCoder

This is a simple coding environment for solving SWE-bench-like coding challenges.

It uses a basic sandbox environment powered by Guix
(https://guix.gnu.org/), a package manager that comes with thousands
of pre-built software packages that you can easily customize and
compile.

## How it works

Guix makes it easy to set up development environments with all the right dependencies. For example, if you want to work on a Linux package like Inkscape, you can run `guix shell --development inkscape` and it automatically gives you a shell with everything needed to build that software.

You can also create custom environments by running `guix shell -m manifest.scm`, where the `manifest.scm` file lists exactly which packages and dependencies you want. Think of this like Python's `uv run` command, but instead of just handling Python projects, it can set up complete Linux development environments for any type of software.

## How to use it

The Docker image or environment you are working in needs to have Guix installed. You can
install it, for example, by running

```shell
wget https://guix.gnu.org/install.sh
chmod +x install.sh
sudo ./install.sh
```

For the following commands, we assume that you are in the `sky-train/examples/simplecoder` directory. It is worth running the following command once to initialize the packages (this ensures the following commands won't time out):

```shell
guix shell -m manifest.scm -- sh
```

To run the example, first clone the test repository
```shell
git clone https://github.com/SWE-agent/test-repo
```

and then run
```shell
python simplecoder.py
```

*Disclaimer*: The integration with SkyRL Train is still ongoing.
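As a rough illustration of how the `guix shell -m manifest.scm -- sh` invocation above can be assembled from Python, here is a minimal sketch. The helper names `build_guix_shell_command` and `guix_available` are hypothetical and not part of this commit; the only Guix options used are `-m` and the `--` separator that already appear in the README.

```python
import shutil
from typing import List, Optional


def build_guix_shell_command(command: List[str], manifest_file: Optional[str] = None) -> List[str]:
    """Return the argv for `guix shell [-m MANIFEST] -- COMMAND...` (hypothetical helper)."""
    argv = ["guix", "shell"]
    if manifest_file:
        # -m points guix shell at the manifest listing the packages to provide.
        argv.extend(["-m", manifest_file])
    # Everything after "--" is the command to run inside the environment.
    return argv + ["--"] + list(command)


def guix_available() -> bool:
    """True if a `guix` executable can be found on PATH."""
    return shutil.which("guix") is not None


if __name__ == "__main__":
    warm_up = build_guix_shell_command(["sh"], "manifest.scm")
    print(" ".join(warm_up))  # prints: guix shell -m manifest.scm -- sh
    if not guix_available():
        print("guix not found on PATH; see the installation steps above")
```

Building the command as a list rather than a single string means it can be passed to `subprocess.run` without `shell=True`, which is also how `simplecoder.py` calls Guix.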
Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
from abc import ABC, abstractmethod
from dataclasses import dataclass
import json
import subprocess
import tempfile
from typing import Optional, Dict, Any, List

from openai import OpenAI


@dataclass
class ExecutionResult:
    """Result of a command execution."""

    output: str
    error: Optional[str] = None
    return_code: int = 0


class Executor(ABC):

    @abstractmethod
    def execute(self, command: str, timeout: int = 30) -> ExecutionResult:
        """Execute a command and return the result.

        Args:
            command: The command to execute
            timeout: Timeout in seconds (default: 30)

        Returns:
            ExecutionResult containing the command output and status
        """
        pass


class GuixExecutor(Executor):
    """Guix-based executor that runs commands in a sandboxed Guix shell environment."""

    def __init__(self, working_dir: str, manifest_file: Optional[str] = None):
        """Initialize the Guix executor.

        Args:
            working_dir: Working directory of the execution
            manifest_file: Path to a Guix manifest file specifying packages
        """
        self.working_dir = working_dir
        self.manifest_file = manifest_file
        self.current_env = ""

    def execute(
        self,
        command: str,
        timeout: int = 30,
    ) -> ExecutionResult:
        """Execute a command in a sandboxed Guix shell."""

        guix_cmd = ["guix", "shell"]

        if self.manifest_file:
            guix_cmd.extend(["-m", self.manifest_file])

        with tempfile.NamedTemporaryFile(mode="w", suffix="_env.sh", delete=False) as env_file:
            env_file.write(self.current_env)

        with tempfile.NamedTemporaryFile(mode="w", suffix="_script.sh", delete=False) as script_file:
            script_file.write(f"source {env_file.name}\n")  # restore variables exported by earlier commands
            script_file.write('cd "$PWD"\n')  # return to the directory the previous command ended in
            script_file.write(f"{command}\n")
            script_file.write(f"_status=$?; export -p > {env_file.name}; exit $_status\n")  # save env, keep exit status

        # Add a very lightweight sandbox using https://github.com/containers/bubblewrap.
        # Originally we were using the guix shell --container sandbox for this, but there
        # are environments where that does not work (e.g. mounting the /proc filesystem
        # can fail in a GPU container). We might want to revisit this.
        guix_cmd.extend(
            # fmt: off
            [
                "--",
                "bwrap",
                "--ro-bind", "/bin", "/bin",
                "--ro-bind", "/gnu", "/gnu",
                "--proc", "/proc",
                "--dev", "/dev",
                "--tmpfs", "/tmp",
                "--new-session",
                "--ro-bind", script_file.name, script_file.name,
                "--bind", env_file.name, env_file.name,
                "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",
                "--bind", self.working_dir, "/home/skyrl",
                "--setenv", "HOME", "/home/skyrl/",
                "sh",
                script_file.name,
            ]
            # fmt: on
        )

        try:
            result = subprocess.run(
                guix_cmd,
                shell=False,
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=self.working_dir,
            )
        except Exception as e:
            return ExecutionResult(
                output="",
                error=f"Execution failed: {str(e)}",
                return_code=-1,
            )

        with open(env_file.name, "r") as f:
            self.current_env = f.read()

        return ExecutionResult(
            output=result.stdout or "",
            error=result.stderr if result.stderr else None,
            return_code=result.returncode,
        )


@dataclass
class ToolResult:
    """Result from executing a tool"""

    success: bool
    output: str
    error: Optional[str] = None


class Tool(ABC):
    """Base class for all tools"""

    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    def parameters(self) -> Dict[str, Any]:
        pass

    @abstractmethod
    def execute(self, **kwargs) -> ToolResult:
        pass


class ShellCommandTool(Tool):
    """Tool for executing shell commands"""

    def __init__(self, executor: Executor):
        """Initialize the shell command tool with an executor.

        Args:
            executor: The executor to use for running commands.
        """
        self.executor = executor

    def name(self) -> str:
        return "execute_bash"

    def description(self) -> str:
        return "Execute a shell command and return the output"

    def parameters(self) -> Dict[str, Any]:
        return {
            "type": "object",
            "properties": {"command": {"type": "string", "description": "The shell command to execute"}},
            "required": ["command"],
        }

    def execute(self, command: str, timeout: int = 30) -> ToolResult:
        """Execute a shell command using the configured executor."""
        execution_result = self.executor.execute(command, timeout=timeout)

        return ToolResult(
            success=execution_result.return_code == 0, output=execution_result.output, error=execution_result.error
        )


class SimpleCoder:

    def __init__(self, api_key: str, model: str, executor: Executor):
        self.client = OpenAI(api_key=api_key)
        self.model = model
        self.executor = executor
        self.tools = {
            tool.name(): tool
            for tool in [
                ShellCommandTool(executor=self.executor),
            ]
        }
        self.conversation_history = []

    def _get_tool_definitions(self) -> List[Dict[str, Any]]:
        """Get OpenAI function definitions for all tools"""
        return [
            {
                "type": "function",
                "function": {"name": tool.name(), "description": tool.description(), "parameters": tool.parameters()},
            }
            for tool in self.tools.values()
        ]

    def _execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> ToolResult:
        """Execute a tool with given arguments"""
        if tool_name not in self.tools:
            return ToolResult(success=False, output="", error=f"Unknown tool: {tool_name}")

        tool = self.tools[tool_name]
        return tool.execute(**arguments)

    def run(self, task: str, max_iterations: int = 30):

        self.conversation_history = [
            {
                "role": "system",
                "content": """You are a Software Engineering Agent. You can:
1. Execute shell commands using execute_bash
2. Read, write, or append to files using shell commands run through the same tool

Break down complex tasks into steps and use the appropriate tools to complete them.
Always check the results of your actions and adapt your approach if needed.""",
            },
            {"role": "user", "content": task},
        ]

        for i in range(max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.conversation_history,
                tools=self._get_tool_definitions(),
                tool_choice="auto",
            )
            assistant_message = response.choices[0].message
            self.conversation_history.append(assistant_message.model_dump())

            # Check if the assistant wants to use tools
            if assistant_message.tool_calls:
                # Execute each tool call
                for tool_call in assistant_message.tool_calls:
                    function_name = tool_call.function.name
                    arguments = json.loads(tool_call.function.arguments)

                    print(f"\n🔧 Executing {function_name} with args: {arguments}")

                    # Execute the tool
                    result = self._execute_tool(function_name, arguments)

                    # Add tool result to conversation
                    tool_message = {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": json.dumps(
                            {"success": result.success, "output": result.output, "error": result.error}
                        ),
                    }
                    self.conversation_history.append(tool_message)

                    print(f"✅ Result: {result.output}..." if result.success else f"❌ Error: {result.error}")
            else:
                print(f"\n🤖 Agent: {assistant_message.content}")
                return


if __name__ == "__main__":
    import os
    import simplecoder

    manifest = os.path.abspath("manifest.scm")
    working_dir = os.path.abspath("test-repo")
    executor = simplecoder.GuixExecutor(working_dir, manifest)

    coder = simplecoder.SimpleCoder(os.environ["OPENAI_API_KEY"], "o4-mini", executor)
    task = """
I'm running missing_colon.py as follows:

division(23, 0)
but I get the following error:

  File "/Users/fuchur/Documents/24/git_sync/swe-agent-test-repo/tests/./missing_colon.py", line 4
    def division(a: float, b: float) -> float
                                             ^
SyntaxError: invalid syntax
"""
    coder.run(task)
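The executor above carries shell state from one command to the next by sourcing a saved environment file before the command and running `export -p` after it. The sketch below isolates that idea so it can be tried without Guix or bubblewrap. `LocalExecutor` is a hypothetical class, not part of this commit, and it provides no isolation at all. It assumes a POSIX `sh` on `PATH`. Capturing `$?` before `export -p` so the command's own exit status is reported is a choice made in this sketch.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionResult:
    """Same fields as ExecutionResult in simplecoder.py."""

    output: str
    error: Optional[str] = None
    return_code: int = 0


class LocalExecutor:
    """Hypothetical executor: plain `sh`, no Guix, no bubblewrap, NO sandboxing."""

    def __init__(self, working_dir: str):
        self.working_dir = working_dir
        self.current_env = ""  # text written by `export -p` during the previous call

    def execute(self, command: str, timeout: int = 30) -> ExecutionResult:
        with tempfile.TemporaryDirectory() as tmp:
            env_path = os.path.join(tmp, "env.sh")
            script_path = os.path.join(tmp, "script.sh")
            with open(env_path, "w") as f:
                f.write(self.current_env)
            with open(script_path, "w") as f:
                f.write(f'. "{env_path}"\n')  # restore previously exported variables
                f.write('cd "$PWD"\n')  # return to the directory saved in PWD
                f.write(f"{command}\n")
                # Remember the command's exit status, save the environment, then report it.
                f.write(f'_status=$?; export -p > "{env_path}"; exit $_status\n')
            try:
                result = subprocess.run(
                    ["sh", script_path],
                    capture_output=True,
                    text=True,
                    timeout=timeout,
                    cwd=self.working_dir,
                )
            except (subprocess.TimeoutExpired, OSError) as e:
                return ExecutionResult(output="", error=f"Execution failed: {e}", return_code=-1)
            with open(env_path) as f:
                self.current_env = f.read()
        return ExecutionResult(
            output=result.stdout,
            error=result.stderr or None,
            return_code=result.returncode,
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        executor = LocalExecutor(workdir)
        executor.execute("export GREETING=hello")
        # The variable exported by the first call is visible in the second one.
        print(executor.execute('echo "$GREETING"').output.strip())  # prints: hello
```

Because it exposes the same `execute(command, timeout)` method, a class like this could stand in for `GuixExecutor` when exercising the tool-calling loop on a machine without Guix, with the caveat that commands then run directly on the host.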
