Very simple local coding sandbox example (#80)

pcmoritz · pcmoritz · web-flow · commit 3e0efc020214 · 2025-07-13T19:42:49.000-07:00
**This PR adds a simple sandbox example using the Guix package manager**

This change introduces a basic sandbox environment powered by Guix (https://guix.gnu.org/), a package manager that comes with thousands of pre-built software packages that you can easily customize and compile. Note that the integration with SkyRL-Train will land in a future PR.

**How it works**

Guix makes it easy to set up development environments with all the right dependencies. For example, if you want to work on a Linux package like Inkscape, you can run `guix shell --development inkscape` and it automatically gives you a shell with everything needed to build that software.

You can also create custom environments by running `guix shell -m manifest.scm`, where the `manifest.scm` file lists exactly which packages and dependencies you want. Think of this as Python's `uv run` command, except that instead of handling only Python projects, it can set up complete Linux development environments for any type of software:

- `manifest.scm` (in Guix) = `pyproject.toml` (in Python)
- `guix shell` (for any Linux software) = `uv run` (for Python only)

This gives developers a consistent way to create isolated, reproducible development environments for any kind of project.

**How to use it**

The Docker image or environment you are working in needs to have Guix installed. You can install it, for example, by running

```shell
wget https://guix.gnu.org/install.sh
chmod +x install.sh
sudo ./install.sh
```

For the following commands, we assume that you are in the `skyrl-train/examples/simplecoder` directory. It is worth running the following command once to initialize the packages (this ensures that later commands won't time out):

```shell
guix shell -m manifest.scm -- sh
```

To run the example, first clone the test repository

```shell
git clone https://github.com/SWE-agent/test-repo
```

and then you can run the "agent" using

```shell
python simplecoder.py
```

---------

Co-authored-by: pcmoritz <pcmoritz@anyscale.com>
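The `guix shell -m manifest.scm -- <command>` pattern described above is also easy to drive from a script. The following is a minimal sketch, not code from this PR. `guix_shell_argv` and `warm_up` are hypothetical helper names. `warm_up` runs a trivial command (`true`, from coreutils) once with no timeout, so that the slow first download or build of the manifest's packages does not count against a short per-command timeout.

```python
import subprocess
from typing import List, Optional


def guix_shell_argv(command: List[str], manifest: Optional[str] = None) -> List[str]:
    """Build the argv for `guix shell [-m MANIFEST] -- COMMAND...` (hypothetical helper)."""
    argv = ["guix", "shell"]
    if manifest is not None:
        argv.extend(["-m", manifest])
    # Everything after "--" is the command that guix runs inside the environment.
    argv.append("--")
    argv.extend(command)
    return argv


def warm_up(manifest: str) -> int:
    """Run `true` once inside the environment so Guix fetches or builds every package.

    No timeout is passed on purpose: the first invocation can take a long time.
    """
    result = subprocess.run(guix_shell_argv(["true"], manifest), check=False)
    return result.returncode
```

For example, `guix_shell_argv(["python", "--version"], "manifest.scm")` returns `["guix", "shell", "-m", "manifest.scm", "--", "python", "--version"]`. Passing an argument list rather than a single shell string avoids an extra layer of shell quoting.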
diff --git a/skyrl-train/examples/simplecoder/manifest.scm b/skyrl-train/examples/simplecoder/manifest.scm
@@ -0,0 +1,20 @@
+(use-modules (guix packages)
+	     (guix gexp)
+	     (gnu packages bash)
+	     (gnu packages version-control)
+	     (gnu packages virtualization)
+	     (gnu packages certs)
+	     (gnu packages check)
+	     (gnu packages python))
+
+(packages->manifest 
+ (list coreutils
+       bubblewrap
+       bash
+       grep
+       sed
+       findutils
+       git
+       python
+       python-pytest
+       nss-certs))
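The manifest above selects packages through their Scheme variable names. Guix also offers `specifications->manifest`, which takes package names as strings. As a sketch under that approach, a Python helper could generate an equivalent manifest from a plain list of names. `render_manifest`, `write_manifest` and `PACKAGES` are hypothetical and not part of this PR. Note that name-based lookup picks the newest package with each name, which may not be exactly the variant a Scheme variable such as `python` refers to.

```python
from pathlib import Path
from typing import Iterable

# Package names mirroring manifest.scm in this PR (hypothetical constant).
PACKAGES = (
    "coreutils", "bubblewrap", "bash", "grep", "sed", "findutils",
    "git", "python", "python-pytest", "nss-certs",
)


def render_manifest(packages: Iterable[str]) -> str:
    """Return Guix manifest source that looks packages up by name."""
    specs = " ".join(f'"{name}"' for name in packages)
    return f"(specifications->manifest\n  (list {specs}))\n"


def write_manifest(path: str, packages: Iterable[str] = PACKAGES) -> None:
    """Write the manifest to `path` so it can be passed to `guix shell -m`."""
    Path(path).write_text(render_manifest(packages))
```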
diff --git a/skyrl-train/examples/simplecoder/simplecoder.md b/skyrl-train/examples/simplecoder/simplecoder.md
@@ -0,0 +1,44 @@
+# SimpleCoder
+
+This is a simple coding environment for solving SWE-bench-style coding challenges.
+
+It uses a basic sandbox environment powered by Guix
+(https://guix.gnu.org/), a package manager that comes with thousands
+of pre-built software packages that you can easily customize and
+compile.
+
+## How it works
+
+Guix makes it easy to set up development environments with all the right dependencies. For example, if you want to work on a Linux package like Inkscape, you can run `guix shell --development inkscape` and it automatically gives you a shell with everything needed to build that software.
+
+You can also create custom environments by running `guix shell -m manifest.scm`, where the `manifest.scm` file lists exactly which packages and dependencies you want. Think of this as Python's `uv run` command, except that instead of handling only Python projects, it can set up complete Linux development environments for any type of software.
+
+## How to use it
+
+The Docker image or environment you are working in needs to have Guix installed. You can
+install it, for example, by running
+
+```shell
+wget https://guix.gnu.org/install.sh
+chmod +x install.sh
+sudo ./install.sh
+```
+
+For the following commands, we assume that you are in the `skyrl-train/examples/simplecoder` directory. It is worth running the following command once to initialize the packages, so that later commands don't time out while Guix is still downloading or building them:
+
+```shell
+guix shell -m manifest.scm -- sh
+```
+
+To run the example, first clone the test repository into this directory:
+```shell
+git clone https://github.com/SWE-agent/test-repo
+```
+
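+and then run the agent (this requires the `openai` Python package and the `OPENAI_API_KEY` environment variable):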
+```shell
+python simplecoder.py
+```
+
+*Disclaimer*: The integration with SkyRL Train is still ongoing.
+
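In `simplecoder.py`, the "basic sandbox" mentioned in this README is a bubblewrap (`bwrap`) process started inside `guix shell`. The sketch below isolates that command line so the mount layout is easier to read. `bwrap_argv` is a hypothetical helper, and it only uses the flags the PR already passes. One design point worth noting: bubblewrap applies mount operations in command-line order, so the temporary script and environment files (created under the host's `/tmp`) are bound after `--tmpfs /tmp`. If they were bound earlier, the fresh tmpfs would hide them.

```python
from typing import List


def bwrap_argv(working_dir: str, script: str, env_file: str, home: str = "/home/skyrl") -> List[str]:
    """Return a bwrap command line that runs `sh script` with only a few host paths visible."""
    return [
        "bwrap",
        "--ro-bind", "/bin", "/bin",            # host /bin, read-only
        "--ro-bind", "/gnu", "/gnu",            # Guix store holding the manifest's packages
        "--proc", "/proc",                      # fresh procfs
        "--dev", "/dev",                        # minimal /dev
        "--tmpfs", "/tmp",                      # private, empty /tmp
        "--new-session",                        # new terminal session (setsid)
        "--ro-bind", script, script,            # generated script, read-only (after --tmpfs)
        "--bind", env_file, env_file,           # saved environment, writable
        "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",  # DNS resolution, e.g. for git
        "--bind", working_dir, home,            # repository, writable, mounted at $HOME
        "--setenv", "HOME", home,
        "sh", script,
    ]
```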
diff --git a/skyrl-train/examples/simplecoder/simplecoder.py b/skyrl-train/examples/simplecoder/simplecoder.py
@@ -0,0 +1,290 @@
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+import json
+import subprocess
+import tempfile
+from typing import Optional, Dict, Any, List
+
+from openai import OpenAI
+
+
+@dataclass
+class ExecutionResult:
+    """Result of a command execution."""
+
+    output: str
+    error: Optional[str] = None
+    return_code: int = 0
+
+
+class Executor(ABC):
+    """Interface for running a shell command and capturing its output."""
+
+    @abstractmethod
+    def execute(self, command: str, timeout: int = 30) -> ExecutionResult:
+        """Execute a command and return the result.
+
+        Args:
+            command: The command to execute
+            timeout: Timeout in seconds (default: 30)
+
+        Returns:
+            ExecutionResult containing the command output and status
+        """
+        pass
+
+
+class GuixExecutor(Executor):
+    """Guix-based executor that runs commands in a sandboxed Guix shell environment."""
+
+    def __init__(self, working_dir: str, manifest_file: Optional[str] = None):
+        """Initialize the Guix executor.
+
+        Args:
+            working_dir: Working directory of the execution
+            manifest_file: Path to a Guix manifest file specifying packages
+        """
+        self.working_dir = working_dir
+        self.manifest_file = manifest_file
+        # Output of `export -p` saved after the previous command; replayed before the next one.
+        self.current_env = ""
+
+    def execute(
+        self,
+        command: str,
+        timeout: int = 30,
+    ) -> ExecutionResult:
+        """Execute a command in a sandboxed Guix shell."""
+
+        guix_cmd = ["guix", "shell"]
+
+        if self.manifest_file:
+            guix_cmd.extend(["-m", self.manifest_file])
+
+        with tempfile.NamedTemporaryFile(mode="w", suffix="_env.sh", delete=False) as env_file:
+            env_file.write(self.current_env)
+
+        with tempfile.NamedTemporaryFile(mode="w", suffix="_script.sh", delete=False) as script_file:
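+            script_file.write(f". {env_file.name}\n")
+            script_file.write('cd "$PWD"\n')
+            script_file.write(f"{command}\n")
+            # Save the command's exit status; otherwise the script would exit with the status of export -p.
+            script_file.write("__skyrl_status=$?\n")
+            script_file.write(f"export -p > {env_file.name}\n")
+            script_file.write("exit $__skyrl_status\n")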
+
+        # Add a very lightweight sandbox using https://github.com/containers/bubblewrap.
+        # Originally we were using the guix shell --container sandbox for this, but there
+        # are environments where that does not work (e.g. mounting the /proc filesystem
+        # can fail in a GPU container). We might want to revisit this.
+        guix_cmd.extend(
+            # fmt: off
+            [
+                "--",
+                "bwrap",
+                "--ro-bind", "/bin", "/bin",
+                "--ro-bind", "/gnu", "/gnu",
+                "--proc", "/proc",
+                "--dev", "/dev",
+                "--tmpfs", "/tmp",
+                "--new-session",
+                "--ro-bind", script_file.name, script_file.name,
+                "--bind", env_file.name, env_file.name,
+                "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",
+                "--bind", self.working_dir, "/home/skyrl",
+                "--setenv", "HOME", "/home/skyrl/",
+                "sh",
+                script_file.name,
+            ]
+            # fmt: on
+        )
+
+        try:
+            result = subprocess.run(
+                guix_cmd,
+                shell=False,
+                capture_output=True,
+                text=True,
+                timeout=timeout,
+                cwd=self.working_dir,
+            )
+        except Exception as e:
+            return ExecutionResult(
+                output="",
+                error=f"Execution failed: {str(e)}",
+                return_code=-1,
+            )
+
+        with open(env_file.name, "r") as f:
+            self.current_env = f.read()
+
+        return ExecutionResult(
+            output=result.stdout or "",
+            error=result.stderr if result.stderr else None,
+            return_code=result.returncode,
+        )
+
+
+@dataclass
+class ToolResult:
+    """Result from executing a tool"""
+
+    success: bool
+    output: str
+    error: Optional[str] = None
+
+
+class Tool(ABC):
+    """Base class for all tools"""
+
+    @abstractmethod
+    def name(self) -> str:
+        pass
+
+    @abstractmethod
+    def description(self) -> str:
+        pass
+
+    @abstractmethod
+    def parameters(self) -> Dict[str, Any]:
+        pass
+
+    @abstractmethod
+    def execute(self, **kwargs) -> ToolResult:
+        pass
+
+
+class ShellCommandTool(Tool):
+    """Tool for executing shell commands"""
+
+    def __init__(self, executor: Executor):
+        """Initialize the shell command tool with an executor.
+
+        Args:
+            executor: The executor to use for running commands.
+        """
+        self.executor = executor
+
+    def name(self) -> str:
+        return "execute_bash"
+
+    def description(self) -> str:
+        return "Execute a shell command and return the output"
+
+    def parameters(self) -> Dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {"command": {"type": "string", "description": "The shell command to execute"}},
+            "required": ["command"],
+        }
+
+    def execute(self, command: str, timeout: int = 30) -> ToolResult:
+        """Execute a shell command using the configured executor."""
+        execution_result = self.executor.execute(command, timeout=timeout)
+
+        return ToolResult(
+            success=execution_result.return_code == 0, output=execution_result.output, error=execution_result.error
+        )
+
+
+class SimpleCoder:
+    """Minimal tool-calling agent loop on top of the OpenAI chat completions API."""
+
+    def __init__(self, api_key: str, model: str, executor: Executor):
+        self.client = OpenAI(api_key=api_key)
+        self.model = model
+        self.executor = executor
+        self.tools = {
+            tool.name(): tool
+            for tool in [
+                ShellCommandTool(executor=self.executor),
+            ]
+        }
+        self.conversation_history = []
+
+    def _get_tool_definitions(self) -> List[Dict[str, Any]]:
+        """Get OpenAI function definitions for all tools"""
+        return [
+            {
+                "type": "function",
+                "function": {"name": tool.name(), "description": tool.description(), "parameters": tool.parameters()},
+            }
+            for tool in self.tools.values()
+        ]
+
+    def _execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> ToolResult:
+        """Execute a tool with given arguments"""
+        if tool_name not in self.tools:
+            return ToolResult(success=False, output="", error=f"Unknown tool: {tool_name}")
+
+        tool = self.tools[tool_name]
+        return tool.execute(**arguments)
+
+    def run(self, task: str, max_iterations: int = 30):
+        """Work on `task` until the model answers without tool calls or `max_iterations` is reached."""
+
+        self.conversation_history = [
+            {
+                "role": "system",
+                "content": """You are a Software Engineering Agent. You can:
+1. Execute shell commands using execute_bash
+2. Read, write, or append to files with shell commands (e.g. cat, sed, or heredocs)
+
+Break down complex tasks into steps and use the appropriate tools to complete them.
+Always check the results of your actions and adapt your approach if needed.""",
+            },
+            {"role": "user", "content": task},
+        ]
+
+        for i in range(max_iterations):
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=self.conversation_history,
+                tools=self._get_tool_definitions(),
+                tool_choice="auto",
+            )
+            assistant_message = response.choices[0].message
+            self.conversation_history.append(assistant_message.model_dump())
+
+            # Check if the assistant wants to use tools
+            if assistant_message.tool_calls:
+                # Execute each tool call
+                for tool_call in assistant_message.tool_calls:
+                    function_name = tool_call.function.name
+                    arguments = json.loads(tool_call.function.arguments)
+
+                    print(f"\n🔧 Executing {function_name} with args: {arguments}")
+
+                    # Execute the tool
+                    result = self._execute_tool(function_name, arguments)
+
+                    # Add tool result to conversation
+                    tool_message = {
+                        "role": "tool",
+                        "tool_call_id": tool_call.id,
+                        "content": json.dumps(
+                            {"success": result.success, "output": result.output, "error": result.error}
+                        ),
+                    }
+                    self.conversation_history.append(tool_message)
+
+                    print(f"✅ Result: {result.output}..." if result.success else f"❌ Error: {result.error}")
+            else:
+                print(f"\n🤖 Agent: {assistant_message.content}")
+                return
+
+
+if __name__ == "__main__":
+    import os
+    import simplecoder
+
+    manifest = os.path.abspath("manifest.scm")
+    working_dir = os.path.abspath("test-repo")
+    executor = simplecoder.GuixExecutor(working_dir, manifest)
+
+    coder = simplecoder.SimpleCoder(os.environ["OPENAI_API_KEY"], "o4-mini", executor)
+    task = """
+    I'm running missing_colon.py as follows:
+
+division(23, 0)
+but I get the following error:
+
+  File "/Users/fuchur/Documents/24/git_sync/swe-agent-test-repo/tests/./missing_colon.py", line 4
+    def division(a: float, b: float) -> float
+                                             ^
+SyntaxError: invalid syntax
+"""
+    coder.run(task)
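`GuixExecutor.execute` starts a fresh shell for every command, yet exported variables and the working directory carry over between calls. Each generated script first sources the environment saved by the previous call, then runs the command, and finally dumps the current environment with `export -p`. Below is a minimal sketch of only that state-carrying technique, without Guix or bubblewrap. It assumes a POSIX `sh` on the host. `PersistentShell` is a hypothetical name. The sketch differs from the PR in three ways. It starts each shell with only `PATH` in its environment, so inherited variables that a shell cannot re-import never end up in the saved file. It explicitly exports `PWD`. It also keeps `$?` before dumping the environment, so the reported exit status is the command's own rather than that of the final `export -p`. Only exported variables persist: a plain `FOO=bar` is lost after the call.

```python
import os
import shlex
import subprocess
import tempfile


class PersistentShell:
    """Run every command in a new `sh` process while carrying exported state across calls."""

    def __init__(self) -> None:
        fd, self.env_path = tempfile.mkstemp(suffix="_env.sh")
        os.close(fd)  # the file starts empty: nothing to restore on the first call

    def run(self, command: str, timeout: int = 30) -> subprocess.CompletedProcess:
        env = shlex.quote(self.env_path)
        script = "\n".join([
            f". {env}",              # restore variables exported by the previous call
            'cd "$PWD"',             # PWD was restored too, so go back to that directory
            command,
            "__status=$?",           # remember the command's exit status
            "export PWD",            # make sure the directory is saved even if PWD was not exported
            f"export -p > {env}",    # save exported variables for the next call
            "exit $__status",
        ])
        # Minimal starting environment: only PATH is inherited from the caller.
        base_env = {"PATH": os.environ.get("PATH", os.defpath)}
        return subprocess.run(
            ["sh", "-c", script], capture_output=True, text=True, timeout=timeout, env=base_env
        )

    def close(self) -> None:
        os.unlink(self.env_path)
```

Unlike `GuixExecutor`, this sketch does not sandbox anything. It only shows how state survives across otherwise independent shell processes.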