
feat: Create a robust Sandbox interface that can be backed by Daytona or Modal #215



Merged: bradhilton merged 23 commits into main from feat/swebench-sandbox on Jul 10, 2025


bradhilton
Collaborator

No description provided.

bradhilton and others added 23 commits July 8, 2025 20:54
…c method to use /bin/sh and return exit code
…and update sandbox eval method to raise NotImplementedError
…_tests methods in Sandbox class, and update test cases to reflect new logic
- Implement apply_patch method using base64 encoding to handle special characters and large patches
- Implement run_tests method with comprehensive test execution logic
- Handle both regular pytest tests and doctest-style paths that appear in some instances
- Support automatic dependency installation with retry logic
- Use chunked file writing to avoid command length limits
- Parse test results including regular failures, errors, collection errors, and doctest failures
- Return correct (failed_count, passed_count) tuple matching expected test behavior

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
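The base64 approach in `apply_patch` can be sketched as follows. This is a minimal sketch, assuming the sandbox runs commands with `/bin/sh`, has GNU `base64` and `git` available, and applies the patch with `git apply`; the helper name `build_apply_patch_command` and the `/tmp/patch.diff` path are hypothetical, not the PR's code.

```python
import base64
import shlex


def build_apply_patch_command(patch: str, patch_path: str = "/tmp/patch.diff") -> str:
    """Return a /bin/sh command that writes `patch` to `patch_path` and applies it.

    Hypothetical helper. Base64 output uses only [A-Za-z0-9+/=], so quotes, `$`,
    backticks and newlines inside the patch never reach the shell parser.
    """
    encoded = base64.b64encode(patch.encode("utf-8")).decode("ascii")
    path = shlex.quote(patch_path)
    # Decode back to the original bytes inside the sandbox, then apply with git.
    return f"echo {encoded} | base64 -d > {path} && git apply {path}"
```

Because the whole payload still travels in a single command line, a very large patch can hit the command length limit, which is the problem the chunked file writing mentioned in this commit addresses.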
- Update comments for clarity on pytest installation and script functionality
- Refactor test categorization to better separate regular tests from doctest paths
- Introduce debug statements to log the number of tests and their types
- Modify exit code handling to ensure accurate reporting of test results
- Clean up code by removing obsolete doctest failure handling
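The `(failed_count, passed_count)` tuple described in the first commit can be derived from pytest's final summary line, for example `=== 2 failed, 5 passed, 1 error in 0.12s ===`. A minimal sketch, assuming results are read from pytest's terminal output; the function name is hypothetical, and counting errors (including collection errors) as failures is an assumption, not something the PR confirms.

```python
import re

# Matches "<n> <outcome>" pairs such as "3 failed", "10 passed", "1 error".
# "xfailed"/"xpassed" do not match because the outcome must follow "<n> " directly.
_OUTCOME_RE = re.compile(r"(\d+) (failed|passed|errors?)\b")


def parse_pytest_summary(output: str) -> tuple[int, int]:
    """Return (failed_count, passed_count) from pytest's last summary line."""
    for line in reversed(output.splitlines()):
        counts = _OUTCOME_RE.findall(line)
        # The summary line carries outcome counts and a duration ("... in 0.12s").
        if counts and " in " in line:
            failed = passed = 0
            for n, outcome in counts:
                if outcome == "passed":
                    passed += int(n)
                else:
                    # Assumption: errors (e.g. collection errors) count as failures.
                    failed += int(n)
            return failed, passed
    return 0, 0
```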
…hods

- Clean up whitespace and formatting for better readability
- Refactor chunked writing logic for patch and test files
- Enhance comments for clarity on functionality and logic flow
- Consolidate test execution logic to streamline pytest invocation
- Ensure consistent handling of exit codes and output parsing
- Add json import for enhanced functionality
- Modify test categorization to ensure individual doctests are run correctly
- Always include --doctest-modules flag for handling doctests
- Simplify test argument assembly for pytest execution
- Update CLAUDE.md to clarify test investigation steps and implementation evaluation criteria
- Improve error handling in new_sandbox function for Daytona provider to manage event loop issues
- Expand test_run_tests to increase instance index range and assert test results for better validation

This commit aims to streamline the testing process and ensure robust sandbox functionality.
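Always passing `--doctest-modules` lets pytest collect doctests from the listed modules alongside regular test node IDs. A minimal sketch of the argument assembly; `build_pytest_argv`, `to_shell`, the `python -m pytest` entry point, and the default `-rA` flag are assumptions for illustration.

```python
import shlex


def build_pytest_argv(tests: list[str], extra_flags: tuple[str, ...] = ("-rA",)) -> list[str]:
    """Assemble a pytest invocation that always includes --doctest-modules.

    Hypothetical helper: `tests` may mix regular node IDs and doctest-style paths.
    """
    return ["python", "-m", "pytest", "--doctest-modules", *extra_flags, *tests]


def to_shell(argv: list[str]) -> str:
    # shlex.join quotes each argument, so IDs like "test_x[a b]" survive the shell.
    return shlex.join(argv)
```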
…r management

- Introduce safe_exec method to streamline command execution with custom error messages
- Refactor apply_patch method to utilize write_file for patch handling
- Simplify test writing logic by removing chunked writing and directly using write_file
- Update test_run_tests to reduce instance index range for better test coverage

This commit aims to improve the robustness and clarity of the sandbox functionality.
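`safe_exec` pairs command execution with a caller-supplied error message. A minimal sketch, assuming an underlying exec that returns `(exit_code, output)` as the earlier `/bin/sh` commit describes; the `exec_fn` parameter and the message format are hypothetical. A later commit in this PR replaces `safe_exec` with direct command execution.

```python
from typing import Callable

# Hypothetical signature: run a /bin/sh command, return (exit_code, output).
ExecFn = Callable[[str], tuple[int, str]]


def safe_exec(exec_fn: ExecFn, command: str, error_message: str) -> str:
    """Run `command` and return its output, raising RuntimeError on a non-zero exit."""
    exit_code, output = exec_fn(command)
    if exit_code != 0:
        # Attach the exit code and output so callers can see why the command failed.
        raise RuntimeError(f"{error_message} (exit code {exit_code}): {output.strip()}")
    return output
```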
- Standardize error messages in safe_exec method for consistency
- Streamline test list writing by directly joining tests instead of using chunked writing
- Enhance clarity in apply_patch method with improved command formatting

This commit aims to improve the maintainability and readability of the Sandbox class methods.
- Replace safe_exec and write_file methods with direct command execution for improved clarity and performance
- Streamline apply_patch and run_tests methods to enhance error handling and reduce complexity
- Update test writing logic to utilize heredoc for better handling of special characters

This commit aims to enhance the maintainability and efficiency of the Sandbox class methods.
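A heredoc handles special characters because quoting its delimiter turns off parameter expansion, command substitution, and backslash processing in the body. A minimal sketch; the helper name, the `SANDBOX_EOF` delimiter, and the target path are hypothetical.

```python
import shlex


def heredoc_write_command(content: str, path: str, delimiter: str = "SANDBOX_EOF") -> str:
    """Build a /bin/sh command that writes `content` verbatim to `path`."""
    # A line equal to the delimiter would end the heredoc early.
    if any(line == delimiter for line in content.splitlines()):
        raise ValueError("content contains the heredoc delimiter on its own line")
    # The delimiter must start its own line, so ensure the body ends with a newline.
    body = content if content.endswith("\n") else content + "\n"
    # Quoting the delimiter ('SANDBOX_EOF') keeps `$`, backticks and `\` literal.
    return f"cat > {shlex.quote(path)} <<'{delimiter}'\n{body}{delimiter}\n"
```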
- Remove unnecessary blank lines to enhance code clarity
- Adjust formatting for better consistency in the apply_patch and run_tests methods
- Update regex handling for missing modules to improve readability

This commit aims to streamline the code structure and enhance maintainability of the Sandbox class.
- Implement chunked writing of test lists to avoid command length limits
- Create a dedicated Python script for running pytest with enhanced handling of special characters
- Streamline test execution logic to improve error management and maintainability

This commit aims to enhance the robustness and efficiency of the test execution process within the Sandbox class.
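Chunked writing keeps each individual command under the shell's length limit. A minimal sketch, assuming each returned command is run separately by the sandbox's `/bin/sh` with GNU `base64` available; the helper name, the `.b64` staging file, and the 4096-character default chunk size are hypothetical.

```python
import base64
import shlex


def chunked_write_commands(content: str, path: str, chunk_size: int = 4096) -> list[str]:
    """Return /bin/sh commands that rebuild `content` at `path` in small pieces."""
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    staging = shlex.quote(path + ".b64")
    dest = shlex.quote(path)
    commands = [f": > {staging}"]  # create or truncate the staging file
    for start in range(0, len(encoded), chunk_size):
        # Base64 text has no shell metacharacters, so each chunk can go unquoted.
        commands.append(f"printf %s {encoded[start:start + chunk_size]} >> {staging}")
    # Decode the reassembled payload into place, then remove the staging file.
    commands.append(f"base64 -d {staging} > {dest} && rm {staging}")
    return commands
```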
- Add sampling to instance DataFrame for improved randomness in tests
- Introduce pytest-timeout dependency to manage test execution duration
- Update new_sandbox function to accept a timeout parameter for sandbox creation
- Improve error handling in test results parsing to account for collection errors

This commit aims to enhance the robustness and flexibility of the sandbox testing process.
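The PR title describes one Sandbox interface that either Daytona or Modal can back, and this commit threads a timeout through `new_sandbox`. A minimal sketch of that shape, assuming backends differ only in how they execute a `/bin/sh` command; `LocalSandbox`, the `_PROVIDERS` registry, and this `new_sandbox` signature are hypothetical, and no Daytona or Modal SDK calls are shown.

```python
import subprocess
from abc import ABC, abstractmethod


class Sandbox(ABC):
    """Provider-agnostic interface; each backend only has to implement exec()."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout

    @abstractmethod
    def exec(self, command: str) -> tuple[int, str]:
        """Run `command` with /bin/sh and return (exit_code, output)."""


class LocalSandbox(Sandbox):
    """Hypothetical stand-in backend that runs commands on the host."""

    def exec(self, command: str) -> tuple[int, str]:
        try:
            proc = subprocess.run(
                ["/bin/sh", "-c", command],
                capture_output=True,
                text=True,
                timeout=self.timeout,
            )
        except subprocess.TimeoutExpired:
            # 124 mirrors the exit status used by coreutils `timeout`.
            return 124, f"timed out after {self.timeout}s"
        return proc.returncode, proc.stdout + proc.stderr


# Hypothetical registry; real "daytona" / "modal" entries would wrap those SDKs.
_PROVIDERS: dict[str, type[Sandbox]] = {"local": LocalSandbox}


def new_sandbox(provider: str, timeout: float = 600.0) -> Sandbox:
    try:
        cls = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown sandbox provider: {provider!r}") from None
    return cls(timeout=timeout)
```

In this shape, higher-level methods such as `apply_patch` and `run_tests` can be written once against `exec`, and only the backend classes know which provider they talk to.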
- Add functionality to install project dependencies if setup.py or pyproject.toml exists
- Increase maximum retries for test execution from 5 to 20 to improve reliability
- Implement handling for special cases in package installation to address discrepancies between import names and package names
- Introduce dynamic timeout calculation for test execution based on the number of tests to optimize performance

This commit aims to improve the sandbox's installation process and enhance the robustness of test execution.
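The install-and-retry step needs two pieces: extracting missing module names from test output, and translating import names into installable package names. A minimal sketch, assuming failures surface as `No module named '...'` messages; the mapping entries are well-known import/package mismatches rather than the PR's actual list, and `compute_timeout` with its constants is a hypothetical stand-in for the dynamic timeout calculation.

```python
import re

# Import names whose PyPI distribution name differs (illustrative, not exhaustive).
IMPORT_TO_PACKAGE = {
    "yaml": "PyYAML",
    "cv2": "opencv-python",
    "PIL": "Pillow",
    "sklearn": "scikit-learn",
    "bs4": "beautifulsoup4",
    "dateutil": "python-dateutil",
}

_MISSING_MODULE_RE = re.compile(r"No module named '([^']+)'")


def missing_packages(output: str) -> list[str]:
    """Return pip package names for every missing module reported in `output`."""
    packages: list[str] = []
    for name in _MISSING_MODULE_RE.findall(output):
        top_level = name.split(".")[0]  # "foo.bar" is installed as "foo"
        package = IMPORT_TO_PACKAGE.get(top_level, top_level)
        if package not in packages:  # keep order, drop duplicates
            packages.append(package)
    return packages


def compute_timeout(num_tests: int, base: float = 60.0, per_test: float = 2.0,
                    cap: float = 1800.0) -> float:
    """Hypothetical linear timeout: a fixed base plus a per-test allowance, capped."""
    return min(cap, base + per_test * num_tests)
```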
- Introduce the `edit` method to integrate the `edit_anthropic` tool for file operations such as create, view, string replacement, insertion, and undo functionality.
- Enhance error handling with specific exit codes and corresponding RuntimeError messages for better debugging.
- Add comprehensive tests to validate the new functionality, covering various scenarios including file creation, viewing, editing, and error cases.

This commit aims to expand the capabilities of the Sandbox class, allowing for more dynamic file manipulation within the sandbox environment.
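The operations listed above can be illustrated with an in-memory stand-in. This is a hypothetical sketch, not the `edit_anthropic` tool or the PR's `Sandbox.edit()`: it keeps files in a dict, raises `RuntimeError` directly where the real method maps exit codes to errors, and requires `str_replace` to match exactly once (a design choice assumed here).

```python
class InMemoryEditor:
    """Hypothetical stand-in for Sandbox.edit(): create, view, str_replace, insert, undo."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.history: dict[str, list[str]] = {}  # per-file stack of previous contents

    def _require(self, path: str) -> str:
        if path not in self.files:
            raise RuntimeError(f"File not found: {path}")
        return self.files[path]

    def _update(self, path: str, new_text: str) -> None:
        # Push the old contents so undo() can restore them.
        self.history.setdefault(path, []).append(self.files[path])
        self.files[path] = new_text

    def create(self, path: str, text: str) -> None:
        if path in self.files:
            raise RuntimeError(f"File already exists: {path}")
        self.files[path] = text

    def view(self, path: str) -> str:
        lines = self._require(path).splitlines()
        return "\n".join(f"{n}\t{line}" for n, line in enumerate(lines, start=1))

    def str_replace(self, path: str, old: str, new: str) -> None:
        text = self._require(path)
        count = text.count(old)
        if count != 1:
            raise RuntimeError(f"Expected exactly one match for old_str in {path}, found {count}")
        self._update(path, text.replace(old, new))

    def insert(self, path: str, after_line: int, new_text: str) -> None:
        lines = self._require(path).splitlines(keepends=True)
        if not 0 <= after_line <= len(lines):
            raise RuntimeError(f"Line {after_line} is out of range for {path}")
        if lines and not lines[-1].endswith("\n"):
            lines[-1] += "\n"  # so text appended after the last line starts a new line
        lines.insert(after_line, new_text if new_text.endswith("\n") else new_text + "\n")
        self._update(path, "".join(lines))

    def undo(self, path: str) -> None:
        self._require(path)
        if not self.history.get(path):
            raise RuntimeError(f"No edit to undo for {path}")
        self.files[path] = self.history[path].pop()
```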
…Sandbox.edit() method. This file included details on method signature, command construction, error handling, and testing considerations. Its removal streamlines the documentation as the implementation is now integrated into the codebase.
- Refactor the `sandbox.py` file to enhance readability by standardizing formatting and reducing unnecessary whitespace.
- Update the `test.py` file to improve consistency in test case formatting and structure.
- Implement clearer separation of logic in the `edit` method and related test cases for better maintainability.

This commit aims to streamline the codebase, making it easier to navigate and understand while maintaining existing functionality.
- Eliminate unnecessary imports from the `test.py` file, specifically `tempfile` and `os`, to streamline the code and improve readability.
- This change contributes to a cleaner codebase by removing elements that are not utilized in the current implementation.
bradhilton marked this pull request as ready for review on July 10, 2025 03:11
bradhilton merged commit fe688ee into main on Jul 10, 2025
2 checks passed
bradhilton deleted the feat/swebench-sandbox branch on July 10, 2025 03:12

1 participant