chore: SWE-Bench related changes #181

bradhilton · 2025-06-30T21:38:01Z

No description provided.

- Reduced the default timeout in eval_instance from 600.0 to 240.0 seconds.
- Updated execution counts and outputs in train.ipynb, including changes to model name and configuration.
- Enhanced output formatting for better readability and added error handling in training process.

- Added a comma to the swebench dependency in pyproject.toml.
- Updated the source URL for sweagent in both pyproject.toml and uv.lock.
- Bumped versions for flask-cors (6.0.1), python-engineio (4.12.2), swe-rex (1.3.0), and textual (3.5.0) in uv.lock.

- Removed conditional logic for level assignment in run_on_workers, setting it directly to 1 for clarity.

- Introduced a timeout parameter (default 15 minutes) to the rollout function in rollout.py to enhance control over execution duration.
- Updated the training notebook to reflect changes in model name and configuration, including adjustments to execution counts and outputs.
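
A rollout-level timeout of this kind can be enforced with `asyncio.wait_for`. The sketch below is illustrative only: `rollout_with_timeout` and `fake_agent_run` are hypothetical names, and the real `rollout` signature in rollout.py is not shown in this PR summary. Only the 15-minute default is taken from the commit message.

```python
import asyncio


async def fake_agent_run(duration: float) -> str:
    """Hypothetical stand-in for the agent loop inside a rollout."""
    await asyncio.sleep(duration)
    return "finished"


async def rollout_with_timeout(duration: float, timeout: float = 60.0 * 15) -> str:
    """Run the stand-in agent and give up after `timeout` seconds.

    The 15-minute default mirrors the commit message; everything else
    here is an assumption made for illustration.
    """
    try:
        return await asyncio.wait_for(fake_agent_run(duration), timeout=timeout)
    except asyncio.TimeoutError:
        # Report the timeout instead of letting the rollout hang.
        return "timed out"
```

Calling `asyncio.run(rollout_with_timeout(0.01, timeout=1.0))` completes normally, while a `timeout` shorter than the run cancels the inner task and returns the timeout marker.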

- Removed unnecessary outputs and set execution counts to null in train.ipynb for improved clarity and consistency.
- Streamlined the notebook to enhance readability and maintainability.

- Added a reward_power parameter to the rollout function in rollout.py to allow for customizable reward calculations.
- Adjusted the reward calculation formula for improved performance.
- Updated the training notebook to reflect changes in rollout parameters and increased batch sizes for better training efficiency.
- Modified learning rate in training configuration for optimized model training.
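
The commit message does not state the reward formula. One plausible reading, shown here purely as an assumption, is that a score in [0, 1] (for example the fraction of tests passing) is raised to the power `reward_power`, so that values above 1 penalize partial success more strongly. The function name `shaped_reward` and its arguments are hypothetical.

```python
def shaped_reward(passed: int, total: int, reward_power: float = 1.0) -> float:
    """Assumed reward shaping: (passed / total) ** reward_power.

    This is an illustrative guess, not the formula used in rollout.py.
    """
    if total <= 0:
        # No tests to score against; treat as zero reward.
        return 0.0
    fraction = passed / total
    return fraction ** reward_power
```

Under this assumption, `reward_power=1.0` leaves the fraction unchanged, and `reward_power=2.0` turns a half-passing run into a reward of 0.25.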

- Set execution count to null and removed unnecessary outputs in train.ipynb for improved clarity.
- Updated model name and adjusted learning rate for better training performance.
- Added clip_grad_norm parameter to TorchtuneService for enhanced gradient clipping control.

- Updated the timeout parameter in config.py from 10 minutes to 30 minutes for extended execution duration.
- Enabled async weight syncing in train.ipynb for improved model training performance.
- Commented out the learning rate configuration in the training notebook for clarity.

…nforcement-training into feat/swebench

- Added `daytona-sdk` as a dependency in `pyproject.toml` and updated its version to `0.21.5`.
- Introduced `daytona-api-client` and `daytona-api-client-async` packages in `uv.lock`.
- Updated `skypilot` dependency in `pyproject.toml` to remove unnecessary extras.
- Created a new script `daytona.py` for running SWE-bench tests, including functionality for handling missing modules and analyzing test results.

- Introduced a `Logger` class to manage logging behavior during test execution, supporting various logging modes.
- Updated `run_tests` and `install_base_dependencies` functions to utilize the new logging system for better output management.
- Enhanced `test_instances` function to support parallel execution and progress tracking with optional progress bar.
- Added command-line arguments for logging mode, parallel execution, and exception handling to improve user experience.
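
The logging modes themselves are not listed in the commit message. As a rough sketch of what a mode-aware logger for parallel test runs might look like, the class below assumes three hypothetical modes ("live", "buffered", "silent"); the mode names, the `flush` method, and the constructor arguments are all assumptions rather than the API in `daytona.py`.

```python
import sys
from typing import List, TextIO


class Logger:
    """Hypothetical logger with three assumed modes: live, buffered, silent."""

    def __init__(self, mode: str = "live", stream: TextIO = sys.stdout) -> None:
        if mode not in {"live", "buffered", "silent"}:
            raise ValueError(f"unknown logging mode: {mode}")
        self.mode = mode
        self.stream = stream
        self._buffer: List[str] = []

    def log(self, message: str) -> None:
        if self.mode == "live":
            # Print immediately; suits sequential runs.
            print(message, file=self.stream)
        elif self.mode == "buffered":
            # Hold output so parallel runs do not interleave lines.
            self._buffer.append(message)
        # In "silent" mode the message is dropped.

    def flush(self) -> None:
        """Emit and clear buffered messages, e.g. after a failed test."""
        for message in self._buffer:
            print(message, file=self.stream)
        self._buffer.clear()
```

Buffering per instance is one common way to keep output readable when several instances run in parallel: each worker logs into its own `Logger` and flushes only when there is something worth showing.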

- Enhanced the `run_tests` function to provide detailed error messages when tests do not pass as expected, including information on expected vs. actual results.
- Updated the `test_instances` function to streamline the execution flow, supporting both parallel and sequential test runs while maintaining progress tracking.
- Modified command-line argument help text for clarity regarding exception printing behavior.

- Added an optional `index` parameter to the `run_tests` function for improved logging of instance IDs.
- Updated the progress bar in `test_instances` to display the range of instance indices being tested.
- Introduced a new `parse_indices` function to handle individual and range-based index inputs from command-line arguments, improving user experience and flexibility in instance selection.
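
The accepted input syntax for `parse_indices` is not spelled out in the commit message. A minimal sketch, assuming comma-separated entries where `a-b` denotes an inclusive range, could look like the following; the real function in `daytona.py` may differ in syntax and error handling.

```python
from typing import List


def parse_indices(spec: str) -> List[int]:
    """Parse a string such as "0,3-5,9" into [0, 3, 4, 5, 9].

    Assumed syntax: comma-separated integers or inclusive "start-end" ranges.
    """
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            # Tolerate stray commas such as "1,,2".
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if end < start:
                raise ValueError(f"invalid range: {part}")
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    return indices
```

With this reading, `parse_indices("0,3-5,9")` selects instances 0, 3, 4, 5, and 9.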

- Implemented functionality in `test_instances` to delete any running sandboxes before executing tests, enhancing resource management.
- Updated the instance filtering logic in `get_filtered_swe_smith_instances_df` to exclude a specific instance ID, improving selection accuracy.

…ces_df
- Modified the instance filtering logic to include an additional instance ID, improving the accuracy of instance selection for testing.

…mproved diagnostics
- Added functionality to automatically install missing pytest plugins when detected during test execution, improving test reliability.
- Implemented recursive dependency installation for conftest loading errors, enhancing error handling and reducing manual intervention.
- Introduced additional diagnostics to log potential issues when no tests are executed, aiding in troubleshooting and improving user experience.
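
Detecting what to install usually starts with scanning test output for import failures. The sketch below shows one way to do that, assuming the output contains standard Python `ModuleNotFoundError` messages; `find_missing_modules` is a hypothetical helper, not a function confirmed to exist in `daytona.py`.

```python
import re
from typing import List

# Matches the standard CPython message for a failed import.
_MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def find_missing_modules(output: str) -> List[str]:
    """Return top-level module names reported missing, in first-seen order."""
    seen: List[str] = []
    for match in _MISSING_MODULE.finditer(output):
        # "foo.bar" fails because the top-level package "foo" is absent.
        top_level = match.group(1).split(".")[0]
        if top_level not in seen:
            seen.append(top_level)
    return seen
```

A recursive installer would call this after each test run, install what it finds, and rerun until no new names appear. The import name is not always the distribution name (for example, `yaml` is provided by `PyYAML`), so a real implementation needs a mapping or a fallback for such cases.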

- Added new instance IDs to the filtering logic, enhancing the selection criteria for testing instances.

bradhilton added 24 commits June 27, 2025 02:49

refactor: Simplify level assignment in TorchtuneService

ba8fe38

- Removed conditional logic for level assignment in run_on_workers, setting it directly to 1 for clarity.

refactor: Clean up outputs and execution counts in training notebook

8953a89

- Removed unnecessary outputs and set execution counts to null in train.ipynb for improved clarity and consistency.
- Streamlined the notebook to enhance readability and maintainability.

refactor: Clean up outputs and execution counts in training notebook

3adef91

- Removed unnecessary outputs and set execution counts to null in train.ipynb for improved clarity and consistency.
- Streamlined the notebook to enhance readability and maintainability.

Merge branch 'feat/swebench' of https://github.com/OpenPipe/agent-rei…

97627ac

…nforcement-training into feat/swebench

fix: Update instance filtering logic in get_filtered_swe_smith_instan…

b76c958

…ces_df
- Modified the instance filtering logic to include an additional instance ID, improving the accuracy of instance selection for testing.

feat: Update instance filtering in get_filtered_swe_smith_instances_df

65f454c

- Added new instance IDs to the filtering logic, enhancing the selection criteria for testing instances.

bradhilton marked this pull request as ready for review July 1, 2025 19:45

bradhilton merged commit b85897f into main Jul 1, 2025
1 check passed

bradhilton deleted the feat/swebench branch July 1, 2025 19:45
