chore: SWE-Bench related changes #181

Merged
bradhilton merged 24 commits into main from feat/swebench
Jul 1, 2025

bradhilton
Collaborator

No description provided.

- Reduced the default timeout in eval_instance from 600.0 to 240.0 seconds.
- Updated execution counts and outputs in train.ipynb, including changes to model name and configuration.
- Improved output formatting for readability and added error handling to the training process.
- Added a comma to the swebench dependency in pyproject.toml.
- Updated the source URL for sweagent in both pyproject.toml and uv.lock.
- Bumped versions for flask-cors (6.0.1), python-engineio (4.12.2), swe-rex (1.3.0), and textual (3.5.0) in uv.lock.
- Removed conditional logic for level assignment in run_on_workers, setting it directly to 1 for clarity.
- Introduced a timeout parameter (default 15 minutes) to the rollout function in rollout.py to enhance control over execution duration.
- Updated the training notebook to reflect changes in model name and configuration, including adjustments to execution counts and outputs.
- Removed unnecessary outputs and set execution counts to null in train.ipynb for improved clarity and consistency.
- Streamlined the notebook to enhance readability and maintainability.
- Added a reward_power parameter to the rollout function in rollout.py to allow for customizable reward calculations.
- Adjusted the reward calculation formula for improved performance.
- Updated the training notebook to reflect changes in rollout parameters and increased batch sizes for better training efficiency.
- Modified learning rate in training configuration for optimized model training.
- Set execution count to null and removed unnecessary outputs in train.ipynb for improved clarity.
- Updated model name and adjusted learning rate for better training performance.
- Added clip_grad_norm parameter to TorchtuneService for enhanced gradient clipping control.
- Updated the timeout parameter in config.py from 10 minutes to 30 minutes for extended execution duration.
- Enabled async weight syncing in train.ipynb for improved model training performance.
- Commented out the learning rate configuration in the training notebook for clarity.
- Added `daytona-sdk` as a dependency in `pyproject.toml` and updated its version to `0.21.5`.
- Introduced `daytona-api-client` and `daytona-api-client-async` packages in `uv.lock`.
- Updated `skypilot` dependency in `pyproject.toml` to remove unnecessary extras.
- Created a new script `daytona.py` for running SWE-bench tests, including functionality for handling missing modules and analyzing test results.
- Introduced a `Logger` class to manage logging behavior during test execution, supporting various logging modes.
- Updated `run_tests` and `install_base_dependencies` functions to utilize the new logging system for better output management.
- Enhanced `test_instances` function to support parallel execution and progress tracking with optional progress bar.
- Added command-line arguments for logging mode, parallel execution, and exception handling to improve user experience.
- Enhanced the `run_tests` function to provide detailed error messages when tests do not pass as expected, including information on expected vs. actual results.
- Updated the `test_instances` function to streamline the execution flow, supporting both parallel and sequential test runs while maintaining progress tracking.
- Modified command-line argument help text for clarity regarding exception printing behavior.
- Added an optional `index` parameter to the `run_tests` function for improved logging of instance IDs.
- Updated the progress bar in `test_instances` to display the range of instance indices being tested.
- Introduced a new `parse_indices` function to handle individual and range-based index inputs from command-line arguments, improving user experience and flexibility in instance selection.
- Implemented functionality in `test_instances` to delete any running sandboxes before executing tests, enhancing resource management.
- Updated the instance filtering logic in `get_filtered_swe_smith_instances_df` to exclude a specific instance ID, improving selection accuracy.
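To illustrate the index handling described for `parse_indices` above, here is a minimal sketch. The PR text does not show the accepted syntax, so the comma-separated format with inclusive hyphenated ranges (for example `0,3,5-8`), the signature, and the return type are all assumptions rather than the code merged in `daytona.py`.

```python
def parse_indices(spec: str) -> list[int]:
    """Parse a spec such as "0,3,5-8" into a sorted list of unique indices.

    Assumed syntax (not confirmed by the PR): comma-separated items, where
    each item is a single non-negative integer or an inclusive "start-end" range.
    """
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range {part!r}: start exceeds end")
            indices.update(range(start, end + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


if __name__ == "__main__":
    print(parse_indices("0,3,5-8"))
```

Returning a sorted, de-duplicated list keeps the selection stable when ranges overlap, which makes it straightforward to show the range of indices in a progress bar.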
…ces_df

- Modified the instance filtering logic to include an additional instance ID, improving the accuracy of instance selection for testing.
…mproved diagnostics

- Added functionality to automatically install missing pytest plugins when detected during test execution, improving test reliability.
- Implemented recursive dependency installation for conftest loading errors, enhancing error handling and reducing manual intervention.
- Introduced additional diagnostics to log potential issues when no tests are executed, aiding in troubleshooting and improving user experience.
- Added new instance IDs to the filtering logic, enhancing the selection criteria for testing instances.
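The automatic installation of missing plugins and the repeated ("recursive") dependency installation described above could look roughly like the sketch below. It is not the code from this PR: the helper names `find_missing_modules`, `pip_install`, and `run_with_auto_install` are hypothetical, the detection relies only on Python's standard `No module named '...'` error text, and installation happens in the local interpreter rather than in a sandbox.

```python
import re
import subprocess
import sys
from typing import Callable

# Matches the standard ModuleNotFoundError message text.
MISSING_MODULE = re.compile(r"No module named ['\"]([\w.]+)['\"]")


def find_missing_modules(output: str) -> list[str]:
    """Return top-level module names reported missing, in first-seen order."""
    seen: dict[str, None] = {}
    for match in MISSING_MODULE.finditer(output):
        seen.setdefault(match.group(1).split(".")[0], None)
    return list(seen)


def pip_install(package: str) -> bool:
    """Install a package with pip in the current interpreter (assumed strategy)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", package],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def run_with_auto_install(
    run_tests: Callable[[], str],
    install: Callable[[str], bool] = pip_install,
    max_rounds: int = 3,
) -> str:
    """Re-run the tests while new missing modules keep appearing in the output."""
    attempted: set[str] = set()
    output = run_tests()
    for _ in range(max_rounds):
        missing = [m for m in find_missing_modules(output) if m not in attempted]
        if not missing:
            break
        for name in missing:
            attempted.add(name)
            install(name)
        # Installing one module can reveal the next missing import.
        output = run_tests()
    return output
```

One caveat of this approach is that an import name does not always match its distribution name (for example, `yaml` is provided by `PyYAML`), so a real implementation would likely need a mapping for such cases. The `max_rounds` bound keeps the retry loop from running indefinitely when an install keeps failing.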
@bradhilton bradhilton marked this pull request as ready for review July 1, 2025 19:45
@bradhilton bradhilton merged commit b85897f into main Jul 1, 2025
1 check passed
@bradhilton bradhilton deleted the feat/swebench branch July 1, 2025 19:45