chore: SWE-Bench related changes #181
- Reduced the default timeout in eval_instance from 600.0 to 240.0 seconds.
- Updated execution counts and outputs in train.ipynb, including changes to the model name and configuration.
- Improved output formatting for readability and added error handling to the training process.
- Added a comma to the swebench dependency in pyproject.toml.
- Updated the source URL for sweagent in both pyproject.toml and uv.lock.
- Bumped versions for flask-cors (6.0.1), python-engineio (4.12.2), swe-rex (1.3.0), and textual (3.5.0) in uv.lock.
- Removed conditional logic for level assignment in run_on_workers, setting it directly to 1 for clarity.
- Introduced a timeout parameter (default 15 minutes) to the rollout function in rollout.py to enhance control over execution duration.
- Updated the training notebook to reflect changes in model name and configuration, including adjustments to execution counts and outputs.
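A rollout-level timeout of this kind is commonly enforced by wrapping the coroutine in `asyncio.wait_for`. The following is a minimal sketch under that assumption; `run_with_timeout` and its `make_coro` argument are hypothetical names for illustration and are not taken from rollout.py.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Hypothetical default mirroring the 15-minute value named in the commit message.
DEFAULT_TIMEOUT_SECONDS = 15 * 60


async def run_with_timeout(
    make_coro: Callable[[], Awaitable[Any]],
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
) -> Optional[Any]:
    """Await the coroutine produced by make_coro, giving up after `timeout` seconds.

    Returns the coroutine's result, or None if the deadline passes first.
    (Returning None on timeout is an assumption made for this sketch.)
    """
    try:
        return await asyncio.wait_for(make_coro(), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising.
        return None
```

A caller would still need to decide how a timed-out rollout is scored; the sketch only shows where the deadline is applied.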
- Removed unnecessary outputs and set execution counts to null in train.ipynb for improved clarity and consistency.
- Streamlined the notebook to enhance readability and maintainability.
- Added a reward_power parameter to the rollout function in rollout.py to allow for customizable reward calculations.
- Adjusted the reward calculation formula for improved performance.
- Updated the training notebook to reflect changes in rollout parameters and increased batch sizes for better training efficiency.
- Modified learning rate in training configuration for optimized model training.
- Set execution count to null and removed unnecessary outputs in train.ipynb for improved clarity.
- Updated model name and adjusted learning rate for better training performance.
- Added clip_grad_norm parameter to TorchtuneService for enhanced gradient clipping control.
- Updated the timeout parameter in config.py from 10 minutes to 30 minutes for extended execution duration.
- Enabled async weight syncing in train.ipynb for improved model training performance.
- Commented out the learning rate configuration in the training notebook for clarity.
- Added `daytona-sdk` as a dependency in `pyproject.toml` and updated its version to `0.21.5`.
- Introduced `daytona-api-client` and `daytona-api-client-async` packages in `uv.lock`.
- Updated `skypilot` dependency in `pyproject.toml` to remove unnecessary extras.
- Created a new script `daytona.py` for running SWE-bench tests, including functionality for handling missing modules and analyzing test results.
- Introduced a `Logger` class to manage logging behavior during test execution, supporting various logging modes.
- Updated `run_tests` and `install_base_dependencies` functions to utilize the new logging system for better output management.
- Enhanced `test_instances` function to support parallel execution and progress tracking with optional progress bar.
- Added command-line arguments for logging mode, parallel execution, and exception handling to improve user experience.
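Parallel execution with progress tracking, as described for `test_instances`, can be sketched with the standard library's `concurrent.futures`. This is an illustrative outline only: `run_in_parallel`, `worker`, and `on_progress` are hypothetical names, and the use of threads (rather than asyncio or processes) is an assumption, since the commit message does not say how `daytona.py` parallelizes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Hashable, Iterable, Optional


def run_in_parallel(
    items: Iterable[Hashable],
    worker: Callable[[Any], Any],
    max_workers: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Dict[Hashable, Any]:
    """Run worker(item) for each item on a thread pool.

    Returns a dict mapping each item to its result, or to the exception it
    raised. on_progress(done, total) is called in the calling thread each
    time one item finishes, which is where a progress bar could be updated.
    """
    items = list(items)
    results: Dict[Hashable, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for done, future in enumerate(as_completed(futures), start=1):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # keep going; record the failure
                results[item] = exc
            if on_progress is not None:
                on_progress(done, len(items))
    return results
```

Recording exceptions instead of re-raising lets the remaining instances finish; whether failures should abort the run is a design choice the command-line exception-handling flag presumably controls.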
- Enhanced the `run_tests` function to provide detailed error messages when tests do not pass as expected, including information on expected vs. actual results.
- Updated the `test_instances` function to streamline the execution flow, supporting both parallel and sequential test runs while maintaining progress tracking.
- Modified command-line argument help text for clarity regarding exception printing behavior.
- Added an optional `index` parameter to the `run_tests` function for improved logging of instance IDs.
- Updated the progress bar in `test_instances` to display the range of instance indices being tested.
- Introduced a new `parse_indices` function to handle individual and range-based index inputs from command-line arguments, improving user experience and flexibility in instance selection.
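To illustrate what parsing individual and range-based indices might look like, here is a small sketch. The input syntax (comma-separated values with `start-end` inclusive ranges) is an assumption, and the function is named `parse_index_spec` to make clear it is not the PR's actual `parse_indices` implementation.

```python
from typing import List


def parse_index_spec(spec: str) -> List[int]:
    """Parse a string such as "0,3,5-7" into a sorted list of unique indices.

    Assumed syntax (hypothetical): comma-separated non-negative integers,
    where "a-b" denotes the inclusive range a..b.
    """
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range {part!r}: start exceeds end")
            indices.update(range(start, end + 1))
        else:
            indices.add(int(part))
    return sorted(indices)
```

For example, `parse_index_spec("0,3,5-7")` yields `[0, 3, 5, 6, 7]`.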
- Implemented functionality in `test_instances` to delete any running sandboxes before executing tests, enhancing resource management.
- Updated the instance filtering logic in `get_filtered_swe_smith_instances_df` to exclude a specific instance ID, improving selection accuracy.
…ces_df
- Modified the instance filtering logic to include an additional instance ID, improving the accuracy of instance selection for testing.
…mproved diagnostics
- Added functionality to automatically install missing pytest plugins when detected during test execution, improving test reliability.
- Implemented recursive dependency installation for conftest loading errors, enhancing error handling and reducing manual intervention.
- Introduced additional diagnostics to log potential issues when no tests are executed, aiding in troubleshooting and improving user experience.
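The retry-after-install behavior described above can be sketched as a loop that scans test output for `ModuleNotFoundError` messages, installs what is missing, and reruns. Everything here is a hypothetical outline: `find_missing_modules`, `run_with_auto_install`, and the injected `run_tests` and `install` callables are illustrative names, and the actual detection logic in `daytona.py` may differ.

```python
import re
from typing import Callable, List

# Matches the message Python prints for a failed import.
_MISSING_MODULE = re.compile(
    r"ModuleNotFoundError: No module named ['\"]([\w.]+)['\"]"
)


def find_missing_modules(output: str) -> List[str]:
    """Return top-level module names reported missing, in first-seen order."""
    found: List[str] = []
    for match in _MISSING_MODULE.finditer(output):
        top_level = match.group(1).split(".")[0]
        if top_level not in found:
            found.append(top_level)
    return found


def run_with_auto_install(
    run_tests: Callable[[], str],
    install: Callable[[str], None],
    max_rounds: int = 5,
) -> str:
    """Rerun tests after installing missing modules, up to max_rounds times.

    run_tests returns the captured test output; install receives a module
    name. Both are supplied by the caller (assumptions of this sketch).
    """
    installed = set()
    output = run_tests()
    for _ in range(max_rounds):
        missing = [m for m in find_missing_modules(output) if m not in installed]
        if not missing:
            break
        for module in missing:
            install(module)
            installed.add(module)
        output = run_tests()
    return output
```

Note that an importable module name does not always match its distribution name on PyPI (for example, `yaml` is provided by `PyYAML`), so a real installer needs a mapping or fallback for such cases.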
- Added new instance IDs to the filtering logic, enhancing the selection criteria for testing instances.
No description provided.