
Conversation

john-b-yang (Member)

Adding submission for SWE-agent-LM-32B, created as part of the SWE-smith work.

$ python analysis/get_results.py evaluation/verified/20250511_sweagent_lm_32b
Removed evaluation/verified/20250511_sweagent_lm_32b/results (not required for submission)
Removed evaluation/verified/20250511_sweagent_lm_32b/preds.json (not required for submission)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:09<00:00, 50.83it/s]
Submission summary for 20250511_sweagent_lm_32b on SWE-bench verified split
==================================================
Resolved 201 instances (40.2%)
==================================================
Resolved by Repository
- astropy/astropy: 9/22 (40.91%)
- django/django: 95/231 (41.13%)
- matplotlib/matplotlib: 15/34 (44.12%)
- mwaskom/seaborn: 0/2 (0.0%)
- pallets/flask: 1/1 (100.0%)
- psf/requests: 4/8 (50.0%)
- pydata/xarray: 11/22 (50.0%)
- pylint-dev/pylint: 1/10 (10.0%)
- pytest-dev/pytest: 11/19 (57.89%)
- scikit-learn/scikit-learn: 19/32 (59.38%)
- sphinx-doc/sphinx: 12/44 (27.27%)
- sympy/sympy: 23/75 (30.67%)
==================================================
Resolved by Time
- 2013: 2/3 (66.67%)
- 2014: 2/2 (100.0%)
- 2015: 0/1 (0.0%)
- 2016: 2/2 (100.0%)
- 2017: 5/16 (31.25%)
- 2018: 10/24 (41.67%)
- 2019: 47/98 (47.96%)
- 2020: 38/108 (35.19%)
- 2021: 30/86 (34.88%)
- 2022: 38/102 (37.25%)
- 2023: 27/58 (46.55%)
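
The per-repository and per-year breakdowns above are produced by analysis/get_results.py. As a rough illustration of how such a summary can be derived from per-instance results, here is a minimal sketch in Python; the results.json filename and its schema are hypothetical and do not reflect the script's actual inputs.

```python
# Minimal sketch (not analysis/get_results.py): per-repository resolution rates.
# Assumes a hypothetical results.json of the form:
#   [{"instance_id": "django__django-11099", "resolved": true}, ...]
import json
from collections import defaultdict

with open("results.json") as f:
    results = json.load(f)

totals, resolved = defaultdict(int), defaultdict(int)
for r in results:
    # SWE-bench instance IDs look like "<org>__<repo>-<number>",
    # so the repository is everything before the final hyphen.
    repo = r["instance_id"].rsplit("-", 1)[0].replace("__", "/")
    totals[repo] += 1
    resolved[repo] += bool(r["resolved"])

for repo in sorted(totals):
    print(f"- {repo}: {resolved[repo]}/{totals[repo]} "
          f"({100 * resolved[repo] / totals[repo]:.2f}%)")
```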

SWE-agent-LM-32B is a language model for software engineering, trained using the SWE-smith toolkit. We introduce this model as part of our work, SWE-smith: Scaling Data for Software Engineering Agents.

Please copy-paste this checklist into your README.md and confirm the following:

- Is a pass@1 submission (does not attempt the same task instance more than once)
- Does not use SWE-bench test knowledge (PASS_TO_PASS, FAIL_TO_PASS)
- Does not use the hints field in SWE-bench
- Does not have web-browsing OR has taken steps to prevent lookup of SWE-bench solutions via web-browsing

john-b-yang merged commit 242a8f1 into SWE-bench:main on May 12, 2025
FFengIll pushed a commit to project-anders/experiments that referenced this pull request on Sep 30, 2025:
* Init commit

* Add predictions

* Remove logs and trajs (Uploaded to shared s3 bucket)

* Update metadata with s3 paths

* Update metadata.yaml