
Conversation

john-b-yang (Member)

Adding submission for SWE-agent-LM-32B, created as part of the SWE-smith work.

$ python analysis/get_results.py evaluation/verified/20250511_sweagent_lm_32b
Removed evaluation/verified/20250511_sweagent_lm_32b/results (not required for submission)
Removed evaluation/verified/20250511_sweagent_lm_32b/preds.json (not required for submission)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:09<00:00, 50.83it/s]
Submission summary for 20250511_sweagent_lm_32b on SWE-bench verified split
==================================================
Resolved 201 instances (40.2%)
==================================================
Resolved by Repository
- astropy/astropy: 9/22 (40.91%)
- django/django: 95/231 (41.13%)
- matplotlib/matplotlib: 15/34 (44.12%)
- mwaskom/seaborn: 0/2 (0.0%)
- pallets/flask: 1/1 (100.0%)
- psf/requests: 4/8 (50.0%)
- pydata/xarray: 11/22 (50.0%)
- pylint-dev/pylint: 1/10 (10.0%)
- pytest-dev/pytest: 11/19 (57.89%)
- scikit-learn/scikit-learn: 19/32 (59.38%)
- sphinx-doc/sphinx: 12/44 (27.27%)
- sympy/sympy: 23/75 (30.67%)
==================================================
Resolved by Time
- 2013: 2/3 (66.67%)
- 2014: 2/2 (100.0%)
- 2015: 0/1 (0.0%)
- 2016: 2/2 (100.0%)
- 2017: 5/16 (31.25%)
- 2018: 10/24 (41.67%)
- 2019: 47/98 (47.96%)
- 2020: 38/108 (35.19%)
- 2021: 30/86 (34.88%)
- 2022: 38/102 (37.25%)
- 2023: 27/58 (46.55%)
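
The per-repository and per-year breakdowns above are produced by analysis/get_results.py. As a rough illustration of how such a summary can be derived from per-instance results, here is a minimal sketch in Python; the results.json filename and its schema are hypothetical and do not reflect the script's actual inputs.

```python
# Minimal sketch (not analysis/get_results.py): per-repository resolution rates.
# Assumes a hypothetical results.json of the form:
#   [{"instance_id": "django__django-11099", "resolved": true}, ...]
import json
from collections import defaultdict

with open("results.json") as f:
    results = json.load(f)

totals, resolved = defaultdict(int), defaultdict(int)
for r in results:
    # SWE-bench instance IDs look like "<org>__<repo>-<number>",
    # so the repository is everything before the final hyphen.
    repo = r["instance_id"].rsplit("-", 1)[0].replace("__", "/")
    totals[repo] += 1
    resolved[repo] += bool(r["resolved"])

for repo in sorted(totals):
    print(f"- {repo}: {resolved[repo]}/{totals[repo]} "
          f"({100 * resolved[repo] / totals[repo]:.2f}%)")
```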

SWE-agent-LM-32B is a language model for software engineering, trained using the SWE-smith toolkit. We introduce this model as part of our work, SWE-smith: Scaling Data for Software Engineering Agents.

Please copy-paste this checklist into your README.md and confirm the following:

- Is a pass@1 submission (does not attempt the same task instance more than once)
- Does not use SWE-bench test knowledge (PASS_TO_PASS, FAIL_TO_PASS)
- Does not use the hints field in SWE-bench
- Does not have web-browsing OR has taken steps to prevent lookup of SWE-bench solutions via web-browsing

john-b-yang merged commit 242a8f1 into SWE-bench:main on May 12, 2025
FFengIll pushed a commit to project-anders/experiments that referenced this pull request on Sep 30, 2025:
* Init commit

* Add predictions

* Remove logs and trajs (Uploaded to shared s3 bucket)

* Update metadata with s3 paths

* Update metadata.yaml