SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling [ACL'25 Findings]
Haoran Wang*, Zhenyu Hou*, Yao Wei, Jie Tang, Yuxiao Dong
📄 Paper | 🤗 HF (Model) | 🤗 HF (Data)
LLMs have advanced from conversational problem solving to real-world tasks such as software engineering (SWE). However, building effective SWE agents remains challenging due to the lack of high-quality training data and reliable test-time evaluation.
To address these challenges, we present SWE-Dev, an SWE agent built with a focus on training and inference scaling.
- For training scaling, we develop a robust pipeline to synthesize test cases and scale up agent trajectories to construct the training data.
- For inference scaling, we increase the interaction budget of a single run, giving the agent room for further deliberation within one independent attempt.
Experiments on the SWE-bench Verified benchmark show that SWE-Dev models achieve top performance among all open SWE agents. Specifically, the resolve rates of our 7B and 32B models reach 23.4% and 36.6%, respectively, outperforming state-of-the-art open-source models.
The main configuration file is located at conf/config/default.yaml and contains settings for all pipeline stages:
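For orientation, here is a minimal sketch of what this file might contain. The key names are inferred from the override syntax and Python accessors shown below; the actual schema of your default.yaml may differ.

```yaml
# Illustrative sketch only -- consult the shipped conf/config/default.yaml for the real schema.
conda_base: /path/to/miniforge3          # base Conda installation used by the pipeline

github:
  tokens: [your_github_token]            # GitHub tokens used for crawling

paths:
  local_repo_dir: /path/to/local/repos   # where repositories are cloned

Localizer:
  model: gpt-4o                          # placeholder model names
Description:
  model: gpt-4o
Testcase:
  model: gpt-4o
  revise_rounds: 3                       # placeholder value
```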
To validate your configuration:

```bash
python -m swedev.config --validate
```

To view the current configuration:

```bash
python -m swedev.config --print
```

You can override any configuration value when running scripts:

```bash
python your_script.py paths.local_repo_dir=/new/path github.tokens=[token1,token2]
```

Configuration values can also be accessed from Python:

```python
from swedev.config import Config

# Access basic configuration
conda_base = Config.conda_base
github_tokens = Config.github_tokens

# Access stage-specific settings
localizer_model = Config.Localizer.model
description_model = Config.Description.model
testcase_model = Config.Testcase.model
revise_rounds = Config.Testcase.revise_rounds
```

Set up your configuration in conf/config/default.yaml with GitHub tokens and repository directories before running these commands.
You need to install ChromeDriver first. On Ubuntu, you can install it with:

```bash
apt install chromium-chromedriver
```
Fetch the top PyPI packages:

```bash
python -m swedev.crawl.get_top_pypi \
    --max_repos 100 \
    --output_folder results/packages \
    --num_workers 8 \
    --start_at 0
```

Then collect the corresponding GitHub repository URLs:

```bash
python -m swedev.crawl.pypi_crawler \
    --output results/packages/github_urls.jsonl \
    --workers 16
```

⚠️ Note: keep concurrency low to respect GitHub rate limits.
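If you want to check how much of your GitHub API quota remains, you can query GitHub's rate-limit endpoint directly; this is a plain GitHub API call, not part of the SWE-Dev tooling, and GITHUB_TOKEN below is a placeholder for one of your configured tokens.

```bash
# Returns your remaining core/search API quota and the reset timestamps.
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/rate_limit
```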
Collect issue-based tasks from the crawled repositories:

```bash
python -m swedev.issues.get_tasks_pipeline \
    --repo_file results/packages/pypi_rankings.jsonl \
    --output_folder results/issues \
    --cutoff_date 20210101 \
    --num_workers 64 \
    --max_pulls 1000
```

If you enable --do_clone, the script will clone repositories to the directory specified by local_repo_dir in your configuration.
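As an example, the same command with cloning enabled would simply add the flag (assuming --do_clone is a boolean switch, as the note above suggests, and that local_repo_dir is set in your config):

```bash
python -m swedev.issues.get_tasks_pipeline \
    --repo_file results/packages/pypi_rankings.jsonl \
    --output_folder results/issues \
    --cutoff_date 20210101 \
    --num_workers 64 \
    --max_pulls 1000 \
    --do_clone
```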
If you encounter persistent 404 - Error messages, manually terminate the run and combine the partial results:
```bash
python -m swedev.issues.get_tasks_pipeline \
    --repo_file results/issues/packages/pypi_rankings.jsonl \
    --output_folder results/issues \
    --combine_results
```

For parallel environments, create a base Conda environment first to avoid concurrent installation issues:
```bash
conda create -n swedevbase python=3.11 -y
conda create -n {env_name} --clone swedevbase  # For later use
```

Before running the generation pipeline, configure your API information in conf/config.yaml.
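For instance, here is a sketch of preparing one cloned environment per parallel worker; the swedev_worker_* names are placeholders, not something the pipeline requires:

```bash
# Clone the pre-built base environment once per worker so that no two
# workers install packages into the same environment concurrently.
for i in $(seq 1 8); do
    conda create -n "swedev_worker_${i}" --clone swedevbase -y
done
```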
First, generate descriptions:

```bash
python -m swedev.testcases.get_descriptions \
    --dataset_file results/issues/all_tasks.jsonl \
    --output_folder results/descriptions \
    --num_workers 16
```

Then generate test cases:
```bash
python -m swedev.testcases.get_testcases \
    --dataset_file results/descriptions/output_f2p.jsonl \
    --top_n 5 \
    --output_folder results/testcases/ \
    --num_workers 80
```

We provide a Dockerfile based on Ubuntu 22.04 that installs all dependencies needed for evaluation, including a comprehensive set of development tools. If you encounter errors, you can manually install the dependencies listed in the Dockerfile and then use docker commit to save your image.
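A sketch of that manual fallback, assuming you start from a plain Ubuntu 22.04 container and install by hand whatever step of the Dockerfile failed (the container name and package list are placeholders):

```bash
# Start an interactive Ubuntu 22.04 container and install dependencies manually.
docker run -it --name swedev-manual ubuntu:22.04 bash
# (inside the container) apt-get update && apt-get install -y git build-essential ...

# Back on the host: snapshot the configured container as the evaluation image.
docker commit swedev-manual swedev-evaluator:latest
```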
First, build the Docker image:

```bash
# Build the Docker image from the provided Dockerfile
docker build -t swedev-evaluator:latest .
```

Run the evaluation container:
```bash
docker run -d --network host \
    -v /raid:/raid \
    -w /raid/SWE-Dev \
    --restart always \
    swedev-evaluator:latest \
    /raid/SWE-Dev/miniforge3/envs/swedev/bin/python -m swedev.testcases.eval_testcases \
    --dataset /raid/SWE-Dev/results/testcases/output.jsonl \
    --output_folder /raid/SWE-Dev/results/evaluation-0508 \
    --num_workers 80
```

You should use absolute paths when mounting directories.
To run the evaluation directly (outside Docker):

```bash
python -m swedev.testcases.eval_testcases \
    --dataset /raid/SWE-Dev/results/testcases/output.jsonl \
    --output_folder results/evaluation-0508 \
    --num_workers 32
```

To view a report of the evaluated test cases:

```bash
python -m swedev.testcases.eval_testcases \
    --dataset results/evaluation-0218/evaluated_testcases \
    --show_report
```

To convert agent trajectories into training data:

```bash
python -m swebench.utils.formatter \
    --dataset results/trajectory/qwen-45round-v0227.jsonl \
    --output_folder results/swedata \
    --output_name swe-qwen-45round-v0227.jsonl \
    --dataset_type openhands
```

We thank the following open-source projects for their contributions: