GitHub - getappmap/navie-benchmark: Navie benchmarks

AppMap Navie SWE Bench Solver

This is a SWE Bench solver based on AppMap Navie.

Build Instructions

Clone with submodules

git submodule update --init --recursive

Create and activate virtualenv

Python 3.12 is required.

virtualenv .venv --python=python3.12
. ./.venv/bin/activate

Install Python dependencies

pip install ".[dev]"

Build appmap-js

cd submodules/appmap-js
yarn && yarn build

Solving Locally

Export LLM key

Options are:

OPENAI_API_KEY
ANTHROPIC_API_KEY
GOOGLE_WEB_CREDENTIALS
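
Before starting a run, it can help to confirm that at least one of these variables is actually exported. The following is a minimal sketch and not part of the solver; the helper name `available_llm_keys` is hypothetical, and only the variable names come from the list above.

```python
import os

# Credential variable names listed in this README.
LLM_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_WEB_CREDENTIALS")


def available_llm_keys(env=None):
    """Return the names of the credential variables that are set and non-empty.

    `env` defaults to the process environment; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    return [name for name in LLM_KEY_VARS if env.get(name)]


if __name__ == "__main__":
    found = available_llm_keys()
    if found:
        print("Found credentials:", ", ".join(found))
    else:
        print("No LLM credentials exported; set one of:", ", ".join(LLM_KEY_VARS))
```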

Export LLM model

Options are:

gemini-1.5-pro-002
gpt-4o-2024-08-06
gpt-4o-2024-05-13
gpt-4.1-2025-04-14
o1-preview-2024-09-12
o1-mini-2024-09-12
claude-3-5-sonnet-20240620
claude-3-5-sonnet-20241022
claude-3-7-sonnet-20250219
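
Each model name belongs to one vendor, so it presumably needs the matching credential from the key options. The sketch below pairs the two by name prefix. The pairing is an assumption inferred from the vendors' naming, not taken from the solver's code, and `required_key` is a hypothetical helper.

```python
# Assumed pairing of model-name prefix to credential variable (not from the solver).
PREFIX_TO_KEY = {
    "gpt-": "OPENAI_API_KEY",
    "o1-": "OPENAI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
    "gemini-": "GOOGLE_WEB_CREDENTIALS",
}


def required_key(model):
    """Return the credential variable assumed to be needed for `model`."""
    for prefix, key in PREFIX_TO_KEY.items():
        if model.startswith(prefix):
            return key
    raise ValueError(f"Unrecognized model name: {model}")


if __name__ == "__main__":
    for name in ("gpt-4o-2024-08-06", "claude-3-7-sonnet-20250219", "gemini-1.5-pro-002"):
        print(name, "->", required_key(name))
```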

Run the "smoke" subset

python -m solver.solve \
    --instance_set smoke \
    --limit test_files=2 test_status_retry=2 code_files=2 code_status_retry=2 concurrency=1
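
The `--limit` option takes a list of `key=value` pairs. To illustrate how such an argument can be read, here is a sketch using `argparse`. It is not the solver's actual parser; `parse_limits` is a hypothetical helper, and treating every value as an integer is an assumption based on the command above.

```python
import argparse


def parse_limits(pairs):
    """Turn ["a=1", "b=2"] into {"a": 1, "b": 2}, rejecting malformed entries."""
    limits = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {pair!r}")
        limits[key] = int(value)  # assumption: all limits are integers
    return limits


parser = argparse.ArgumentParser()
parser.add_argument("--instance_set")
parser.add_argument("--limit", nargs="*", default=[])

# Same arguments as the smoke command shown above.
args = parser.parse_args([
    "--instance_set", "smoke",
    "--limit",
    "test_files=2", "test_status_retry=2",
    "code_files=2", "code_status_retry=2", "concurrency=1",
])
limits = parse_limits(args.limit)
print(args.instance_set, limits)
```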

Solving in CI

Solvers are provided as GitHub Workflows in the .github/workflows directory.

`solve.yml`

This is the main workflow for running the solver when you want to leverage the pre-generated synthetic test cases. As a consequence, the results of this workflow are not independent of previous runs, which is by design.

It can be triggered manually or by a pull request that carries the 'test-solve' label. The test-solve label is used to smoke-test pull requests.

The workflow:

Builds appmap-js dependencies
Prepares matrix for parallel execution
Runs solver instances across runners
Collects and aggregates results
Generates final report and artifacts
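
The "prepares matrix" and "runs solver instances across runners" steps amount to splitting a list of instances into one group per runner. A minimal sketch of one way to do that, using a round-robin split. The actual workflow may partition differently; `partition` and the instance ids are hypothetical placeholders.

```python
def partition(instance_ids, num_runners):
    """Split instance ids into `num_runners` groups, round-robin.

    Group sizes differ by at most one, so runners get a similar share of work.
    """
    if num_runners < 1:
        raise ValueError("num_runners must be at least 1")
    return [instance_ids[i::num_runners] for i in range(num_runners)]


if __name__ == "__main__":
    ids = ["instance-a", "instance-b", "instance-c", "instance-d", "instance-e"]
    for index, group in enumerate(partition(ids, 2)):
        print(f"runner {index}: {group}")
```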

Options

use_synthetic_tests: Whether to use synthetic tests (default true)
observe_synthetic_tests: Whether to observe synthetic test execution (default false)

`official.yml`

Runs of this workflow are independent of previous runs. Existing synthetic tests that are present in the repo are not used by this workflow; instead, they are created by the workflow itself in an initial step. Then, once synthetic tests are available and no further tests are being discovered, the workflow moves on to finding solutions.
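
The "generate tests until no further tests are discovered" phase can be pictured as a loop that repeats until a round yields nothing new. This sketch only illustrates that control flow under stated assumptions: `generate_until_stable`, the `generate_round` callback, and the `max_rounds` cap are all hypothetical and do not describe the workflow's real implementation.

```python
def generate_until_stable(instance_ids, generate_round, max_rounds=10):
    """Repeat test generation until a round discovers no new tests.

    `generate_round(pending)` is a caller-supplied function that returns a dict
    mapping instance id to a generated test for the instances it managed to cover.
    """
    tests = {}
    for _ in range(max_rounds):
        pending = [i for i in instance_ids if i not in tests]
        if not pending:
            break
        discovered = generate_round(pending)
        if not discovered:
            break  # nothing new this round: move on to solving
        tests.update(discovered)
    return tests


if __name__ == "__main__":
    # Toy generator: covers only the first pending instance on each round.
    def one_at_a_time(pending):
        return {pending[0]: f"test for {pending[0]}"}

    print(generate_until_stable(["instance-a", "instance-b"], one_at_a_time))
```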

Run tests

python -m pytest solver/tests

Logging

By default, most logging is directed to files; otherwise, the console output from the project would be very verbose. Also, because the solver runs in parallel, console output would be interleaved and hard to read.

As a result, you'll primarily find logs in the solve directory. Within this directory, the logs are organized by instance id. Each Navie command is logged to a separate directory, with its inputs, options, and outputs in separate files.
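
To get an overview of what a run produced, the log tree can be listed per instance. The sketch below assumes a layout of `solve/<instance id>/...` with files nested underneath, which is an interpretation of the description above; the helper name `logs_by_instance` is hypothetical.

```python
from pathlib import Path


def logs_by_instance(solve_dir="solve"):
    """Map each instance id (a subdirectory name) to its log files.

    File paths are relative to the instance directory and sorted.
    Returns an empty dict if `solve_dir` does not exist.
    """
    root = Path(solve_dir)
    result = {}
    if not root.is_dir():
        return result
    for instance_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        result[instance_dir.name] = sorted(
            f.relative_to(instance_dir).as_posix()
            for f in instance_dir.rglob("*")
            if f.is_file()
        )
    return result


if __name__ == "__main__":
    for instance_id, files in logs_by_instance().items():
        print(instance_id)
        for name in files:
            print("   ", name)
```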
