Arborist builds trees for the IEDB. The trees are used for the user interface on https://iedb.org and the IEDB curation interface, and also for validating IEDB data. They combine data from the IEDB with community ontologies such as the NCBI Taxonomy and open scientific databases such as UniProt and GenBank.
WARN: This version of Arborist is still work-in-progress. It makes extensive use of Nanobot, which is also work-in-progress.
See docs for more detailed information and instructions.
The Makefile defines and documents all the specific steps for Arborist.
Run `make help` to see the list of main tasks.
You can run `make` either directly or inside a Docker container.
For Docker, run `./run_image.sh make` or `sudo -E ./run_image.sh make`.
If you aren't using Docker, first install the required software by running `make deps`.
NOTE: Arborist currently supports only Linux on the x86_64 architecture.
The suggested workflow is:
- Update the cache with the latest IEDB tables by running `src/iedb/update-cache`. This requires MySQL/MariaDB connection parameters to be set as `IEDB_MYSQL_*` environment variables: `IEDB_MYSQL_HOST`, `IEDB_MYSQL_PORT`, `IEDB_MYSQL_USER`, `IEDB_MYSQL_PASSWORD`, `IEDB_MYSQL_DATABASE`.
- Run `make all` to build all trees.
- Run `make serve` to start the web interface on http://localhost:3000.
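As a minimal sketch of a pre-flight check for the first step: the helper below, `missing_iedb_vars`, is hypothetical and not part of Arborist. It assumes only the five `IEDB_MYSQL_*` variable names listed in the workflow, and prints the names of any that are unset or empty.

```sh
#!/bin/sh
# Hypothetical helper (not part of Arborist): print the names of the
# IEDB_MYSQL_* variables that are unset or empty, separated by spaces.
# Prints an empty line when all five are set.
missing_iedb_vars() {
  missing=""
  for name in IEDB_MYSQL_HOST IEDB_MYSQL_PORT IEDB_MYSQL_USER \
              IEDB_MYSQL_PASSWORD IEDB_MYSQL_DATABASE; do
    # Look up the variable whose name is stored in $name (POSIX-portable).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Drop the leading space left by the accumulation above.
  printf '%s\n' "${missing# }"
}
```

One way to use it, assuming the function has been sourced into the current shell, is to call `missing_iedb_vars` before `src/iedb/update-cache` and continue to `make all` only when it prints nothing.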
These are the key Make tasks for building trees, in their dependency order:
- `make iedb`: load IEDB data. This runs the `src/iedb/update-cache` script.
- `make ncbitaxon`: build the NCBI Taxonomy
- `make organism`: build the organism and subspecies trees. This also creates the list of "active species" used by IEDB, and the "active taxa" that fall under these species.
- `make proteome`: select a proteome for each active species
- `make protein`: build the protein tree
- `make all`: build all trees
TODO: build more trees: peptide, molecule, assay, disease, geolocation, ...
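According to the list of key tasks, `make all` builds all trees, but running the tasks one at a time can make it easier to see which step failed. The following is only a sketch under that assumption: `build_in_order` and the `MAKE_CMD` override are hypothetical names, not something the Arborist Makefile provides.

```sh
#!/bin/sh
# Hypothetical wrapper (not part of Arborist): run the tree-building tasks
# in their documented dependency order and stop at the first failure.
# MAKE_CMD is an assumed override, e.g. MAKE_CMD=echo for a dry run.
build_in_order() {
  make_cmd="${MAKE_CMD:-make}"
  for target in iedb ncbitaxon organism proteome protein; do
    # Progress goes to stderr so stdout carries only the command's output.
    echo "==> $target" >&2
    # $make_cmd is left unquoted on purpose so it may carry extra arguments.
    $make_cmd "$target" || return 1
  done
}
```

Calling `MAKE_CMD=echo build_in_order` prints the target names in order without building anything, which is a cheap way to check the sequence before a real run.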
Here are some other important Make tasks:
- `make deps`: install required software
- `make serve`: run the web interface on http://localhost:3000
- `make clean`: remove all build files
- `make clobber`: remove all generated files
- `make help`: print the list of main tasks
The main directories are:
- `bin/` contains any required binaries that aren't already installed
- `build/` all sorts of generated files
  - `iedb/` selected tables from IEDB for use here
  - `arborist/` general build files
  - `<species_id>/` species-specific build files
- `cache/` compressed data from various sources
  - `iedb/` selected tables from IEDB
  - `ncbitaxon/` NCBI Taxonomy's `taxdmp.zip` files
- `current/` links to the cached data to use for builds
  - `iedb` links to a subdirectory of `cache/iedb/`
  - `taxdmp.zip` links to a file in `cache/ncbitaxon/`
- `result/` TODO date-stamped directories of results, and `latest` link
- `src/`
  - `iedb/` config and schemas for IEDB data
  - `arborist/` config and schemas for Arborist tables
  - `species/` config and schemas for species proteomes and protein trees
  - `organism/` scripts for building the organism tree
  - `proteome/` scripts for selecting proteomes
  - `util/` utility scripts for working with databases
  - `templates/` Nanobot HTML templates