mapquik is an ultra-fast read mapper based on
The underlying seed constructs (
The mapping performance of mapquik degrades markedly when identity between reads and the reference is lower than mapquik is not suitable for mapping PacBio CLR reads, and potentially also Oxford Nanopore reads until base-calling consistently reaches identity levels above
Pre-requisites: A working Rust environment.
Clone the repository, and run
rustup install nightly
cargo +nightly build --release
The nightly version of cargo is required because mapquik uses experimental language features (such as SIMD and intrinsics).
target/release/mapquik <reads.fq> --reference <reference.fa>
mapquik takes a single FASTA/FASTQ file (gzip-compressed or not) as input. Multi-line sequences are not supported.
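Because multi-line sequences are not supported, a line-wrapped FASTA file may need to be flattened before mapping. The following is a minimal sketch of one way to do that, assuming a plain FASTA file (not FASTQ). The helper names `linearize_fasta` and `linearize_file` are hypothetical and are not part of mapquik.

```python
import gzip


def linearize_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    Assumes FASTA input: header lines start with '>' and every other
    non-empty line is sequence. Not suitable for FASTQ.
    """
    seq_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Flush the sequence of the previous record, if any.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        else:
            seq_parts.append(line)
    # Flush the last record.
    if seq_parts:
        yield "".join(seq_parts)


def linearize_file(src, dst):
    """Write a single-line-per-sequence copy of src (optionally gzipped) to dst."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for out_line in linearize_fasta(fin):
            fout.write(out_line + "\n")
```

For example, `linearize_file("reference.wrapped.fa", "reference.fa")` (hypothetical file names) would produce a reference with one line per sequence, which can then be passed to `--reference`.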
The output of mapquik is a regular PAF file.
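As an illustration of how the output could be consumed downstream, here is a minimal sketch that parses the 12 mandatory tab-separated columns of the PAF format. It assumes only the standard PAF layout; any optional tags that follow those columns are ignored, and the names `PafRecord` and `parse_paf_line` are hypothetical.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in order."""
    query_name: str
    query_len: int
    query_start: int   # 0-based, inclusive
    query_end: int     # 0-based, exclusive
    strand: str        # '+' or '-'
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    matches: int       # number of residue matches
    block_len: int     # alignment block length
    mapq: int          # mapping quality (255 means missing)


def parse_paf_line(line):
    """Parse one PAF line into a PafRecord, ignoring optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]),
    )
```

A typical use would be to iterate over the lines of the PAF file, call `parse_paf_line` on each, and filter on a field such as `mapq`.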
An example reference genome and a script to simulate reads using pbsim are provided in the example/ folder. To run mapquik on a small set of 100 reads, type:
cd example && bash run_ecoli.sh
which will run both mapquik and minimap2 on 100 simulated reads, and return the output of paftools.js mapeval on both PAF files.
To simulate a larger set of reads using pbsim and map, type:
bash simulate_pbsim.sh && bash run_ecoli_full.sh
For further information on usage and parameters, run
target/release/mapquik -h
for a one-line summary of each flag, or run
target/release/mapquik --help
for a lengthy explanation of each flag.
All scripts used to generate the figures and tables in the paper can be found in the experiments/ folder. Specifically, the simulate_chm13.sh and simulate_maize.sh scripts can be used similarly to simulate reads.
In order to obtain and map DeepConsensus reads, first run
wget https://storage.googleapis.com/brain-genomics-public/research/deepconsensus/data/v0.3/assembly_analysis/fastqs/HG002_24kb_2SMRT_cells.dc.v0.3.q20.fastq.gz
gunzip -c HG002_24kb_2SMRT_cells.dc.v0.3.q20.fastq.gz | grep -v TOTAL > dc.hg002.fastq
and then map the reads to a reference genome (reference.fa in your working directory) with mapquik using
target/release/mapquik dc.hg002.fastq --reference reference.fa -p mapquik-dc
mapquik significantly accelerates the seeding and chaining steps for both the human and maize genomes with minimap2, and on the maize genome, a minimap2.
mapquik indexing is minimap2, which is of independent interest.
mapquik is freely available under the MIT License.
- Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
- Rayan Chikhi at the Department of Computational Biology at Institut Pasteur
@article{mapquik,
title={Efficient mapping of accurate long reads in minimizer space with mapquik},
author={Ekim, Bar{\i}{\c{s}} and Sahlin, Kristoffer and Medvedev, Paul and Berger, Bonnie and Chikhi, Rayan},
journal={Genome Research},
pages={gr--277679},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}
Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.