DiMA (DNA Mapping) is a pipeline for Next-Generation Sequencing data alignment.
All solida-core workflows follow the GATK Best Practices for Germline Variant Discovery, with further improvements and refinements introduced after testing on real data from research sequencing projects at the CRS4 Next Generation Sequencing Core Facility.
Pipelines are based on Snakemake, a workflow management system that provides all the features needed to create reproducible and scalable data analyses.
Software dependencies are specified in the environment.yaml file and managed directly by Snakemake through Conda, which makes the workflow reproducible across a wide range of computing environments, such as workstations, clusters and cloud environments.
The pipeline maps paired-end reads in FASTQ format against a reference genome to produce a deduplicated and recalibrated BAM file.
The resulting BAM files can then be used in Variant Calling processes or visualized with tools like IGV. The DiMA pipeline is included in other solida-core pipelines that require the mapping step (e.g. DiVA). Standalone usage is recommended for analyses that require BAM files but not a Variant Calling step.
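To make the mapping workflow described above more concrete, here is a minimal Python sketch of the command sequence that a GATK Best Practices style mapping typically runs for one paired-end sample: alignment and sorting, duplicate marking, and base quality score recalibration. It is not DiMA's actual Snakemake rules. The function name `mapping_commands`, the output file layout, the read group fields and the choice of `bwa mem` with `samtools sort` are all assumptions made for illustration; the sketch only builds the command strings and does not execute them.

```python
import shlex
from pathlib import PurePosixPath


def mapping_commands(sample, r1, r2, reference, known_sites, outdir="bam", threads=4):
    """Build (step_name, shell_command) pairs for one paired-end sample.

    Hypothetical helper: file names, read group fields and tool choices are
    assumptions for illustration, not taken from the DiMA rules.
    """
    out = PurePosixPath(outdir)
    sorted_bam = str(out / f"{sample}.sorted.bam")
    dedup_bam = str(out / f"{sample}.dedup.bam")
    metrics = str(out / f"{sample}.dup_metrics.txt")
    recal_table = str(out / f"{sample}.recal.table")
    final_bam = str(out / f"{sample}.dedup.recal.bam")

    # bwa expects a literal backslash followed by 't' as the field separator.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    # Step 1: align the read pair and sort the alignments by coordinate.
    align = ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, r1, r2]
    sort = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"]

    # Step 2: mark duplicate reads.
    markdup = ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam, "-M", metrics]

    # Step 3: model systematic base quality errors using known variant sites.
    bqsr = ["gatk", "BaseRecalibrator", "-R", reference, "-I", dedup_bam,
            "--known-sites", known_sites, "-O", recal_table]

    # Step 4: apply the recalibration to obtain the final BAM file.
    apply_bqsr = ["gatk", "ApplyBQSR", "-R", reference, "-I", dedup_bam,
                  "--bqsr-recal-file", recal_table, "-O", final_bam]

    return [
        ("align_sort", shlex.join(align) + " | " + shlex.join(sort)),
        ("mark_duplicates", shlex.join(markdup)),
        ("base_recalibrator", shlex.join(bqsr)),
        ("apply_bqsr", shlex.join(apply_bqsr)),
    ]


if __name__ == "__main__":
    # Example inputs are placeholders, not files shipped with the pipeline.
    for name, command in mapping_commands(
        "S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "ref.fa", "dbsnp.vcf.gz"
    ):
        print(f"[{name}] {command}")
```

In the real pipeline these steps are expressed as Snakemake rules, so intermediate files, threads and Conda environments are handled by the workflow engine rather than by a script like this one.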
A complete view of the analysis workflow is provided by the pipeline's graph.
DiMA pipeline documentation can be found in the docs/ directory: