If you're familiar with Nextflow, please also visit our Nextflow implementation.
- Parallel processing with configurable thread usage
- Supports BWA alignment
- Generates consensus sequences
- Creates phylogenetic trees for RSV-A/RSV-B subtypes
- Produces PDF and HTML reports
- Quality control metrics including coverage statistics
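Parallel processing is controlled by two settings in config.yaml. THREAD_N is the number of threads each sample uses. MAX_CONCURRENT_JOBS is the number of samples processed at once. The sketch below shows how the two combine. It is a hypothetical illustration, not RSVrecon's actual scheduler: the names `bwa_mem_command` and `run_samples` are made up, and only the `bwa mem -t` invocation is real BWA syntax.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def bwa_mem_command(reference: str, r1: str, r2: str, thread_n: int) -> List[str]:
    """Build a paired-end BWA-MEM call; -t sets the threads used by this one sample."""
    return ["bwa", "mem", "-t", str(thread_n), reference, r1, r2]


def _run(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code.

    SAM output is discarded here for brevity; a real pipeline would write it to a file.
    """
    return subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False).returncode


def run_samples(commands: Sequence[List[str]], max_concurrent_jobs: int,
                runner: Optional[Callable[[List[str]], int]] = None) -> List[int]:
    """Run one command per sample, with at most max_concurrent_jobs running at once.

    Peak CPU demand is therefore roughly thread_n * max_concurrent_jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent_jobs) as pool:
        # pool.map yields results in the same order as `commands`.
        return list(pool.map(runner or _run, commands))
```

With THREAD_N: 2 and MAX_CONCURRENT_JOBS: 10, up to ten `bwa mem -t 2` processes run together, so about 20 CPUs are busy at peak.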
git clone https://github.com/yourusername/RSVrecon.git
cd RSVrecon
Alternatively, download the latest release from GitHub and unzip the package:
unzip RSVrecon-main.zip
cd RSVrecon-main
Download the pre-built reference database and unzip it to a location where you have read/write permission: https://github.com/stjudecab/RSVreconPy/releases/download/Pre-release/Reference.zip
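For scripted setups, the reference can be downloaded and extracted with Python's standard library. This is a minimal sketch that assumes the release URL above remains valid. The helper names `download_reference` and `extract_reference` are hypothetical. The sketch makes no assumption about the folder layout inside Reference.zip, so point REFERENCE_DIR at whichever folder it unpacks to.

```python
import os
import urllib.request
import zipfile
from pathlib import Path

REFERENCE_URL = ("https://github.com/stjudecab/RSVreconPy/releases/download/"
                 "Pre-release/Reference.zip")


def download_reference(out_path: str = "Reference.zip") -> str:
    """Fetch the pre-built reference archive from the release page."""
    urllib.request.urlretrieve(REFERENCE_URL, out_path)
    return out_path


def extract_reference(zip_path: str, dest_dir: str) -> Path:
    """Unzip the archive into dest_dir after checking read/write permission."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The pipeline needs both read and write access to the reference folder.
    if not os.access(dest, os.R_OK | os.W_OK):
        raise PermissionError(f"need read/write permission on {dest}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest
```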
We use conda to manage all dependencies. Please install conda and mamba.
Install Miniconda
Please check the conda website for comprehensive installation instructions: https://www.anaconda.com/docs/getting-started/miniconda/install
Install Mamba (recommended; it is much faster than conda)
Once conda is installed, install mamba with conda:
conda install mamba -c conda-forge
Many high-performance computing (HPC) systems come with Conda/Mamba preinstalled. To use them, load the modules as on our system (please contact your HPC manager for details):
module load conda
module load mamba
Then set up the conda environment:
bash Set_env.sh
Example config.yaml:
# Required paths
DATA_DIR: /path/to/input/fastq_files # Please put all your paired-FASTQ files under this input folder
REFERENCE_DIR: /path/to/reference/sequences # Please download our pre-built reference, unzip it, then paste the path here. Make sure you have both read and write permission
OUTPUT_DIR: /path/to/output/directory # Please specify an output folder path
# Performance parameters
THREAD_N: 2 # Threads per sample, for BWA-MEM
MAX_CONCURRENT_JOBS: 10 # Parallel samples to process; note: THREAD_N * MAX_CONCURRENT_JOBS should not exceed your number of CPUs
# Analysis parameters
TOOL: BWA # Currently only BWA supported
COV_CUTOFF: 50 # Coverage cutoff threshold
# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results # We allow users to compare RSVrecon with RSV-NEXT-PIPE results. Please specify the "consensus" folder of RSV-NEXT-PIPE output for the same batch of data.
Download the test dataset from here. FASTQ files are under the "fastqs" folder. A larger dataset is available here.
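The CPU rule in the MAX_CONCURRENT_JOBS comment (THREAD_N * MAX_CONCURRENT_JOBS must not exceed your CPUs) is easy to violate by accident. A quick pre-flight check of config.yaml can catch this and similar mistakes before a run. This is a hypothetical sketch, not part of RSVrecon. Treating every key from the example config as required is an assumption, and `load_config`/`check_config` are made-up names.

```python
import os
from typing import List, Optional

# Keys taken from the example config.yaml; treating all of them as required is an assumption.
REQUIRED_KEYS = ("DATA_DIR", "REFERENCE_DIR", "OUTPUT_DIR",
                 "THREAD_N", "MAX_CONCURRENT_JOBS", "TOOL", "COV_CUTOFF")


def load_config(path: str) -> dict:
    """Read config.yaml with PyYAML (pyyaml is listed in RSV_env.yml)."""
    import yaml  # imported lazily so check_config can be used without PyYAML
    with open(path) as fh:
        return yaml.safe_load(fh) or {}


def check_config(cfg: dict, cpus: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    for key in ("DATA_DIR", "REFERENCE_DIR"):
        if key in cfg and not os.path.isdir(cfg[key]):
            problems.append(f"{key} is not an existing directory: {cfg[key]}")
    ref = cfg.get("REFERENCE_DIR")
    if ref and os.path.isdir(ref) and not os.access(ref, os.R_OK | os.W_OK):
        problems.append("REFERENCE_DIR needs both read and write permission")
    if cfg.get("TOOL", "BWA") != "BWA":
        problems.append("TOOL must be BWA (currently the only supported aligner)")
    # Fall back to the machine's logical CPU count when cpus is not given.
    cpus = cpus or os.cpu_count() or 1
    demand = int(cfg.get("THREAD_N", 1)) * int(cfg.get("MAX_CONCURRENT_JOBS", 1))
    if demand > cpus:
        problems.append(f"THREAD_N * MAX_CONCURRENT_JOBS = {demand} exceeds {cpus} CPUs")
    return problems
```

For example, `check_config(load_config("config.yaml"), cpus=20)` is a reasonable call on an HPC node. Pass `cpus` explicitly there (the value given to `bsub -n`), because `os.cpu_count()` reports the whole node rather than your allocation.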
Download the pre-built reference database from here
Here is an example:
# Required paths
DATA_DIR: /path/to/input/fastq_files # Please put all your paired-FASTQ files under this input folder
REFERENCE_DIR: /path/to/reference/sequences # Please download our pre-built reference, unzip it, then paste the path here. Make sure you have both read and write permission
OUTPUT_DIR: /path/to/output/directory # Please specify an output folder path
# Performance parameters
THREAD_N: 2 # Threads per sample, for BWA-MEM
MAX_CONCURRENT_JOBS: 10 # Parallel samples to process; note: THREAD_N * MAX_CONCURRENT_JOBS should not exceed your number of CPUs
# Analysis parameters
TOOL: BWA # Currently only BWA supported
COV_CUTOFF: 50 # upper threshold in the dual-coverage cutoff system
COV_CUTOFF_LOW: 10 # lower threshold in the dual-coverage cutoff system
# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results # We allow users to compare RSVrecon with RSV-NEXT-PIPE results. Please specify the "consensus" folder of RSV-NEXT-PIPE output for the same batch of data. To disable it, comment out this line with "#".
# export path to your PATH
export PATH=/path/to/your/RSVrecon/folder:$PATH
# activate conda env
conda activate RSVreconEnv
# If you're on a local server
python rsvrecon_pipeline.py config.yaml
# If you're on HPC (using LSF as example)
# The number of CPUs requested (-n) should be >= THREAD_N * MAX_CONCURRENT_JOBS
bsub -n 20 -R "rusage[mem=10001]" -P CAB -J RSV -q priority -cwd $(pwd -P) "python rsvrecon_pipeline.py config.yaml"
Report/
├── Mapping/ # Alignment results
├── log/ # Log files
├── Temp/ # Temporary files
├── Report.csv # Summary table
├── Sequence_*.fasta # Consensus sequences
├── Report.pdf # PDF report
└── Report.html # HTML report
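A quick way to inspect the consensus output outside the reports is to summarize each Sequence_*.fasta file. This sketch assumes that low-coverage positions are written as N, which is a common convention but is not confirmed here. `read_fasta` and `summarize_consensus` are hypothetical helpers. Biopython's `SeqIO.parse` (Biopython is in RSV_env.yml) would work in place of the small reader.

```python
from pathlib import Path
from typing import Iterator, List, Tuple, Union


def read_fasta(path: Union[str, Path]) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA reader yielding (header, sequence) pairs."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_consensus(report_dir: Union[str, Path]) -> List[Tuple[str, str, int, float]]:
    """Return (file, record, length, fraction of N) for every consensus record."""
    rows = []
    for fasta in sorted(Path(report_dir).glob("Sequence_*.fasta")):
        for name, seq in read_fasta(fasta):
            n_frac = seq.upper().count("N") / len(seq) if seq else 0.0
            rows.append((fasta.name, name, len(seq), round(n_frac, 3)))
    return rows
```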
Dependencies are managed via RSV_env.yml:
dependencies:
# R related
- r-base=4.3
- r-ggplot2
- r-biocmanager
- bioconductor-ggtree=3.10.0
- bioconductor-treeio
- r-tidyverse
- r-devtools
# Python related
- python=3.10
- pandas=2.2.2
- biopython=1.78
- pyhocon
- reportlab
- matplotlib
- seaborn
- Pillow
- pyyaml
# Bioinformatics tools
- bioconda::fastp=0.23.4
- bioconda::igvtools=2.3.93
- bioconda::kma=1.4.9
- bioconda::nextclade
- bioconda::samtools=1.18
- bioconda::blast=2.14.1
- bioconda::bwa
- bioconda::mafft=7.505
- bioconda::iqtree
Troubleshooting:
- Environment creation fails → Try conda env create -f RSV_env.yml
- Pipeline errors → Check the log/*.err.log files
- Memory issues → Reduce MAX_CONCURRENT_JOBS
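Sometimes the environment is created but the pipeline still fails early. A quick check that the activated environment exposes the command-line tools from RSV_env.yml can narrow things down. The executable names below are assumptions about the binaries each package installs: BLAST is checked via blastn, and IQ-TREE via iqtree2 or iqtree depending on the version. RSVrecon does not confirm these names.

```python
import shutil
from typing import Dict, List

# Assumed executable names for the bioinformatics tools listed in RSV_env.yml.
TOOLS: Dict[str, List[str]] = {
    "fastp": ["fastp"],
    "igvtools": ["igvtools"],
    "kma": ["kma"],
    "nextclade": ["nextclade"],
    "samtools": ["samtools"],
    "blast": ["blastn"],
    "bwa": ["bwa"],
    "mafft": ["mafft"],
    "iqtree": ["iqtree2", "iqtree"],
}


def missing_tools(tools: Dict[str, List[str]] = TOOLS) -> List[str]:
    """Return the tools for which none of the candidate executables is on PATH."""
    return [name for name, exes in tools.items()
            if not any(shutil.which(exe) for exe in exes)]


if __name__ == "__main__":
    missing = missing_tools()
    print("All tools found" if not missing else "Missing: " + ", ".join(missing))
```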
Our preprint is available online at bioRxiv.