Please visit our Nextflow implementation if you're familiar with Nextflow.
- Parallel processing with configurable thread usage
- Supports BWA alignment
- Generates consensus sequences
- Creates phylogenetic trees for RSV-A/RSV-B subtypes
- Produces PDF and HTML reports
- Quality control metrics including coverage statistics
git clone https://github.com/yourusername/RSVrecon.git
cd RSVrecon
Alternatively, download the latest release from GitHub and unzip the package:
unzip RSVrecon-main.zip
cd RSVrecon-main
Download the pre-built reference database and unzip it to a location where you have read/write permission: https://github.com/stjudecab/RSVreconPy/releases/download/Pre-release/Reference.zip
We use conda to manage all dependencies. Please install conda and mamba.
Install Miniconda
Please check the conda website for comprehensive instructions: https://www.anaconda.com/docs/getting-started/miniconda/install
Install Mamba (recommended, it's much faster than conda)
Once conda is installed, install mamba with conda:
conda install mamba -c conda-forge
Most high-performance computing (HPC) systems come with Conda/Mamba preinstalled. To use them, load the corresponding modules. Using our system as an example (please contact your HPC manager for more details):
module load conda
module load mamba
bash Set_env.sh
Example config.yaml:
# Required paths
DATA_DIR: /path/to/input/fastq_files         # Put all your paired FASTQ files under this input folder
REFERENCE_DIR: /path/to/reference/sequences  # Download our pre-built reference, unzip it, then paste the path here. Make sure you have both read and write permissions
OUTPUT_DIR: /path/to/output/directory        # Specify an output folder path
# Performance parameters
THREAD_N: 2                     # Threads per sample, for BWA-MEM
MAX_CONCURRENT_JOBS: 10         # Number of samples processed in parallel; THREAD_N * MAX_CONCURRENT_JOBS should be less than your number of CPUs
# Analysis parameters
TOOL: BWA                       # Currently only BWA supported
COV_CUTOFF: 50                  # Coverage cutoff threshold
# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results  # We allow users to compare RSVrecon with RSV-NEXT-PIPE results. Please specify the "consensus" folder of RSV-NEXT-PIPE output for the same batch of data.
Download test dataset from here. FastQ files are under "fastqs" folder.
Download the pre-built reference database from here
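Before launching a run, the settings in config.yaml can be sanity-checked. The sketch below is only an illustration and is not part of RSVrecon: the function name `check_config`, the default of 1 for a missing `THREAD_N` or `MAX_CONCURRENT_JOBS`, and the choice to flag a problem only when the thread budget exceeds the CPU count are all assumptions. It takes an already parsed dictionary, so it does not depend on any YAML library.

```python
import os

# Keys listed under "Required paths" in config.yaml.
REQUIRED_KEYS = ("DATA_DIR", "REFERENCE_DIR", "OUTPUT_DIR")


def check_config(cfg, n_cpus=None):
    """Return a list of human-readable problems found in a parsed config dict.

    Hypothetical helper for illustration; RSVrecon itself may validate differently.
    """
    problems = []
    if n_cpus is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        n_cpus = os.cpu_count() or 1

    # Every required path must be present and non-empty.
    for key in REQUIRED_KEYS:
        if not cfg.get(key):
            problems.append(f"missing required key: {key}")

    # Assumed defaults of 1 when a performance parameter is absent.
    threads = int(cfg.get("THREAD_N", 1))
    jobs = int(cfg.get("MAX_CONCURRENT_JOBS", 1))
    if threads * jobs > n_cpus:
        problems.append(
            f"THREAD_N * MAX_CONCURRENT_JOBS = {threads * jobs} exceeds {n_cpus} CPUs"
        )

    # The config comments state that only BWA is currently supported.
    if cfg.get("TOOL", "BWA") != "BWA":
        problems.append("TOOL must be BWA")

    return problems
```

Since PyYAML is among the listed dependencies, the dictionary could be obtained with `yaml.safe_load` on the config file and passed to `check_config`; an empty list means no problems were detected by these particular checks.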
Here is an example:
# Required paths
DATA_DIR: /path/to/input/fastq_files         # Put all your paired FASTQ files under this input folder
REFERENCE_DIR: /path/to/reference/sequences  # Download our pre-built reference, unzip it, then paste the path here. Make sure you have both read and write permissions
OUTPUT_DIR: /path/to/output/directory        # Specify an output folder path
# Performance parameters
THREAD_N: 2                     # Threads per sample, for BWA-MEM
MAX_CONCURRENT_JOBS: 10         # Number of samples processed in parallel; THREAD_N * MAX_CONCURRENT_JOBS should be less than your number of CPUs
# Analysis parameters
TOOL: BWA                       # Currently only BWA supported
COV_CUTOFF: 50                  # Coverage cutoff threshold
# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results  # We allow users to compare RSVrecon with RSV-NEXT-PIPE results. Please specify the "consensus" folder of RSV-NEXT-PIPE output for the same batch of data.
# Add the RSVrecon folder to your PATH
export PATH=/path/to/your/RSVrecon/folder:$PATH
# activate conda env
conda activate RSVreconEnv
# If you're on your local server
python rsvrecon_pipeline.py config.yaml
# If you're on HPC (using LSF as example)
# The number of CPUs requested should be >= THREAD_N * MAX_CONCURRENT_JOBS
bsub -n 20 -R "rusage[mem=10001]" -P CAB -J RSV -q priority -cwd $(pwd -P) "python rsvrecon_pipeline.py config.yaml"
Report/
├── Mapping/          # Alignment results
├── log/              # Log files
├── Temp/             # Temporary files
├── Report.csv        # Summary table
├── Sequence_*.fasta  # Consensus sequences
├── Report.pdf        # PDF report
└── Report.html       # HTML report
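After a run finishes, the layout above can be checked programmatically. The following is a minimal sketch, not part of RSVrecon: the function name `missing_outputs` is hypothetical, and it assumes you pass the path to the Report folder itself (where that folder sits relative to OUTPUT_DIR is not specified here).

```python
from pathlib import Path

# Names taken from the output tree shown above.
EXPECTED_DIRS = ("Mapping", "log", "Temp")
EXPECTED_FILES = ("Report.csv", "Report.pdf", "Report.html")
SEQUENCE_PATTERN = "Sequence_*.fasta"


def missing_outputs(report_dir):
    """Return the expected entries that are absent from report_dir.

    Hypothetical helper for illustration only.
    """
    root = Path(report_dir)
    # Sub-folders are reported with a trailing slash to distinguish them from files.
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # At least one consensus FASTA matching the pattern is expected.
    if not any(root.glob(SEQUENCE_PATTERN)):
        missing.append(SEQUENCE_PATTERN)
    return missing
```

An empty list means every entry from the tree was found; it says nothing about whether the contents are valid.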
Managed via RSV_env.yml:
dependencies:
  # R related
  - r-base=4.3
  - r-ggplot2
  - r-biocmanager
  - bioconductor-ggtree=3.10.0
  - bioconductor-treeio
  - r-tidyverse
  - r-devtools
  
  # Python related
  - python=3.10
  - pandas=2.2.2
  - biopython=1.78
  - pyhocon
  - reportlab
  - matplotlib
  - seaborn
  - Pillow
  - pyyaml
  # Bioinformatics tools
  - bioconda::fastp=0.23.4
  - bioconda::igvtools=2.3.93
  - bioconda::kma=1.4.9
  - bioconda::nextclade
  - bioconda::samtools=1.18
  - bioconda::blast=2.14.1
  - bioconda::bwa
  - bioconda::mafft=7.505
  - bioconda::fasttree=2.1.11
- Environment creation fails → Try conda env create -f RSV_env.yml
- Pipeline errors → Check log/*.err.log files
- Memory issues → Reduce MAX_CONCURRENT_JOBS
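When environment creation or a pipeline step fails, it can help to confirm that the command-line tools from RSV_env.yml are visible after activating RSVreconEnv. The sketch below is an illustration only: the function name `missing_tools` is hypothetical, and the executable names (for example `blastn` for the blast package and `FastTree` for the fasttree package) are assumptions about what those packages install.

```python
import shutil

# Assumed executable names for the bioinformatics tools listed in RSV_env.yml.
REQUIRED_TOOLS = (
    "fastp", "igvtools", "kma", "nextclade",
    "samtools", "blastn", "bwa", "mafft", "FastTree",
)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All listed tools were found on PATH.")
```

Run it inside the activated environment; any tool it reports as missing suggests the environment was not created completely or is not active.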
Our preprint is available online at bioRxiv.