RSVrecon - RSV Genome Reconstruction Pipeline

Please visit our Nextflow implementation if you are familiar with Nextflow.

Features

Parallel processing with configurable thread usage
Supports BWA alignment
Generates consensus sequences
Creates phylogenetic trees for RSV-A/RSV-B subtypes
Produces PDF and HTML reports
Quality control metrics including coverage statistics
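The README does not describe how RSVrecon schedules work internally. As a rough mental model only, here is a minimal Python sketch. It assumes a pool that runs up to `MAX_CONCURRENT_JOBS` samples at once, with each sample aligned by `bwa mem -t THREAD_N`. The helpers `build_bwa_command` and `run_samples`, and the per-sample `.sam` output, are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_bwa_command(reference, r1, r2, threads):
    """Return the argument list for a paired-end BWA-MEM alignment (hypothetical helper)."""
    return ["bwa", "mem", "-t", str(threads), reference, r1, r2]


def run_samples(samples, reference, thread_n, max_jobs, dry_run=True):
    """Align samples with at most max_jobs running at once, each using thread_n threads.

    samples: dict mapping sample name -> (R1 path, R2 path).
    With dry_run=True the commands are returned instead of executed.
    """
    commands = {name: build_bwa_command(reference, r1, r2, thread_n)
                for name, (r1, r2) in samples.items()}
    if dry_run:
        return commands

    def align(name):
        # bwa mem writes SAM to stdout; save one file per sample
        with open(f"{name}.sam", "w") as out:
            subprocess.run(commands[name], stdout=out, check=True)
        return name

    # The pool caps concurrency, so peak CPU use is about thread_n * max_jobs
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(align, commands))
```

With `THREAD_N: 2` and `MAX_CONCURRENT_JOBS: 10`, this model keeps up to 20 BWA threads busy. That is why the product of the two should not exceed your CPU count.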

Installation

1. Clone Repository

Option A: Clone with Git (recommended)

git clone https://github.com/stjudecab/RSVreconPy.git
cd RSVreconPy

Option B: Download ZIP

Download the latest release from GitHub, then unzip the package:

unzip RSVrecon-main.zip
cd RSVrecon-main

1.1: Download reference database

Download the pre-built reference database and unzip it to a location where you have read and write permission: https://github.com/stjudecab/RSVreconPy/releases/download/Pre-release/Reference.zip
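If you prefer to script this step, the sketch below uses only the Python standard library. It fetches the archive and extracts it after checking read/write access. The function names and the destination layout are hypothetical; only the URL comes from this README.

```python
import os
import urllib.request
import zipfile

REFERENCE_URL = ("https://github.com/stjudecab/RSVreconPy/releases/download/"
                 "Pre-release/Reference.zip")


def download_reference(dest_dir, url=REFERENCE_URL):
    """Download Reference.zip into dest_dir and return the local file path."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, "Reference.zip")
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_reference(zip_path, dest_dir):
    """Unzip the archive into dest_dir after checking read/write permission.

    Returns the list of archive members that were extracted.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # The README asks for both read and write permission on the reference folder
    if not os.access(dest_dir, os.R_OK | os.W_OK):
        raise PermissionError(f"need read/write permission on {dest_dir}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, call `extract_reference(download_reference("/path/to/reference"), "/path/to/reference")`, then point `REFERENCE_DIR` in `config.yaml` at the extracted reference folder.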

2. Set Up Environment

We use conda to manage all dependencies. Please install conda and mamba first.

A1. Install Conda/Mamba (if you are not on an HPC)

Install Miniconda

Please see the conda website for comprehensive instructions: https://www.anaconda.com/docs/getting-started/miniconda/install

Install Mamba (recommended; it is much faster than conda)

Once conda is installed, install mamba with conda:

conda install mamba -c conda-forge

A2. Load the Conda/Mamba modules (if you are on an HPC)

Most high-performance computing (HPC) systems provide Conda/Mamba as preinstalled modules. On our system, for example, you load them as follows (please contact your HPC administrator for details on your system):

module load conda
module load mamba

B. Set Up the Environment for RSVrecon

bash Set_env.sh

Configuration

Example config.yaml:

# Required paths
DATA_DIR: /path/to/input/fastq_files         # Folder containing all of your paired-end FASTQ files
REFERENCE_DIR: /path/to/reference/sequences  # Path to the unzipped pre-built reference; you need both read and write permission
OUTPUT_DIR: /path/to/output/directory        # Output folder path

# Performance parameters
THREAD_N: 2                     # Threads per sample (used by BWA-MEM)
MAX_CONCURRENT_JOBS: 10         # Samples processed in parallel; THREAD_N * MAX_CONCURRENT_JOBS should not exceed your number of CPUs

# Analysis parameters
TOOL: BWA                       # Currently only BWA is supported
COV_CUTOFF: 50                  # Upper threshold of the dual coverage cutoff
COV_CUTOFF_LOW: 10              # Lower threshold of the dual coverage cutoff

# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results  # Optional comparison with RSV-NEXT-PIPE: set this to the "consensus" folder of the RSV-NEXT-PIPE output for the same batch of data
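Before launching a long run, it can help to sanity-check the configuration. The sketch below is a hypothetical pre-flight check, not part of RSVrecon. It reads `config.yaml` with PyYAML (listed in `RSV_env.yml`) and applies the rules stated in this README: the required paths must be present, `TOOL` must be `BWA`, and `THREAD_N * MAX_CONCURRENT_JOBS` must fit within the CPU count. It also checks that `COV_CUTOFF_LOW` is not above `COV_CUTOFF`; that rule is an assumption based on their description as lower and upper thresholds.

```python
import os

REQUIRED_KEYS = ("DATA_DIR", "REFERENCE_DIR", "OUTPUT_DIR",
                 "THREAD_N", "MAX_CONCURRENT_JOBS", "TOOL", "COV_CUTOFF")


def load_config(path):
    """Parse config.yaml with PyYAML (listed in RSV_env.yml)."""
    import yaml  # imported here so validate_config can be used without PyYAML
    with open(path) as handle:
        return yaml.safe_load(handle)


def validate_config(cfg, n_cpus=None):
    """Return a list of problems found in a parsed config (an empty list means OK)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if cfg.get("TOOL", "BWA") != "BWA":
        problems.append("TOOL must be BWA (the only aligner currently supported)")
    # THREAD_N * MAX_CONCURRENT_JOBS should not exceed the available CPUs
    n_cpus = n_cpus or os.cpu_count() or 1
    wanted = cfg.get("THREAD_N", 1) * cfg.get("MAX_CONCURRENT_JOBS", 1)
    if wanted > n_cpus:
        problems.append(f"THREAD_N * MAX_CONCURRENT_JOBS = {wanted} exceeds {n_cpus} CPUs")
    # Assumed rule: the lower coverage threshold should not exceed the upper one
    low = cfg.get("COV_CUTOFF_LOW")
    if low is not None and low > cfg.get("COV_CUTOFF", low):
        problems.append("COV_CUTOFF_LOW should not be larger than COV_CUTOFF")
    return problems
```

Usage: `print(validate_config(load_config("config.yaml")))` prints an empty list when no problems are found. Pass `n_cpus` explicitly to check against the CPUs you plan to request on an HPC.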

Quick Start

1. Download the test dataset and pre-built reference

Download the test dataset from here. The FASTQ files are in the "fastqs" folder. A larger dataset is available here.

Download the pre-built reference database from here.
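RSVrecon expects paired FASTQ files in `DATA_DIR`, but the README does not state the naming rule it uses to match mates. The sketch below assumes Illumina-style `_R1`/`_R2` names (for example `A_R1.fastq.gz` and `A_R2.fastq.gz`). It groups a folder listing into pairs and flags samples with a missing mate. The regular expression and `pair_fastqs` are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming convention (not documented by RSVrecon):
# <sample>_R1[_NNN].fastq[.gz] or .fq[.gz]
FASTQ_RE = re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$")


def pair_fastqs(filenames):
    """Group FASTQ file names into {sample: (R1, R2)}; raise if a mate is missing."""
    reads = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(os.path.basename(name))
        if match:  # files that do not look like FASTQ reads are ignored
            reads[match["sample"]][match["read"]] = name
    pairs = {}
    for sample, mates in sorted(reads.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing a mate: {sorted(mates)}")
        pairs[sample] = (mates["1"], mates["2"])
    return pairs
```

For example, `pair_fastqs(os.listdir("fastqs"))` on the test dataset lists the samples that have both mates, assuming its files follow this naming.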

2. Edit `config.yaml` with your paths

Here is an example:

# Required paths
DATA_DIR: /path/to/input/fastq_files         # Folder containing all of your paired-end FASTQ files
REFERENCE_DIR: /path/to/reference/sequences  # Path to the unzipped pre-built reference; you need both read and write permission
OUTPUT_DIR: /path/to/output/directory        # Output folder path

# Performance parameters
THREAD_N: 2                     # Threads per sample (used by BWA-MEM)
MAX_CONCURRENT_JOBS: 10         # Samples processed in parallel; THREAD_N * MAX_CONCURRENT_JOBS should not exceed your number of CPUs

# Analysis parameters
TOOL: BWA                       # Currently only BWA is supported
COV_CUTOFF: 50                  # Upper threshold of the dual coverage cutoff
COV_CUTOFF_LOW: 10              # Lower threshold of the dual coverage cutoff

# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results  # Optional comparison with RSV-NEXT-PIPE: set this to the "consensus" folder of the RSV-NEXT-PIPE output for the same batch of data, or comment the line out with "#" to disable it
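The README does not explain how `COV_CUTOFF` and `COV_CUTOFF_LOW` are applied inside the pipeline. Purely to illustrate a two-level coverage summary, the sketch below works from `samtools depth -a` output (tab-separated reference name, position, depth). It computes the fraction of reference positions whose depth reaches each threshold. `coverage_breadth` is a hypothetical helper, not RSVrecon's own metric.

```python
def coverage_breadth(depth_lines, low=10, high=50):
    """Return (fraction of positions with depth >= low, fraction with depth >= high).

    depth_lines: iterable of 'ref<TAB>pos<TAB>depth' lines, e.g. from `samtools depth -a`,
    which also reports zero-depth positions so the denominator covers the whole reference.
    """
    total = at_low = at_high = 0
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        depth = int(line.rstrip("\n").split("\t")[2])
        total += 1
        at_low += depth >= low
        at_high += depth >= high
    if total == 0:
        return 0.0, 0.0
    return at_low / total, at_high / total
```

You could feed it the lines from a coordinate-sorted BAM, for example `subprocess.run(["samtools", "depth", "-a", "sample.bam"], capture_output=True, text=True, check=True).stdout.splitlines()`.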

3. Run the pipeline:

# Add the RSVrecon folder to your PATH
export PATH=/path/to/your/RSVrecon/folder:$PATH
# Activate the conda environment
conda activate RSVreconEnv

# On a local server (run from the RSVrecon folder, or give the full path to rsvrecon_pipeline.py)
python rsvrecon_pipeline.py config.yaml

# On an HPC cluster (LSF shown as an example; adjust -P, -q and the memory request to your site)
# The number of CPUs requested (-n) should be >= THREAD_N * MAX_CONCURRENT_JOBS
bsub -n 20 -R "rusage[mem=10001]" -P CAB -J RSV -q priority -cwd $(pwd -P) "python rsvrecon_pipeline.py config.yaml"
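To keep the `bsub -n` value in step with the configuration, you could derive it from `config.yaml` instead of typing it by hand. The helper below is hypothetical and only rebuilds the LSF command from this README. The project (`-P`), queue (`-q`) and memory values are site-specific placeholders copied from the example.

```python
import os
import shlex


def bsub_command(cfg, config_path="config.yaml", mem="10001", project="CAB",
                 queue="priority", job_name="RSV", cwd=None):
    """Build the README's LSF submission as an argument list, with -n taken from the config."""
    # Request exactly THREAD_N * MAX_CONCURRENT_JOBS CPUs
    n_cpus = int(cfg["THREAD_N"]) * int(cfg["MAX_CONCURRENT_JOBS"])
    # Like $(pwd -P): the current directory with symlinks resolved
    cwd = cwd or os.path.realpath(os.getcwd())
    job = f"python rsvrecon_pipeline.py {shlex.quote(config_path)}"
    return ["bsub", "-n", str(n_cpus), "-R", f"rusage[mem={mem}]",
            "-P", project, "-J", job_name, "-q", queue, "-cwd", cwd, job]
```

Pass in the parsed config (for example from `yaml.safe_load`). Then run the result with `subprocess.run(cmd, check=True)`, or print it with `shlex.join(cmd)` to paste into a job script.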

Output

Report/
├── Mapping/          # Alignment results
├── log/              # Log files
├── Temp/             # Temporary files
├── Report.csv        # Summary table
├── Sequence_*.fasta  # Consensus sequences
├── Report.pdf        # PDF report
└── Report.html       # HTML report
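For a quick look at the consensus output without opening the reports, the sketch below reads each `Sequence_*.fasta` file in the report folder with a small standard-library FASTA reader. It reports each record's length and fraction of `N` bases. Treating `N` as the marker for low-coverage or unresolved positions is an assumption, because the README does not state how RSVrecon masks such positions. The function names are hypothetical.

```python
import glob
import os


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_consensus(report_dir):
    """Return {file name: [(header, length, fraction_N), ...]} for Sequence_*.fasta files."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(report_dir, "Sequence_*.fasta"))):
        rows = []
        for header, seq in read_fasta(path):
            n_frac = seq.upper().count("N") / len(seq) if seq else 0.0
            rows.append((header, len(seq), n_frac))
        summary[os.path.basename(path)] = rows
    return summary
```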

Dependencies

Managed via RSV_env.yml:

dependencies:
  # R related
  - r-base=4.3
  - r-ggplot2
  - r-biocmanager
  - bioconductor-ggtree=3.10.0
  - bioconductor-treeio
  - r-tidyverse
  - r-devtools
  
  # Python related
  - python=3.10
  - pandas=2.2.2
  - biopython=1.78
  - pyhocon
  - reportlab
  - matplotlib
  - seaborn
  - Pillow
  - pyyaml

  # Bioinformatics tools
  - bioconda::fastp=0.23.4
  - bioconda::igvtools=2.3.93
  - bioconda::kma=1.4.9
  - bioconda::nextclade
  - bioconda::samtools=1.18
  - bioconda::blast=2.14.1
  - bioconda::bwa
  - bioconda::mafft=7.505
  - bioconda::iqtree
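After `conda activate RSVreconEnv`, you can confirm that the command-line tools from `RSV_env.yml` are on your `PATH`. The executable names below are assumptions about what each package installs. For example, `blast` is checked via `blastn`, and IQ-TREE via `iqtree2`, `iqtree` or `iqtree3` depending on its version. `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names for the packages in RSV_env.yml; any one name per package suffices.
TOOLS = {
    "fastp": ["fastp"],
    "igvtools": ["igvtools"],
    "kma": ["kma"],
    "nextclade": ["nextclade"],
    "samtools": ["samtools"],
    "blast": ["blastn"],
    "bwa": ["bwa"],
    "mafft": ["mafft"],
    "iqtree": ["iqtree2", "iqtree", "iqtree3"],
    "r-base": ["Rscript"],
}


def missing_tools(tools=TOOLS):
    """Return the packages for which none of the candidate executables is found on PATH."""
    return [pkg for pkg, names in tools.items()
            if not any(shutil.which(name) for name in names)]
```

An empty list from `missing_tools()` means every listed tool was found. Otherwise, check that the environment was created and activated correctly.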
