RSVrecon - RSV Genome Reconstruction Pipeline

Please visit our Nextflow implementation if you're familiar with Nextflow.

Features

Parallel processing with configurable thread usage
Supports BWA alignment
Generates consensus sequences
Creates phylogenetic trees for RSV-A/RSV-B subtypes
Produces PDF and HTML reports
Quality control metrics including coverage statistics

Installation

1. Clone Repository

Option A: Clone with Git (recommended)

git clone https://github.com/yourusername/RSVrecon.git
cd RSVrecon

Option B: Download ZIP

Download the latest release from GitHub, then unzip the package:

unzip RSVrecon-main.zip
cd RSVrecon-main

1.1: Download reference database

Download the pre-built reference database and unzip it to a location where you have both read and write permission: https://github.com/stjudecab/RSVreconPy/releases/download/Pre-release/Reference.zip
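The download, unzip, and permission check can also be scripted. The following is a minimal sketch using only the Python standard library; the function names and the destination folder are illustrative and are not part of RSVrecon itself.

```python
import os
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the step above.
REFERENCE_URL = (
    "https://github.com/stjudecab/RSVreconPy/releases/download/"
    "Pre-release/Reference.zip"
)


def download_reference(zip_path, url=REFERENCE_URL):
    """Download the reference archive to zip_path (hypothetical helper)."""
    urllib.request.urlretrieve(url, zip_path)
    return Path(zip_path)


def extract_reference(zip_path, dest_dir):
    """Unzip the archive into dest_dir, creating the folder if needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def is_readable_and_writable(path):
    """Return True if the current user can both read and write the path."""
    return os.access(path, os.R_OK | os.W_OK)
```

A typical call would be `extract_reference(download_reference("Reference.zip"), "/path/to/reference")`, followed by `is_readable_and_writable(...)` on the resulting folder before pasting its path into the config file.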

2. Set Up Environment

We use conda to manage all dependencies. Please install conda and mamba.

A1. Install Conda/Mamba (if you are not on an HPC)

Install Miniconda

Please check the conda website for comprehensive instructions: https://www.anaconda.com/docs/getting-started/miniconda/install

Install Mamba (recommended; it is much faster than conda)

Once conda is installed, install mamba with conda:

conda install mamba -c conda-forge

A2. Load the Conda/Mamba module (if you are on an HPC)

Most high-performance computing (HPC) systems come with Conda/Mamba preinstalled. To use them, load the corresponding modules. Using our system as an example (please contact your HPC manager for details on yours):

module load conda
module load mamba

B. Set Up the Environment for RSVrecon

bash Set_env.sh
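To confirm that the environment is active before running the pipeline, a small check such as the one below may help. It is a sketch that assumes the environment is named RSVreconEnv (the name used in the `conda activate` step of the Quick Start) and relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the helper names are illustrative.

```python
import os

# Assumption: the environment name matches the one activated in the Quick Start.
EXPECTED_ENV = "RSVreconEnv"


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None."""
    return environ.get("CONDA_DEFAULT_ENV")


def env_is_ready(environ=os.environ):
    """Return True if the expected environment is the active one."""
    return active_conda_env(environ) == EXPECTED_ENV
```

Calling `env_is_ready()` after `conda activate RSVreconEnv` should return True; a False result usually means the environment was not created or not activated in the current shell.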

Configuration

Example config.yaml:

# Required paths
DATA_DIR: /path/to/input/fastq_files         # Put all your paired-end FASTQ files in this input folder
REFERENCE_DIR: /path/to/reference/sequences  # Download our pre-built reference, unzip it, then paste the path here. Make sure you have both read and write permission
OUTPUT_DIR: /path/to/output/directory        # Specify an output folder path

# Performance parameters
THREAD_N: 2                     # Threads per sample, for BWA-MEM
MAX_CONCURRENT_JOBS: 10         # Number of samples processed in parallel. Note: THREAD_N * MAX_CONCURRENT_JOBS should not exceed your number of CPUs

# Analysis parameters
TOOL: BWA                       # Currently only BWA supported
COV_CUTOFF: 50                  # Coverage cutoff threshold

# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results  # We allow users to compare RSVrecon with RSV-NEXT-PIPE results. Please specify the "consensus" folder of RSV-NEXT-PIPE output for the same batch of data.
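The relationship between THREAD_N, MAX_CONCURRENT_JOBS, and the available CPUs can be checked before a run. The sketch below is not part of RSVrecon; it treats the budget as satisfied when the product does not exceed the CPU count, in line with the HPC note in the Quick Start, and its function names are hypothetical.

```python
import os


def cpu_budget_ok(thread_n, max_concurrent_jobs, available_cpus=None):
    """Return True if THREAD_N * MAX_CONCURRENT_JOBS fits the available CPUs."""
    if available_cpus is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        available_cpus = os.cpu_count() or 1
    return thread_n * max_concurrent_jobs <= available_cpus


def max_jobs_for(thread_n, available_cpus):
    """Largest MAX_CONCURRENT_JOBS that still fits the CPU budget."""
    return available_cpus // thread_n


print(cpu_budget_ok(2, 10, available_cpus=20))  # True
print(cpu_budget_ok(2, 10, available_cpus=16))  # False
print(max_jobs_for(2, 16))                      # 8
```

With the example values above (2 threads, 10 concurrent jobs), at least 20 CPUs are needed; on a 16-CPU machine, lowering MAX_CONCURRENT_JOBS to 8 keeps the run within budget.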

Quick Start

1. Download test dataset and prebuilt reference

Download the test dataset from here. The FASTQ files are in the "fastqs" folder.

Download the pre-built reference database (see step 1.1 under Installation).

2. Edit `config.yaml` with your paths

Here is an example:

# Required paths
DATA_DIR: /path/to/input/fastq_files         # Put all your paired-end FASTQ files in this input folder
REFERENCE_DIR: /path/to/reference/sequences  # Download our pre-built reference, unzip it, then paste the path here. Make sure you have both read and write permission
OUTPUT_DIR: /path/to/output/directory        # Specify an output folder path

# Performance parameters
THREAD_N: 2                     # Threads per sample, for BWA-MEM
MAX_CONCURRENT_JOBS: 10         # Number of samples processed in parallel. Note: THREAD_N * MAX_CONCURRENT_JOBS should not exceed your number of CPUs

# Analysis parameters
TOOL: BWA                       # Currently only BWA supported
COV_CUTOFF: 50                  # Coverage cutoff threshold

# Optional
RSV_NEXT_PIPE_RES: /path/to/additional/results  # We allow users to compare RSVrecon with RSV-NEXT-PIPE results. Please specify the "consensus" folder of RSV-NEXT-PIPE output for the same batch of data.

3. Run pipeline:

# Add the RSVrecon folder to your PATH
export PATH=/path/to/your/RSVrecon/folder:$PATH
# Activate the conda environment
conda activate RSVreconEnv

# If you're on a local server
python rsvrecon_pipeline.py config.yaml

# If you're on an HPC (using LSF as an example)
# The number of CPUs requested should be >= THREAD_N * MAX_CONCURRENT_JOBS
bsub -n 20 -R "rusage[mem=10001]" -P CAB -J RSV -q priority -cwd $(pwd -P) "python rsvrecon_pipeline.py config.yaml"
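The `-n` value in the LSF example follows from the config (2 threads x 10 concurrent jobs = 20 CPUs). As a rough sketch, the submission command could be assembled from those two values as shown below. The project, job name, queue, and memory defaults are copied from the example above and are site-specific; `build_bsub_command` is a hypothetical helper, not part of the pipeline.

```python
import os
import shlex


def build_bsub_command(thread_n, max_concurrent_jobs, config_path="config.yaml",
                       project="CAB", job_name="RSV", queue="priority",
                       mem="10001", cwd=None):
    """Build the bsub argument list, requesting THREAD_N * MAX_CONCURRENT_JOBS CPUs."""
    cpus = thread_n * max_concurrent_jobs
    if cwd is None:
        # Equivalent in spirit to $(pwd -P): the resolved working directory.
        cwd = os.path.realpath(os.getcwd())
    pipeline_cmd = f"python rsvrecon_pipeline.py {shlex.quote(config_path)}"
    return [
        "bsub", "-n", str(cpus),
        "-R", f"rusage[mem={mem}]",
        "-P", project,
        "-J", job_name,
        "-q", queue,
        "-cwd", cwd,
        pipeline_cmd,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for review before submitting.
    print(shlex.join(build_bsub_command(2, 10)))
```

Printing the command first, rather than submitting it directly, makes it easy to adjust the queue or project name to match your own cluster.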

Output

Report/
├── Mapping/          # Alignment results
├── log/              # Log files
├── Temp/             # Temporary files
├── Report.csv        # Summary table
├── Sequence_*.fasta  # Consensus sequences
├── Report.pdf        # PDF report
└── Report.html       # HTML report
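After a run, the layout above can be verified with a short script. This is a sketch that only checks for the presence of the entries listed in the tree; it makes no assumption about their contents, and `missing_outputs` is an illustrative name.

```python
from pathlib import Path

# Entries taken from the output tree above.
EXPECTED_DIRS = ["Mapping", "log", "Temp"]
EXPECTED_FILES = ["Report.csv", "Report.pdf", "Report.html"]


def missing_outputs(report_dir):
    """Return the expected entries that are absent from report_dir."""
    root = Path(report_dir)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    # At least one consensus FASTA matching Sequence_*.fasta is expected.
    if not any(root.glob("Sequence_*.fasta")):
        missing.append("Sequence_*.fasta")
    return missing
```

An empty list from `missing_outputs("/path/to/output/directory/Report")` indicates that every expected entry is present; otherwise the log/ folder is a reasonable first place to look.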

Dependencies

Managed via RSV_env.yml:

dependencies:
  # R related
  - r-base=4.3
  - r-ggplot2
  - r-biocmanager
  - bioconductor-ggtree=3.10.0
  - bioconductor-treeio
  - r-tidyverse
  - r-devtools
  
  # Python related
  - python=3.10
  - pandas=2.2.2
  - biopython=1.78
  - pyhocon
  - reportlab
  - matplotlib
  - seaborn
  - Pillow
  - pyyaml

  # Bioinformatics tools
  - bioconda::fastp=0.23.4
  - bioconda::igvtools=2.3.93
  - bioconda::kma=1.4.9
  - bioconda::nextclade
  - bioconda::samtools=1.18
  - bioconda::blast=2.14.1
  - bioconda::bwa
  - bioconda::mafft=7.505
  - bioconda::fasttree=2.1.11
