Legacy archive, now renamed and found at https://github.com/XPRESSyourself/XPRESSpipe

RiboPipe v0.1.5-beta
A Flexible Sequence Assembly and Analysis Pipeline

Author: Jordan A Berg
Affiliation: Department of Biochemistry, University of Utah, Salt Lake City, Utah, USA

Contact: jordan <dot> berg <at> biochem <dot> utah <dot> edu

Please cite the following in any publications where this software was used to process or analyze data:

Berg, JA, ..., Rutter, JP. (XXXX) RiboPipe: A Flexible Sequence Assembly and Analysis Pipeline. Coming soon.

WHAT IS RIBOPIPE?

Ribosome profiling utilizes Next Generation Sequencing to provide a detailed picture of the protein translation landscape within cells. Cells are lysed, translating ribosomes are isolated, and the ribosome-protected mRNA fragments are integrated into a sequencing library. The library is then sequenced, generating raw data (often in the form of .fastq or .txt files). This pipeline is designed to flexibly process and perform preliminary analyses on single-end (SE), short-read (<= 100 bp) raw sequence data.

See this paper for a recent discussion and detailed protocol of the technique.

RiboPipe is a raw data assembly and preliminary analysis pipeline intended to ease the processing of ribosome profiling data. It removes the need to manually pass each raw read file through the appropriate quality trimming and assembly software. It also outputs the necessary quality-check analyses so the user can verify the quality of their run, uses multiprocessing to make full use of computational resources, and relies on faster assemblers to speed up assembly.

Watch this video for a walkthrough of how to use RiboPipe.

LOCAL INSTALLATION:

Make sure Python3 (we recommend version 3.5.0 or higher), git, and wget are installed.
Download Conda, a package manager, for your operating system. Double click the .pkg file if on MacOS, the .exe file on Windows, or follow these instructions on Linux.

Execute the following lines of code in Terminal (on Mac, open Spotlight and type 'Terminal'):

#3.1a: to download current repository:
git clone https://github.com/j-berg/ribopipe.git
cd ribopipe/ribopipe/references

#3.1b: to download a specific version (-O saves the archive under the name that unzip expects below)
tag='v0.1.4-beta'
wget -O ribopipe-${tag:1}.zip https://github.com/j-berg/ribopipe/archive/$tag.zip
unzip ribopipe-${tag:1}.zip
cd ribopipe-${tag:1}/ribopipe/references

#3.2: get reference
model='yeast'
program='hisat2'
wget https://sourceforge.net/projects/ribopipe/files/${program}_references/${model}_reference_${program}.zip
unzip ${model}_reference_${program}.zip
rm ${model}_reference_${program}.zip
cd ../../
python3 setup.py install --prefix ~/.local

#3.3: add the script installation location (printed near the end of the setup output) to ~/.bashrc or ~/.bash_profile
#add to .bashrc
echo 'export PATH="/path/to/scripts/:$PATH"' >> ~/.bashrc
#add to .bash_profile
echo 'export PATH="/path/to/scripts/:$PATH"' >> ~/.bash_profile

#3.4: Test by typing the following:
ribopipe --help

#3.5: Install conda dependencies:
ribopipe install

See local_install.sh in the resources folder for an interactive script.
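
If `ribopipe --help` fails after step 3.3, a quick check of PATH can narrow down the cause. The following is a minimal sketch, not part of RiboPipe: `path_contains` is a hypothetical helper, and `$HOME/.local/bin` is only an assumed scripts location, so substitute the path printed by the setup output.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":${1%/}:"*|*":${1%/}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

scripts_dir="$HOME/.local/bin"   # assumption: adjust to the path setup.py printed

if path_contains "$scripts_dir"; then
    echo "$scripts_dir is on PATH"
else
    echo "$scripts_dir is missing from PATH; revisit step 3.3"
fi

# Report whether the shell can resolve the ribopipe command.
if command -v ribopipe >/dev/null 2>&1; then
    echo "ribopipe found at $(command -v ribopipe)"
else
    echo "ribopipe not found"
fi
```

Remember to open a new terminal (or source the edited file) before checking, since changes to ~/.bashrc or ~/.bash_profile only apply to new shells.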

HPC INSTALLATION:

Make sure Python3 (we recommend version 3.5.0 or higher), git, and wget are installed.

Execute the following lines of code:

#3.1a: to download current repository:
git clone https://github.com/j-berg/ribopipe.git
cd ribopipe/ribopipe/references

#3.1b: to download a specific version (-O saves the archive under the name that unzip expects below)
tag='v0.1.4-beta'
wget -O ribopipe-${tag:1}.zip https://github.com/j-berg/ribopipe/archive/$tag.zip
unzip ribopipe-${tag:1}.zip
cd ribopipe-${tag:1}/ribopipe/references

#3.2: get reference
model='yeast'
program='hisat2'
wget https://sourceforge.net/projects/ribopipe/files/${program}_references/${model}_reference_${program}.zip
unzip ${model}_reference_${program}.zip
rm ${model}_reference_${program}.zip
cd ../../
module load python3
python setup.py install --prefix ~/.local

#3.3: add the script installation location (printed near the end of the setup output) to ~/.bashrc or ~/.bash_profile
#add to .bashrc
echo 'export PATH="/path/to/scripts/:$PATH"' >> ~/.bashrc
#add to .bash_profile
echo 'export PATH="/path/to/scripts/:$PATH"' >> ~/.bash_profile

#3.4: Test by typing the following:
ribopipe --help

See hpc_install.sh in the resources folder for an interactive script.

Modify hpc_run_template.sh in the resources folder, an example script for submitting the pipeline job to the HPC. Make sure the dependencies listed in this script are available on the HPC system; otherwise, they need to be installed locally.
Run the script by executing the following:
```
sbatch hpc_run_template.sh
```
To send the SLURM output file to your SLURM directory and avoid storage space issues on your interactive node, replace the #SBATCH -o slurmjob-%j line with one that includes the path to that directory:
```
#SBATCH -o /scratch/general/lustre/INPUT_USER_ID_HERE/slurmjob-%j
```
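
The substitution above can also be scripted instead of edited by hand. This is a minimal sketch under the assumption that the job script contains a single line starting with `#SBATCH -o`; `set_slurm_output` is a hypothetical helper, not something shipped with RiboPipe.

```shell
# Hypothetical helper: point the "#SBATCH -o" line of job script $1
# at directory $2, keeping the slurmjob-%j file name.
set_slurm_output() {
    script="$1"
    out_dir="${2%/}"             # drop a trailing slash, if any
    tmp="${script}.tmp"
    # Write to a temporary file and copy it back; this avoids "sed -i",
    # whose syntax differs between GNU and BSD/macOS sed.
    sed "s|^#SBATCH -o .*|#SBATCH -o ${out_dir}/slurmjob-%j|" "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&
        rm -f "$tmp"
}

# Example with the placeholder user ID used above:
# set_slurm_output hpc_run_template.sh /scratch/general/lustre/INPUT_USER_ID_HERE
```

Copying the result back with `cat` rather than `mv` keeps the original file's permissions intact.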

RUNNING THE PROGRAM:

Download your raw sequence data and place it in a folder. This folder should contain all the sequence data and nothing else.

Make sure files follow a pattern naming scheme. For example, if you had 3 genetic backgrounds of ribosome profiling data, the naming scheme would go as follows:

ExperimentName_BackgroundA_FP.fastq(.gz)  
ExperimentName_BackgroundA_RNA.fastq(.gz)  
ExperimentName_BackgroundB_FP.fastq(.gz)  
ExperimentName_BackgroundB_RNA.fastq(.gz)  
ExperimentName_BackgroundC_FP.fastq(.gz)  
ExperimentName_BackgroundC_RNA.fastq(.gz)

If samples are replicates, their replicate number needs to be indicated.
If you want the final count table to be in a particular order that is not alphabetical, prepend a letter to the sample name to force this ordering:

ExperimentName_a_WT_FP.fastq(.gz)  
ExperimentName_a_WT_RNA.fastq(.gz)  
ExperimentName_b_exType_FP.fastq(.gz)  
ExperimentName_b_exType_RNA.fastq(.gz)

If you have replicates:

ExperimentName_a_WT_1_FP.fastq(.gz)  
ExperimentName_a_WT_1_RNA.fastq(.gz)  
ExperimentName_a_WT_2_FP.fastq(.gz)  
ExperimentName_a_WT_2_RNA.fastq(.gz)  
ExperimentName_b_exType_1_FP.fastq(.gz)  
ExperimentName_b_exType_1_RNA.fastq(.gz)  
ExperimentName_b_exType_2_FP.fastq(.gz)  
ExperimentName_b_exType_2_RNA.fastq(.gz)
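
Before launching a run, the naming scheme above can be checked mechanically. The sketch below is an illustration only: `sample_type` and `check_input_dir` are hypothetical helpers, compressed files are assumed to end in `.fastq.gz`, and the requirement that every `_FP` file has an `_RNA` partner is inferred from the examples rather than documented pipeline behavior.

```shell
# Hypothetical helper: print FP, RNA, or invalid for one file name,
# based on the suffix scheme described above.
sample_type() {
    case "$1" in
        *_FP.fastq|*_FP.fastq.gz)   echo FP ;;
        *_RNA.fastq|*_RNA.fastq.gz) echo RNA ;;
        *)                          echo invalid ;;
    esac
}

# Hypothetical helper: list problems in input directory $1. Prints one
# line per badly named file and per footprint file without an RNA partner.
check_input_dir() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue          # empty directory: nothing to check
        name=$(basename "$f")
        case "$(sample_type "$name")" in
            invalid) echo "bad name: $name" ;;
            FP)
                # Swap the _FP suffix for _RNA to find the partner file.
                partner=$(echo "$name" | sed 's/_FP\.fastq/_RNA.fastq/')
                [ -e "$1/$partner" ] || echo "no RNA partner: $name"
                ;;
        esac
    done
}

# Usage: check_input_dir /path/to/input/data/folder
```

No output means every file matched the scheme. For an RNAseq-only run, `sample_type` alone is the relevant check, since no footprint files are expected.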

If you are only running RNAseq files through the pipeline, you only need the RNA samples in your input directory, and you specify the rnaseq module:

ExperimentName_a_WT_1_RNA.fastq(.gz)  
ExperimentName_a_WT_2_RNA.fastq(.gz)  
ExperimentName_b_exType_1_RNA.fastq(.gz)  
ExperimentName_b_exType_2_RNA.fastq(.gz)

ribopipe rnaseq -i input_directory ...

Create a folder for pipeline output. This folder should be empty.
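
One way to create the output folder and confirm it is empty in a single step is sketched below. `prepare_output_dir` is a hypothetical helper, not a RiboPipe command.

```shell
# Hypothetical helper: create output directory $1 if needed and
# succeed only if it contains no entries (hidden files included).
prepare_output_dir() {
    mkdir -p "$1" || return 1
    if [ -n "$(ls -A "$1")" ]; then
        echo "output directory is not empty: $1" >&2
        return 1
    fi
}

# Usage: prepare_output_dir /path/to/output/data/folder
```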

In Terminal, run the pipeline:

ribopipe riboseq -i /path/to/input/data/folder -o /path/to/output/data/folder -a AACTGTAGGCACCATCAAT --samples 00m 05m -r yeast --experiment ribopipe_basic -p HISAT2 ...
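
For repeated runs, the command above can be wrapped so that the folders are checked first and the final command line can be previewed. This is a sketch only: `run_riboseq` and the `DRY_RUN` variable are hypothetical, and the wrapper passes just the flags and values shown in the example, leaving out whatever the trailing `...` stands for.

```shell
# Hypothetical wrapper: check the folders, then run (or just print)
# the riboseq command. Only flags shown in the example above are used.
run_riboseq() {
    in_dir="$1"; out_dir="$2"
    [ -d "$in_dir" ]  || { echo "missing input folder: $in_dir" >&2; return 1; }
    [ -d "$out_dir" ] || { echo "missing output folder: $out_dir" >&2; return 1; }

    # Rebuild the positional parameters as the full command line.
    set -- ribopipe riboseq -i "$in_dir" -o "$out_dir" \
        -a AACTGTAGGCACCATCAAT --samples 00m 05m \
        -r yeast --experiment ribopipe_basic -p HISAT2

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"        # print the command instead of running it
    else
        "$@"
    fi
}

# Preview without running:
# DRY_RUN=1 run_riboseq /path/to/input/data/folder /path/to/output/data/folder
```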

For other customization inputs, in Terminal type:

ribopipe --help

After the pipeline is finished processing, the data (in the case of the RIBOSEQ option) can be accessed along the following path tree:

INTERPRETING THE OUTPUT:
Highlighted meta analyses will be output to the "highlights" folder in your indicated output directory.
RPF vs RNA: This summary plots the RPF counts vs mRNA counts for each gene in the samples. One would expect these metrics to be well-correlated between samples, as translation is dependent on mRNA abundance. Super-translated genes are unusual. An r² value > 0.70 generally indicates a good library preparation.
Periodicity: As ribosomes take 3 nt/1 codon steps down the transcript, a periodicity in read location is expected for a good library. The X axis in these figures indicates the start codon region of all transcripts in the organism and the Y axis is relative abundance of reads at that position.
metaORF: This plot takes a meta view of all transcripts normalized to create a representative transcript (relative distance down the transcript, X axis). The Y axis indicates relative abundance of reads at that position down the representative transcript.
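
To make the r² threshold in the RPF vs RNA summary concrete, the squared Pearson correlation can be computed from a two-column count table. This is a sketch under stated assumptions: `rpf_rna_r2` is a hypothetical helper, the tab-separated input layout (gene, RPF count, RNA count) is invented for illustration, and the values are used as given, since this README does not state whether RiboPipe transforms counts before computing the statistic.

```shell
# Hypothetical helper: read tab-separated lines of "gene, RPF count,
# RNA count" on stdin and print the squared Pearson correlation.
rpf_rna_r2() {
    awk -F '\t' '
        { n++; x = $2; y = $3
          sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y }
        END {
            vx = n * sxx - sx * sx      # n^2 * variance of RPF counts
            vy = n * syy - sy * sy      # n^2 * variance of RNA counts
            c  = n * sxy - sx * sy      # n^2 * covariance
            if (n < 2 || vx == 0 || vy == 0) { print "NA"; exit }
            printf "%.2f\n", (c * c) / (vx * vy)
        }'
}

# Example: four genes with partially correlated counts
printf 'g1\t1\t1\ng2\t2\t3\ng3\t3\t2\ng4\t4\t4\n' | rpf_rna_r2   # prints 0.64
```

In this toy example the result falls below the 0.70 guideline mentioned above.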

The current version includes an empty references folder where reference builds can be stored. Model organisms can be specified for use as references within the pipeline by using '-r human', '-r mouse', etc.
Reference folder after unzipping should be named "yeast_reference_HISAT2" or "human_reference_STAR", etc.
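
The folder naming pattern can be checked before a run with a small helper. Both functions below are hypothetical sketches, not RiboPipe commands. Note that the download in step 3.2 uses a lowercase program name (`hisat2`) while the names above use uppercase; whether the pipeline is case-sensitive is not stated here, so pass the spelling that matches your folder.

```shell
# Hypothetical helper: build the expected reference folder name from a
# model organism ($1) and an aligner name ($2), following the pattern above.
reference_dir_name() {
    echo "${1}_reference_${2}"
}

# Hypothetical check: succeed if that folder exists under references dir $3.
reference_present() {
    [ -d "$3/$(reference_dir_name "$1" "$2")" ]
}

# Usage:
# reference_present yeast HISAT2 ribopipe/references || echo "reference missing"
```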
