README

This repository holds a Nextflow pipeline for analysing gene expression studies. The experimental design is built on multiple generations (F0, F1, F2, etc.) and doses (0, 10, 100, 1000, etc.). The pipeline expects quantification files (quant.sf) generated with Salmon as input and outputs tables of (1) differentially expressed genes and (2) gene ontology analysis results. Each of the two tables is saved as a single Rds file containing all generations and doses. The pipeline also outputs the tables as Excel files to be included as supplementary tables in a scientific report.

Note

By default, this pipeline is set up for the mouse genome (GRCm39).

Quantification files

To generate quantification files (quant.sf) from sequencing files, refer to the nf-core/rnaseq pipeline.

Differential expression

Differential expression was analysed with the R package edgeR, which models read counts with negative binomial distributions and fits generalized linear models. P-values were corrected for multiple testing with the Benjamini-Hochberg method, and genes with FDR < 0.05 were considered significant. Default settings were used for most functions, except estimateDisp(robust = TRUE) and glmQLFit(robust = TRUE). The findings are summarised in a long table with one row per significant gene and the results from the differential expression analysis as columns.
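For reference, the Benjamini-Hochberg adjustment mentioned above is usually written as follows. This is a sketch of the standard definition, not taken from the pipeline code; the symbols are introduced here for illustration: m is the number of tested genes and p_(1) <= ... <= p_(m) are the p-values sorted in ascending order.

```latex
q_{(i)} = \min_{j \ge i} \, \min\!\left(1, \frac{m \, p_{(j)}}{j}\right),
\qquad i = 1, \dots, m
```

A gene is then reported as significant when its adjusted value is below 0.05.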

Gene ontology analysis

To investigate whether any biological functions, processes or pathways are enriched (over-represented), the Over Representation Analysis (ORA) method (Boyle et al., 2004) is used. ORA uses the hypergeometric distribution and compares the differentially expressed genes with all genes in the dataset. The p-values are adjusted to q-values to correct for multiple testing (significance threshold qvalue < 0.2).
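For a single term, the over-representation p-value is commonly written as the upper tail of a hypergeometric distribution. The symbols below are assumptions introduced for illustration and do not appear in the pipeline: N is the number of genes in the dataset (the background), M the number of those genes annotated to the term, n the number of differentially expressed genes, and k the number of differentially expressed genes annotated to the term.

```latex
p = P(X \ge k) = 1 - \sum_{i=0}^{k-1}
    \frac{\binom{M}{i} \binom{N-M}{n-i}}{\binom{N}{n}}
```

A small p means that the term contains more differentially expressed genes than expected when drawing n genes at random from the background.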

Enrichment is analysed in three databases: (1) Gene Ontology (GO), (2) Kyoto Encyclopedia of Genes and Genomes (KEGG), and (3) Reactome pathways. GO and KEGG enrichment are tested with the R package clusterProfiler (Yu et al., 2012; Wu et al., 2021). The Reactome pathways are tested with the R package ReactomePA (Yu et al., 2016).

Reproducibility

Run the pipeline

#!/bin/bash -l

export NXF_HOME=".nextflow/"

nextflow pull andreyhgl/transcriptome-analysis

nextflow run andreyhgl/transcriptome-analysis \
  --quant_path 'path-to-quant-files' \
  --metadata 'path-to-metadata.csv' \
  --tx2gene 'path-to-tx2gene.tsv' \
  -profile local \
  -resume
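
Before launching, it can help to confirm that the three inputs exist. The following is a minimal sketch, not part of the pipeline: the function name check_inputs is hypothetical, and it assumes the quant files sit somewhere below the quant directory (for example one sub-directory per sample, each holding a quant.sf).

```shell
#!/bin/bash
# Hypothetical pre-flight check; not part of the pipeline itself.
# Assumption: quant.sf files are located somewhere below <quant_path>.
check_inputs() {
  local quant_path="$1" metadata="$2" tx2gene="$3"
  local f n

  # Both table files must exist and be non-empty.
  for f in "$metadata" "$tx2gene"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty file: $f" >&2
      return 1
    fi
  done

  # Count quant.sf files anywhere below the quant directory.
  n=$(find "$quant_path" -type f -name 'quant.sf' 2>/dev/null | wc -l)
  n=$((n))  # normalise the count (some wc implementations pad with spaces)
  if [ "$n" -eq 0 ]; then
    echo "no quant.sf files found under: $quant_path" >&2
    return 1
  fi
  echo "found $n quant.sf file(s)"
}
```

Calling check_inputs with the same three paths that are passed to --quant_path, --metadata and --tx2gene returns a non-zero status if something is missing, so it can be chained in front of the nextflow run command with &&.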

Singularity containers

For reproducibility, this pipeline uses two Singularity containers, which can be downloaded from the Cloud Library. The RNAseq container holds most of the R packages used in the analysis, while the gene-ontology container holds the gene ontology related R packages.

# apptainer (instead of singularity) also works

IMAGE1='library://andreyhgl/singularity-r/rnaseq'
IMAGE2='library://andreyhgl/singularity-r/gene-ontology'

singularity pull ${IMAGE1}
singularity pull ${IMAGE2}

To run scripts manually with the containers, use the exec subcommand, or open an interactive session with shell and run the script from there.

# execute script
singularity exec ${IMAGE} <scriptfile>

# run script interactively
singularity shell ${IMAGE}
$ Rscript <scriptfile>

The pipeline

The Nextflow pipeline produces the following:

Ensembl database table containing gene annotations
DGEList object
Quality control plots: PCA, distance plots
Differentially expressed genes table
Gene ontology analysis
Supplementary files (plots, Excel tables)
Concatenated tables (for easy import into a results report)

Preparation

Metadata

Set up metadata.csv to look like the following:

gen,id,treatment,...
F0,F0_1,0,...
F0,F0_2,0,...
F1,F1_3,10,...
F2,F2_4,100,...

Each row represents a sample in the column order: generation, sample id and treatment/dose.
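
The column order can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function name summarise_metadata is hypothetical, and it assumes a comma-separated file without quoted fields whose first three columns are gen, id and treatment.

```shell
#!/bin/bash
# Hypothetical helper; not part of the pipeline itself.
# Assumption: plain comma-separated file without quoted fields.
summarise_metadata() {
  local file="$1"
  local header

  # The first three columns must be gen,id,treatment (extra columns are allowed).
  header=$(head -n 1 "$file" | tr -d '\r' | cut -d',' -f1-3)
  if [ "$header" != "gen,id,treatment" ]; then
    echo "unexpected header: $header" >&2
    return 1
  fi

  # Skip the header, then count samples per generation/treatment pair.
  tail -n +2 "$file" | tr -d '\r' \
    | awk -F',' 'NF >= 3 { n[$1 "," $3]++ }
                 END { for (k in n) print k "," n[k] }' \
    | sort
}
```

Each output line has the form generation,treatment,count, which makes it easy to spot a generation or dose with too few samples.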

Parameters

The pipeline accepts three parameters:

Experimental design:

generations (F0, F1, F2, etc)
doses (0, 10, 100, 1000, etc)
genomic features (CpG-sites, Promoters, CpG-islands)
