00_Settings.sh: Various settings for the pipeline
- Dependencies
- create a conda environment with:
    conda create --name DEU
    conda activate DEU
    conda install sra-tools samtools star r-base
- Parameters
  - accesions: a list of accessions to pull
  - RNA_SEQ_DIR: directory containing the RNA-seq FASTQ files
  - OUT_DIR: location of the output directory
  - MAX_CPUS: number of threads to use
  - MAX_MEM_GB: maximum memory to use, in GB
  - HUMAN_GENOME_REF: path to the .fna FASTA reference file
  - HUMAN_GENOME_GTF: path to the GTF annotation file
  - HUMAN_RNA_REF: unused
  - sjdbOverhang: set to the RNA read length - 1
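As a rough illustration of how these parameters fit together, here is a minimal sketch of a settings file. Every value is a placeholder: the accession IDs, paths, and the `READ_LENGTH` helper variable are hypothetical, and storing `accesions` as a space-separated string is an assumption (the real script may use an array).

```shell
#!/usr/bin/env bash
# Hypothetical settings sketch; replace every value with your own.

accesions="SRR0000001 SRR0000002"            # placeholder accession IDs
RNA_SEQ_DIR="./rna_seq"                      # where the FASTQ files live
OUT_DIR="./deu_out"                          # where results are written
MAX_CPUS=8                                   # threads to use
MAX_MEM_GB=32                                # memory ceiling in GB
HUMAN_GENOME_REF="/path/to/genome.fna"       # placeholder FASTA path
HUMAN_GENOME_GTF="/path/to/annotation.gtf"   # placeholder GTF path

# sjdbOverhang is the read length minus 1.
READ_LENGTH=100                              # assumed read length (hypothetical helper)
sjdbOverhang=$((READ_LENGTH - 1))

echo "sjdbOverhang=${sjdbOverhang}"
```

With the assumed read length of 100, the script reports `sjdbOverhang=99`.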
01_Download_Accessions.sh: A script to download a given set of accessions with sra-tools
- Output
- FASTQs from the given accessions
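The download step can be sketched roughly as follows. This is not the contents of 01_Download_Accessions.sh; it assumes the common sra-tools pattern of `prefetch` followed by `fasterq-dump`, and the `run` helper, the `DRY_RUN` switch, the `download_accessions` function name, and the default values are all hypothetical. With `DRY_RUN=1` (the default here) the commands are only printed, so the sketch can be inspected without sra-tools installed.

```shell
#!/usr/bin/env bash
# Sketch: fetch each accession and convert it to FASTQ with sra-tools.
# Defaults below are placeholders, not values from the real settings file.
DRY_RUN="${DRY_RUN:-1}"
accesions="${accesions:-SRR0000001 SRR0000002}"
RNA_SEQ_DIR="${RNA_SEQ_DIR:-./rna_seq}"
MAX_CPUS="${MAX_CPUS:-4}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

download_accessions() {
  # Word splitting on $accesions is intentional: it is a space-separated list.
  for acc in $accesions; do
    run prefetch "$acc"
    run fasterq-dump "$acc" --outdir "$RNA_SEQ_DIR" --threads "$MAX_CPUS" --split-files
  done
}

download_accessions
```

Setting `DRY_RUN=0` executes the commands instead of printing them.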
02_Alignment.sh: A script to align the resulting FASTQs with STAR
- Output
- BAMs for each sample; indexed and sorted
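A rough sketch of the alignment step is shown below. It is not the contents of 02_Alignment.sh. It assumes paired-end FASTQs named `<sample>_1.fastq` and `<sample>_2.fastq`, a hypothetical `GENOME_DIR` for the STAR index, and hypothetical helper names (`run`, `build_index`, `align_sample`); all default values are placeholders. As written it only prints the commands (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Sketch: build a STAR index, align one sample, and index the sorted BAM.
DRY_RUN="${DRY_RUN:-1}"
RNA_SEQ_DIR="${RNA_SEQ_DIR:-./rna_seq}"
OUT_DIR="${OUT_DIR:-./deu_out}"
MAX_CPUS="${MAX_CPUS:-4}"
HUMAN_GENOME_REF="${HUMAN_GENOME_REF:-/path/to/genome.fna}"
HUMAN_GENOME_GTF="${HUMAN_GENOME_GTF:-/path/to/annotation.gtf}"
sjdbOverhang="${sjdbOverhang:-99}"
GENOME_DIR="${GENOME_DIR:-./star_index}"   # hypothetical index location

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_index() {
  run STAR --runMode genomeGenerate \
    --genomeDir "$GENOME_DIR" \
    --genomeFastaFiles "$HUMAN_GENOME_REF" \
    --sjdbGTFfile "$HUMAN_GENOME_GTF" \
    --sjdbOverhang "$sjdbOverhang" \
    --runThreadN "$MAX_CPUS"
}

align_sample() {
  sample="$1"
  # Coordinate-sorted BAM output, written as <sample>_Aligned.sortedByCoord.out.bam
  run STAR --genomeDir "$GENOME_DIR" \
    --readFilesIn "$RNA_SEQ_DIR/${sample}_1.fastq" "$RNA_SEQ_DIR/${sample}_2.fastq" \
    --runThreadN "$MAX_CPUS" \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix "$OUT_DIR/${sample}_"
  run samtools index "$OUT_DIR/${sample}_Aligned.sortedByCoord.out.bam"
}

build_index
align_sample "SRR0000001"   # placeholder sample name
```

Asking STAR for `BAM SortedByCoordinate` output means no separate `samtools sort` call is needed before indexing.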
DEU_1.R: A script to perform Differential Exon Usage analysis with DEXSeq
- Input
- BAMs
- GTF for reference
- Output
  - DEXSeqReport: standard DEXSeq report of all genes and their Differential Exon Usage (DEU)
  - dxr.rds: an R dataset object containing the DEXSeqResults object generated during analysis
DEU_2.R: A script to analyze the resulting DEXSeqResults .rds file
- Input
  - DEXSeqResults: object generated during the previous analysis
- Output
  - DEU.xlsx: contains two sheets:
    - Combined Genes: for each gene, lists the following:
      - gene: gene name / alias
      - gene_desc: short description of the gene
      - total_exons: total number of exons in the gene
      - pvalue_sig_exons < 0.05: number of exons in the gene whose p-value is significant at the given threshold
      - pvalue_prop_sig: a proportion; $\frac{\text{total\_exons}}{\text{pvalue\_sig\_exons < 0.05}}$
      - pvalue_na: number of exons without a pvalue
      - comb_pvalue: the combined exon pvalue obtained with Fisher's method
      - comb_pvalue_df: degrees of freedom used when combining the exon pvalue values
      - padj_sig_exons < 0.05: number of exons in the gene whose adjusted p-value is significant at the given threshold
      - padj_prop_sig: a proportion; $\frac{\text{total\_exons}}{\text{padj\_sig\_exons < 0.05}}$
      - padj_na: number of exons without a padj
      - comb_padj: the combined exon padj obtained with Fisher's method
      - comb_padj_df: degrees of freedom used when combining the exon padj values
    - All Genes: for each exon in each gene, lists the following:
      - groupID: gene name / alias
      - gene_desc: short description of the gene
      - featureID: feature ID
      - exonBaseMean: exon base mean
      - dispersion: dispersion
      - stat: statistic
      - pvalue: p-value
      - padj: adjusted p-value
      - day_0: condition #1
      - day_100: condition #2
      - log2fold_day_100_day_0: the $\log_2$ fold change between the conditions; in this case $\log_2\left(\frac{\text{day\_100}}{\text{day\_0}}\right)$
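The comb_pvalue and comb_pvalue_df columns are described as coming from Fisher's method. As a rough illustration (not the code in DEU_2.R, which is an R script), the sketch below combines a list of p-values using the textbook definition: the statistic is $-2\sum_i \ln p_i$, compared against a chi-square distribution with $2k$ degrees of freedom for $k$ non-missing p-values. The function name `fisher_combine` and the choice to skip `NA` entries are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: combine p-values with the Fisher method using awk.
# Prints "<combined p-value><TAB><degrees of freedom>".
fisher_combine() {
  printf '%s\n' "$@" | awk '
    # Skip missing values; accumulate the sum of ln(p) and the count k.
    $1 != "NA" && $1 != "" { s += log($1); k++ }
    END {
      if (k == 0) { print "NA\t0"; exit }
      half = -s                 # half of the statistic X = -2 * sum(ln p)
      # For even df = 2k the chi-square upper tail has a closed form:
      # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
      term = 1; total = 1
      for (i = 1; i < k; i++) { term *= half / i; total += term }
      printf "%.6g\t%d\n", exp(-half) * total, 2 * k
    }'
}

fisher_combine 0.05 NA 0.20 0.01
```

A single p-value is returned unchanged with 2 degrees of freedom, which is a quick sanity check on the formula.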