Warning
postoga is dependent from TOGA. Any changes in TOGA will have a repercusion here. If you found any bug/errors, please report them here. This project is in constant development, any desired features are welcome!
The post-TOGA processing pipeline.
- Re-implementation of postoga to match TOGA2.0 output
- Includes self-owned rust bed to gtf coverters through rustools
- Forces bed2gtf, bed2gff and gxf2bed installation for quick access
- Adds --extract to extract filtered projections from codon and protein alignments
- Implements --engine option to use polars
- Now manages configuration and test through 'make configure' and 'make test'
- Adds license
- Drops --skip argument and adds --plot argument to plot stats [currently broken]
- Adds additional BUSCO and completeness stats to the main log file
To use postoga, just:
Clone the repository
# clone the repository
git clone --recursive https://github.com/alejandrogzi/postoga.git
cd postogaActivate the environment and configure binaries
pip install hatch
make configureRun test to confirm functionality [you can run the test directly from the configuration step]
make testIf you see something like this at then end, postoga is ready!:
> postoga: the post-TOGA processing pipeline
> version: 0.9.3-devel[2024-10-15 12:25:05] - INFO: postoga started!
[2024-10-15 12:25:05] - INFO: running in mode base with arguments: {'mode': 'base', 'outdir': '/Users/alejandrogzi/Documents/projects/postoga/POSTOGA_TEST', 'togadir': '/Users/alejandrogzi/Documents/projects/postoga/supply/test', 'by_class': 'I,PI', 'by_rel': None, 'threshold': 0.95, 'to': 'gtf', 'assembly_qual': PosixPath('/Users/alejandrogzi/Documents/projects/postoga/supply/Ancestral_placental_complete.txt'), 'species': 'human', 'source': 'ensembl', 'phylo': 'mammals', 'plot': False, 'paralog': None, 'isoforms': None, 'engine': 'pandas'}
[2024-10-15 12:25:05] - INFO: found 52 projections, 49 unique transcripts, 44 unique genes
Here is a descrption of postoga features:
Tip
If the only thing you want to do is apply some filters to a TOGA result or convert results to GTF/GFF files, I recommend the following command:
./postoga.py base \
--togadir /your/TOGA/dir \
--outdir /your/out/dir \
-bc [YOUR CLASSES] \
-br [YOUR RELATIONS] \
-th [YOUR THRESHOLD] \
-to [YOUR FORMAT GTF/GFF/BED] \usage: postoga.py [-h] {base,haplotype}
positional arguments:
{base,haplotype} Select mode
base Base mode
haplotype Haplotype mode
postoga.py base [-h] --outdir OUTDIR --togadir TOGADIR [-bc BY_CLASS] [-br BY_REL]
[-th THRESHOLD] -to {gtf,gff,bed} [-aq ASSEMBLY_QUAL]
[-sp {human,mouse,chicken}] [-src {ensembl,gene_name,entrez}]
[-phy {mammals,birds}] [-s] [-par PARALOG]
optional arguments:
-h, --help show this help message and exit
--outdir OUTDIR, -o OUTDIR
Path to posTOGA output directory
--togadir TOGADIR, --td TOGADIR
Path to TOGA results directory
-bc BY_CLASS, --by-class BY_CLASS
Filter parameter to only include certain orthology classes (I, PI, UL, M, PM,
L, UL)
-br BY_REL, --by-rel BY_REL
Filter parameter to only include certain orthology relationships (o2o, o2m,
m2m, m2m, o2z)
-th THRESHOLD, --threshold THRESHOLD
Filter parameter to preserve orthology scores greater or equal to a given
threshold (0.0 - 1.0)
-to {gtf,gff,bed}, --to {gtf,gff,bed}
Specify the conversion format for .bed (query_annotation/filtered) file (gtf,
gff3) or just keep it as .bed (bed)
-aq ASSEMBLY_QUAL, --assembly_qual ASSEMBLY_QUAL
Calculate assembly quality based on a list of genes provided by the user
(default: Ancestral_placental.txt)
-sp {human,mouse,chicken}, --species {human,mouse,chicken}
Species name to be used as a reference for the assembly quality calculation
(default: human)
-src {ensembl,gene_name,entrez}, --source {ensembl,gene_name,entrez}
Source of the ancestral gene names (default: ENSG)
-phy {mammals,birds}, --phylo {mammals,birds}
Phylogenetic group of your species (default: mammals)
-par PARALOG, --paralog PARALOG
Filter parameter to preserve transcripts with paralog projection probabilities
less or equal to a given threshold (0.0 - 1.0)
-iso ISOFORMS, --isoforms ISOFORMS
Path to a custom isoform table (default: None)
-e {pandas,polars}, --engine {pandas,polars}
Database engine to create inner db representations (default: pandas)
-p, --plot Flag to plot statistics about the filtered genes (default: False)
-ext [{query,reference}], --extract [{query,reference}]
Flag or option to extract sequences (only codon and protein alignments) from the filtered genes. Can be 'query', 'reference', or just set as a flag
(default: False). When used as a flag extracting 'query' sequences is default.
postoga.py haplotype [-h] --outdir OUTDIR -hp HAPLOTYPE_DIR [-r RULE] [-s {query,loss}]
optional arguments:
-h, --help show this help message and exit
--outdir OUTDIR, -o OUTDIR
Path to posTOGA output directory
-hp HAPLOTYPE_DIR, --haplotype_dir HAPLOTYPE_DIR
Path to TOGA results directories separated by commas (path1,path2,path3)
-r RULE, --rule RULE Rule to merge haplotype assemblies (default: I>PI>UL>L>M>PM>PG>abs)
-s {query,loss}, --source {query,loss}
Source of the haplotype classes (query, loss)