fastqe reads one or more FASTQ files, computes quality stats for each file and prints those stats as emoji... for some reason.
Given a fastq file in Illumina 1.8+/Sanger format, calculate the mean (rounded) score for each position and print a corresponding emoji!
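The per-position calculation can be sketched in Python as below. This is a minimal illustration, not fastqe's own implementation. It assumes Phred+33 encoding (Sanger / Illumina 1.8+) and round-half-up rounding, and the function names are hypothetical. fastqe's exact rounding and handling of reads of different lengths may differ.

```python
import math


def phred33(char: str) -> int:
    """Decode one Sanger / Illumina 1.8+ quality character (Phred+33)."""
    return ord(char) - 33


def mean_quality_per_position(quality_strings):
    """Return the rounded mean Phred score at each read position.

    Each position is averaged only over the reads long enough to reach it.
    """
    totals = []
    counts = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += phred33(char)
            counts[pos] += 1
    # Round half up (Python's built-in round() rounds half to even).
    return [math.floor(t / c + 0.5) for t, c in zip(totals, counts)]


print(mean_quality_per_position(["II#", "5I"]))  # prints [30, 40, 2]
```

Here 'I' is 40, '5' is 20 and '#' is 2, so position 1 averages 40 and 20 to 30, while position 3 only has one read.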
Latest release versions of fastqe are available via pip or BioConda:
pip install fastqe
conda install -c bioconda fastqe
The development version can be installed from the master branch of this repository.
fastqe can display usage information on the command line via the -h or --help argument:
usage: fastqe [-h] [--minlen N] [--scale] [--version] [--mean]
[--custom CUSTOM_DICT] [--bin] [--noemoji] [--min] [--max]
[--output OUTPUT_FILE] [--long READ_LENGTH] [--log LOG_FILE]
[FASTQ_FILE [FASTQ_FILE ...]]
Read one or more FASTQ files, compute quality stats for each file, print as
emoji... for some reason.π
positional arguments:
FASTQ_FILE Input FASTQ files
optional arguments:
-h, --help show this help message and exit
--minlen N Minimum length sequence to include in stats (default
0)
--scale show relevant scale in output
--version show program's version number and exit
--mean show mean quality per position (DEFAULT)
--custom CUSTOM_DICT use a mapping of custom emoji to quality in
CUSTOM_DICT (ππ΄)
--bin use binned scores (π«ππ©β οΈππππ)
--noemoji use mapping without emoji (ββββββββ)
--min show minimum quality per position
--max show maximum quality per position
--output OUTPUT_FILE write output to OUTPUT_FILE instead of stdout
--long READ_LENGTH enable long reads up to READ_LENGTH bp long
--log LOG_FILE record program progress in LOG_FILE
fastqe will summarise FASTQ files to display the maximum, mean and minimum quality using emoji. To convert a file into emoji rather than summarise it, you can use the companion program biomojify, which converts both sequence and quality information to emoji:
$ cat test.fq
@ Sequence
GTGCCAGCCGCCGCGGTAGTCCGACGTGGC
+
GGGGGGGGGGGGGGGGGGGGGG!@#$%&%(
$ biomojify fastq test.fq
βΆοΈ Sequence
ππ
ππ½π½π₯ππ½π½ππ½π½ππ½πππ
π₯ππ
π½π½ππ₯π½ππ
πππ½
πππππππππππππππππππππππ«ππΊππ
πΎπ
π
Install with pip install biomojify, and see the biomojify page for more information: https://github.com/fastqe/biomojify/
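To see which scores the emoji in the quality line of test.fq stand for, the quality string can be decoded by hand. This short sketch assumes Phred+33 encoding, which matches the Illumina 1.8+/Sanger format fastqe expects.

```python
# Quality line from test.fq above (Sanger / Illumina 1.8+, Phred+33).
quality = "GGGGGGGGGGGGGGGGGGGGGG!@#$%&%("

# Each character's ASCII code minus 33 gives its Phred score.
scores = [ord(char) - 33 for char in quality]

print(scores[0])    # 38  ('G')
print(scores[-8:])  # [0, 31, 2, 3, 4, 5, 4, 7]  ('!@#$%&%(')
```

So the run of 'G' characters is uniformly high quality (38), while the tail mixes a zero score ('!') with very low scores.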
fastqe test.fastq
fastqe --min test.fastq
fastqe --max test.fastq
fastqe --max --min --bin test.fastq
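The --min, --mean and --max summaries can be pictured as column-wise statistics over the quality strings. The sketch below is a hypothetical illustration, not fastqe's code: it assumes Phred+33 encoding, reports the unrounded mean, and uses a hypothetical min_len parameter that loosely mirrors --minlen by skipping short reads.

```python
from statistics import mean


def per_position_stats(quality_strings, min_len=0):
    """Return (min, mean, max) Phred+33 scores for each read position.

    Reads shorter than min_len are skipped (loosely mirroring --minlen).
    The mean is left unrounded here.
    """
    columns = []  # columns[i] collects every score seen at position i
    for qual in quality_strings:
        if len(qual) < min_len:
            continue
        for pos, char in enumerate(qual):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(ord(char) - 33)
    return [(min(col), mean(col), max(col)) for col in columns]


stats = per_position_stats(["II#", "5I", "!"], min_len=2)
for pos, (lo, avg, hi) in enumerate(stats, start=1):
    print(pos, lo, avg, hi)
```

With min_len=2 the one-base read "!" is excluded, so position 1 summarises only 'I' (40) and '5' (20).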
This lesson introduces NGS data processing on the command line, using the results of FASTQE before and after quality filtering with fastp: https://qubeshub.org/publications/1092/2
Rachael St. Jacques, Max Maza, Sabrina Robertson, Guoqing Lu, Andrew Lonsdale, Ray A Enke (2019).
A Fun Introductory Command Line Exercise: Next Generation Sequencing Quality Analysis with Emoji!.
NIBLSE Incubator: Intro to Command Line Coding Genomics Analysis, (Version 2.0).
QUBES Educational Resources. doi:10.25334/Q4D172
A Galaxy wrapper is available from the IUC toolshed. Contact your Galaxy Admin if you would like to have it installed. A Galaxy Tutorial using FASTQE is in development.
FASTQE started out as part of PyCon Au presentations:
- PyCon Au 2016 - Python for science, side projects and stuff!
- PyCon Au 2017 - Lightning Talk
- BCC 2020 - Short Presentation
- version 0.0.1 at PyCon Au 2016:
- Mean quality per position
- version 0.0.2 at PyCon Au 2017:
- update emoji map
- Max and minimum scores per position added
- Wrapper code based on early version of Bionitio added
- prepare for PyPI
- version 0.1.0 July 2018
- clean up code
- add binning
- version 0.2.6 July 2020
- refactor code
- add long read support with --long
- add --noemoji for block-based output on systems that don't support emoji
- add --custom for user-defined mapping to emoji
- add --output to redirect to file instead of stdout
- add gzip support
- add support for reading from stdin
- fix bug of dropping position if some sequences are only 0 quality
- Galaxy Wrapper created July 2020
- biomojify created July 2020
- version 0.2.7 2021
- bugfix
- version 0.3.1 2023
- HTML reporting for Galaxy
- version 0.3.3 2024
- Replace emoji with alternatives that render in default fonts
- Read lengths above 500bp are allowed but must be set by the user with --long READ_LENGTH
- Same emoji for all scores above 41
This program is released as open source software under the terms of the BSD License.
- pyemojify
- BioPython
- NumPy
- Rearrange emoji to use more realistic ranges (i.e. > 60 use uncommon emoji) and remove inconsistencies
- Add conversion to emoji sequence format, with/without binning, for compressed FASTQ data: fits into https://github.com/fastqe/biomojify/
- Rewrite conversion to standalone function for use in iPython etc.
- Teaching resources
- Test data and unit tests
- Add FASTA mode for nucleotide and protein emoji: see https://github.com/fastqe/biomojify/
- MultiQC plugin
- Galaxy Wrapper: available from the IUC toolshed
Rather convert to emoji than summarise? We've just started biomojify for that: https://github.com/fastqe/biomojify/
- Andrew Lonsdale
- Björn Grüning
- Catherine Bromhead
- Clare Sloggett
- Clarissa Womack
- Helena Rasche
- Maria Doyle
- Michael Franklin
- Nicola Soranzo
- Phil Ewels
Use the --scale option to include the scale in the output.
0 ! π«
1 " β
2 # πΊ
3 $ π
4 % π
5 & πΎ
6 ' πΏ
7 ( π
8 ) π»
9 * π
10 + π
11 , π
12 - π΅
13 . πΏ
14 / πΎ
15 0 π
16 1 π£
17 2 π₯
18 3 π‘
19 4 π©
20 5 π¨
21 6 π
22 7 π
23 8 π
24 9 π
25 : π
26 ; π
27 < π
28 = π
29 > π
30 ? π
31 @ π
32 A π
33 B π
34 C π
35 D π
36 E π
37 F π
38 G π
39 H π
40 I π
41 J π
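The first two columns of the scale are linked by Phred+33 encoding: each quality character is chr(score + 33). The helpers below are hypothetical and only illustrate that relationship. Capping scores above 41 is an assumption, chosen to be consistent with the known issue that all scores above 41 share the same emoji.

```python
# Hypothetical helpers illustrating the first two columns of the scale:
# quality character = chr(Phred score + 33) for Sanger / Illumina 1.8+.
MAX_SCALE_SCORE = 41  # the last row of the scale ('J')


def score_to_char(score: int) -> str:
    """Convert a Phred score to its Phred+33 quality character."""
    return chr(score + 33)


def char_to_scale_score(char: str) -> int:
    """Convert a quality character to the row used for a scale lookup.

    Scores above 41 are capped at 41 (an assumption, see lead-in).
    """
    return min(ord(char) - 33, MAX_SCALE_SCORE)


for score in (0, 15, 41):
    print(score, score_to_char(score))  # 0 !   15 0   41 J
```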
Binned scale:
0 ! π«
1 " π«
2 # π
3 $ π
4 % π
5 & π
6 ' π
7 ( π
8 ) π
9 * π
10 + π©
11 , π©
12 - π©
13 . π©
14 / π©
15 0 π©
16 1 π©
17 2 π©
18 3 π©
19 4 π©
20 5 π¨
21 6 π¨
22 7 π¨
23 8 π¨
24 9 π¨
25 : π
26 ; π
27 < π
28 = π
29 > π
30 ? π
31 @ π
32 A π
33 B π
34 C π
35 D π
36 E π
37 F π
38 G π
39 H π
40 I π
41 J π
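Binning maps a range of scores onto a single emoji. The sketch below returns a bin index rather than an emoji. The first boundaries (0, 2, 10 and 20) are read off the binned scale above. Splitting scores of 25 and over at 25, 30, 35 and 40 is an assumption based on common 8-level Illumina binning, chosen because --bin lists eight emoji; it is not confirmed by the table. The helper name is hypothetical.

```python
import bisect

# Lower bound of each bin. 0, 2, 10 and 20 come from the binned scale;
# 25, 30, 35 and 40 are an ASSUMPTION (common 8-level Illumina binning).
BIN_LOWER_BOUNDS = [0, 2, 10, 20, 25, 30, 35, 40]


def bin_index(score: int) -> int:
    """Return the 0-based bin for a Phred score (hypothetical helper)."""
    # bisect_right finds the first bound greater than score; step back one.
    return bisect.bisect_right(BIN_LOWER_BOUNDS, score) - 1


for score in (0, 1, 2, 9, 10, 19, 20, 24, 41):
    print(score, bin_index(score))
```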
Use a dictionary of pyemojify mappings in a text file (passed with --custom) instead of the built-in emoji choices:
{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': ':japanese_goblin:',
'$': ':broken_heart:'
}
Emoji characters can also be used directly instead (experimental):
{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': 'πΏ',
'$': ':broken_heart:'
}
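Because the mapping file is written as a Python dict literal (single-quoted keys, so not JSON), one safe way to read it is ast.literal_eval, which evaluates literals only and never runs code. This is a sketch under that assumption: how fastqe itself parses CUSTOM_DICT may differ, and load_custom_mapping and its key checks are hypothetical.

```python
import ast


def load_custom_mapping(text: str) -> dict:
    """Parse a custom quality-to-emoji mapping written as a dict literal.

    ast.literal_eval only evaluates Python literals, so unlike eval()
    it will not execute arbitrary code found in the file.
    """
    mapping = ast.literal_eval(text)
    if not isinstance(mapping, dict):
        raise ValueError("custom mapping must be a dict")
    for key in mapping:
        # Keys should be single printable ASCII quality characters.
        if not (isinstance(key, str) and len(key) == 1 and 33 <= ord(key) <= 126):
            raise ValueError(f"not a quality character: {key!r}")
    return mapping


# The first example file above, verbatim ('\"' is just a double quote).
example = r"""{
'#': ':no_entry_sign:',
'\"': ':x:',
'!': ':japanese_goblin:',
'$': ':broken_heart:'
}"""

mapping = load_custom_mapping(example)
print(mapping['"'])  # :x:
```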