@RoelKluin
Contributor

Casava 1.8 changed its fastq output: the read name now contains a space,
which bwa wasn't parsing correctly. This patch fixes that and enables bwa to
filter sequences marked :Y: by Casava. The tag is removed from the output.

Signed-off-by: RoelKluin [email protected]

@lh3
Owner

lh3 commented Jul 8, 2011

Thanks. Nonetheless, you should not modify kseq.h. By convention, a FASTA name should not contain any space; anything after the first space is a comment. Allowing spaces in sequence names, as the modified kseq.h does, will cause problems for other input sequences. If you want the string after the space in the fasta/q header lines, you should check "kseq_t::comment". Could you help to modify the patch to use "kseq_t::comment" without touching kseq.h? Thanks.
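
To illustrate the convention described above, here is a minimal, self-contained sketch of splitting a header line into name and comment at the first whitespace. It does not use kseq.h; `split_header` is a hypothetical helper written for this example, and skipping a run of whitespace before the comment is a simplification chosen here, not necessarily what kseq.h does.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of kseq.h or bwa).
 * `header` is the header line without the leading '@' or '>'.
 * Stores the length of the name (everything before the first
 * whitespace) in *name_len and returns a pointer to the start of
 * the comment, or NULL if the header has no comment.
 */
const char *split_header(const char *header, size_t *name_len)
{
    const char *p = header;

    /* The name ends at the first whitespace character. */
    while (*p != '\0' && !isspace((unsigned char)*p))
        ++p;
    *name_len = (size_t)(p - header);

    /* Skip the separating whitespace; the rest is the comment. */
    while (*p != '\0' && isspace((unsigned char)*p))
        ++p;

    return *p != '\0' ? p : NULL;
}
```

For a Casava 1.8 header such as `EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG`, this would leave the name untouched by the new fields and return a pointer to `1:Y:18:ATCACG`, which is roughly the role "kseq_t::comment" plays.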

…which bwa"

This reverts commit 36cd4f9.

The comment shouldn't be included in the sequence name.
In Casava 1.8 the fastq output changed, e.g.:

@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@

The part after the space, treated as comment by bwa, contains the fields:
<read number>:<is filtered>:<control number>:<barcode sequence>

A `Y' in the <is filtered> field indicates that Casava marked the sequence
for filtering. This patch enables bwa, with a -Y flag, to filter these
sequences.

Signed-off-by: Roel Kluin <[email protected]>
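
The check described in the commit message above can be sketched as follows. This is not the code from the patch: `casava_is_filtered` is a hypothetical helper that takes the comment string (for example the contents of kseq_t::comment) and inspects the <is filtered> field. Treating `N` as "not filtered" is an assumption based on the Casava 1.8 header convention; the commit message itself only mentions `Y`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the bwa patch).
 * Expects a comment of the form
 *   <read number>:<is filtered>:<control number>:<barcode sequence>
 * Returns 1 if <is filtered> is 'Y', 0 if it is 'N' (assumed to
 * mean "not filtered"), and -1 if the comment does not match.
 */
int casava_is_filtered(const char *comment)
{
    const char *colon;

    if (comment == NULL)
        return -1;

    /* The first ':' terminates the <read number> field. */
    colon = strchr(comment, ':');
    if (colon == NULL || colon == comment)
        return -1;

    /* <is filtered> must be a single character followed by ':'. */
    if (colon[1] == '\0' || colon[2] != ':')
        return -1;

    if (colon[1] == 'Y')
        return 1;
    if (colon[1] == 'N')
        return 0;
    return -1;
}
```

A caller would skip a read only when the function returns 1; returning -1 for anything that does not match the layout keeps reads with ordinary comments from being dropped by accident.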
@RoelKluin RoelKluin closed this Jul 10, 2011
pmarks added a commit to 10XGenomics/bwa that referenced this pull request Jan 4, 2019
ksprintf is inline, to avoid collision with htslib