-
Notifications
You must be signed in to change notification settings - Fork 144
Fix multiple sequencing platform validation #912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jfy133
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the rapid fix @dialvarezs !
|
Looks like |
I'm not following (sorry if it's illness-brain), What exactly is not stable? The md5sum? I would not exclude entirely as it's a critical file. At a minimum check number of rows and maybe a few strings (you can have multiple assertions for the same file). But is the problem that |
|
@jfy133 My bad here, I did this in a hurry and for some reason I thought it was |
77430f4 to
83a108c
Compare
| GenomeBinning/MetaBAT2/unbinned/discarded/*.unbinned.pooled.fa.gz | ||
| GenomeBinning/QC/CheckM2/**/DIAMOND_RESULTS.tsv | ||
| GenomeBinning/QC/CheckM2/*/checkm2.log | ||
| GenomeBinning/QC/busco_summary.tsv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find this satisfying, can we check for the number of rows instead? But I'll give a preemptive approval on case that's not possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, I agree 100%.
But I guess we can address it later in a snapshot related PR? I can add it to my PR for test_assembly_input and copy it from all the snapshots using BUSCO.
The information of that file es contained on bin_summary.tsv, so we're checking it someway. The problem is that BUSCO in batch mode definitively can't produce stable results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check for the number of rows instead
A quick heads up from my own metagenome pipeline - I have found that the binning tools can output varying numbers of bins between runs. So you need to be a bit clever and can't just hard-code a number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick heads up from my own metagenome pipeline - I have found that the binning tools can output varying numbers of bins between runs. So you need to be a bit clever and can't just hard-code a number
Our binning results seem stable (maybe because the dataset small and simple enough). We have snapshots for all the bin FASTA files, and the number is consistent too. So I guess it's not so bad to have the number harcoded until we change the dataset. But if you have a better idea @prototaxites, we can implement it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not specifically (I'm just comparing counts across files) - I just point it out because in my experience MetaBat2 was the big culprit and did not consistently put out the same files or in the same order...
But if it's working here that's fine!

Fix for an issue reported on Slack.
The source of the issue is that
.collect()is returning a string, not a list, so.size()will give > 1 always.I tried adding
flat: falsebut didn't work either, so I used.toList().PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).