Fix multiple sequencing platform validation #912

dialvarezs · 2025-11-04T08:27:00Z

Fix for an issue reported on Slack.
The source of the issue is that .collect() returns a string rather than a list, so .size() always gives a value > 1 (the length of the string, not the number of platforms).
I tried adding flat: false, but that didn't work either, so I used .toList() instead.
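The failure mode is easy to see outside Nextflow. The snippet below is only a Python analogy, not the pipeline's Groovy code, and the function names and platform labels are hypothetical. Taking the size of a single string returns its character count, so a "more than one platform?" check trips even when only one platform is present. Wrapping a lone value in a list first, which is roughly what the .toList() fix achieves, counts items instead.

```python
def count_platforms_buggy(collected):
    # Bug analogue: if `collected` is a single string, len() counts its characters.
    return len(collected)


def as_list(value):
    # Wrap a lone string so it is treated as one item (similar in spirit to .toList()).
    if isinstance(value, str):
        return [value]
    return list(value)


def count_platforms(value):
    # Count distinct platform labels, whatever shape the input arrives in.
    return len(set(as_list(value)))


# Hypothetical platform labels, for illustration only.
print(count_platforms_buggy("ILLUMINA"))                 # 8, i.e. wrongly "> 1"
print(count_platforms("ILLUMINA"))                       # 1
print(count_platforms(["ILLUMINA", "OXFORD_NANOPORE"]))  # 2
```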

PR checklist


jfy133

Looks good, thanks for the rapid fix @dialvarezs !

jfy133 · 2025-11-04T08:57:12Z

Looks like quay.io is having problems...

dialvarezs · 2025-11-04T09:54:03Z

Not now BUSCO, not now! >:(

dialvarezs · 2025-11-04T10:26:47Z

@jfy133 I excluded bin_summary.tsv from the snapshot because, as I mentioned in #905, it seems impossible to keep it stable unless we add a sorting step afterwards.

If that's ok, I'll merge.

jfy133 · 2025-11-04T11:12:48Z

> @jfy133 I excluded bin_summary.tsv from the snapshot because, as I mentioned in #905, it seems impossible to keep it stable unless we add a sorting step afterwards.
>
> If that's ok, I'll merge.

I'm not following (sorry if it's illness-brain). What exactly is not stable? The md5sum?

I would not exclude it entirely, as it's a critical file. At a minimum, check the number of rows and maybe a few strings (you can have multiple assertions for the same file).
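In the pipeline these checks would be written as nf-test assertions; the Python sketch below only illustrates the idea of replacing an md5 comparison with structural checks (a row count plus a few expected strings) that survive row reordering. The function name, file contents, and expected values are made up for illustration.

```python
import csv
import io


def check_tsv(text, expected_rows, required_strings=()):
    """Return a list of problems; an empty list means all checks passed.

    These checks are order-independent, unlike an md5sum of the whole file.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    n_data_rows = max(len(rows) - 1, 0)  # exclude the header line
    if n_data_rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {n_data_rows}")
    for expected in required_strings:
        if expected not in text:
            problems.append(f"missing expected string: {expected!r}")
    return problems


# Made-up summary contents; the rows could arrive in any order.
summary = "bin\tcompleteness\nbin.2\t90.1\nbin.1\t75.0\n"
print(check_tsv(summary, expected_rows=2, required_strings=["bin.1", "bin.2"]))  # []
```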

But is the problem that bin_summary.tsv has rows in a different order? If so, I think your .sort() fix for the mag_depths script was a good one.

dialvarezs · 2025-11-04T14:19:03Z

@jfy133 My bad here, I did this in a hurry and for some reason I thought it was busco_summary instead of bin_summary.
And yes, the problem is the row ordering, but in the bin summary it's solvable; it just needs a sort in the Python script.
I'm reverting this now and applying the proper fix.
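A minimal sketch of the kind of fix described, assuming a script that merges per-bin records into one TSV. The function name, the "bin" key column, and the values are hypothetical and not taken from the pipeline's actual bin summary script. Fixing both the column order and the row order before writing makes the output independent of the order in which inputs arrive, so its md5sum can be snapshotted.

```python
import csv
import io


def write_bin_summary(records, out_handle, key="bin"):
    """Write per-bin records as TSV with deterministic row and column order.

    records: list of dicts, one per bin (hypothetical structure); each must have `key`.
    key: column used to sort rows (hypothetical column name).
    """
    # Fixed column order: the key column first, then the rest alphabetically.
    columns = {name for record in records for name in record}
    fieldnames = [key] + sorted(columns - {key})
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n", restval="NA")
    writer.writeheader()
    # Sorting rows means the output no longer depends on input arrival order.
    for record in sorted(records, key=lambda r: r[key]):
        writer.writerow(record)


# Two runs that see the same (made-up) bins in a different order.
run_a = [{"bin": "bin.2", "completeness": "90.1"}, {"bin": "bin.1", "completeness": "75.0"}]
run_b = list(reversed(run_a))
out_a, out_b = io.StringIO(), io.StringIO()
write_bin_summary(run_a, out_a)
write_bin_summary(run_b, out_b)
print(out_a.getvalue() == out_b.getvalue())  # True
```

Plain string sorting places "bin.10" before "bin.2"; that is harmless here, because the goal is a stable order rather than a human-friendly one.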

jfy133 · 2025-11-04T15:27:51Z

tests/.nftignore

 GenomeBinning/MetaBAT2/unbinned/discarded/*.unbinned.pooled.fa.gz
 GenomeBinning/QC/CheckM2/**/DIAMOND_RESULTS.tsv
 GenomeBinning/QC/CheckM2/*/checkm2.log
+GenomeBinning/QC/busco_summary.tsv


I don't find this satisfying; can we check the number of rows instead? But I'll give a preemptive approval in case that's not possible.

dialvarezs · 2025-11-04

Absolutely, I agree 100%.

But I guess we can address it later in a snapshot-related PR? I can add it to my PR for test_assembly_input and copy it to all the snapshots using BUSCO.

The information in that file is also contained in bin_summary.tsv, so we're checking it in some way. The problem is that BUSCO in batch mode definitely can't produce stable results.

prototaxites · 2025-11-04

> can we check for the number of rows instead

A quick heads up from my own metagenome pipeline - I have found that the binning tools can output varying numbers of bins between runs. So you need to be a bit clever and can't just hard-code a number

dialvarezs · 2025-11-04

> A quick heads up from my own metagenome pipeline - I have found that the binning tools can output varying numbers of bins between runs. So you need to be a bit clever and can't just hard-code a number

Our binning results seem stable (maybe because the dataset is small and simple enough). We have snapshots for all the bin FASTA files, and the number is consistent too. So I guess it's not so bad to have the number hardcoded until we change the dataset. But if you have a better idea @prototaxites, we can implement it that way.

prototaxites · 2025-11-04

Not specifically (I'm just comparing counts across files). I just point it out because in my experience MetaBAT2 was the big culprit and did not consistently put out the same files, or in the same order...

But if it's working here that's fine!
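Comparing counts across outputs, rather than hard-coding a bin number, could look like the minimal Python sketch below. It assumes one FASTA file per bin and one data row per bin in the summary; the directory layout, glob pattern, and function name are hypothetical, and in the pipeline the equivalent check would live in the nf-test assertions.

```python
import csv
import tempfile
from pathlib import Path


def summary_matches_bins(summary_tsv, bin_dir, pattern="*.fa.gz"):
    """Compare the number of data rows in a summary TSV with the number of bin files.

    Returns (matches, n_rows, n_bins) so a failing test can report both counts.
    """
    with open(summary_tsv, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    n_rows = max(len(rows) - 1, 0)  # exclude the header line
    n_bins = len(list(Path(bin_dir).glob(pattern)))
    return n_rows == n_bins, n_rows, n_bins


# Illustrative layout built in a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    bin_dir = Path(tmp) / "bins"
    bin_dir.mkdir()
    for name in ("bin.1.fa.gz", "bin.2.fa.gz"):
        (bin_dir / name).touch()
    summary = Path(tmp) / "bin_summary.tsv"
    summary.write_text("bin\tcompleteness\nbin.1.fa.gz\t75.0\nbin.2.fa.gz\t90.1\n")
    result = summary_matches_bins(summary, bin_dir)

print(result)  # (True, 2, 2)
```

Because both counts come from the same run, this kind of check still passes if a binner produces a different number of bins between runs, as long as the summary stays consistent with the FASTA output.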

dialvarezs added 2 commits November 4, 2025 05:24

Fix multiple sequencing platform validation when using binning_map_mode = "all"

bc63970

Update changelog

b8f946a

dialvarezs requested review from d4straub, jfy133, muabnezor and prototaxites as code owners November 4, 2025 08:27

Update changelog

87ca9ec

jfy133 approved these changes Nov 4, 2025


Merge branch 'dev' into fix-platform-validation

cb8ee67

Fix ordering

83a108c

dialvarezs force-pushed the fix-platform-validation branch from 77430f4 to 83a108c

jfy133 approved these changes Nov 4, 2025


dialvarezs mentioned this pull request Nov 4, 2025

add MetaBinner #881

Merged


Exclude busco_summary

5336881

jfy133 approved these changes Nov 4, 2025


dialvarezs merged commit b60e1f6 into nf-core:dev Nov 4, 2025
25 checks passed

dialvarezs mentioned this pull request Nov 5, 2025

Release: v5.2.0 Puce Pangolin #914

Merged


3 participants
