Skip to content

Conversation

@harper357
Copy link
Contributor

@harper357 harper357 commented Jul 26, 2025

This is pretty simple PR (hopefully) to update the MultiQC config to include CheckM, checkm2, and GTDB.

See #739 .

PR checklist

  • This comment contains a description of changes (with reason).
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nf-core-bot
Copy link
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

These show where I think I need to add things.
@harper357
Copy link
Contributor Author

I think this is ready for review. There is one caveat though, it turns out that when I made the MultiQC modules, I didn't have them add their columns to the general stats table (and BUSCO is the same way), so I am not sure if which is the best path for that:

A) don't worry about it
B) update the modules, then update this config again when MultiQC is updated
C) update this config to be work with an updated set of modules, then update the modules

@harper357
Copy link
Contributor Author

I think this is ready for review. There is one caveat though, it turns out that when I made the MultiQC modules, I didn't have them add their columns to the general stats table (and BUSCO is the same way), so I am not sure if which is the best path for that:

A) don't worry about it
B) update the modules, then update this config again when MultiQC is updated
C) update this config to be work with an updated set of modules, then update the modules

Also, I tested it directly with MultiQC instead of nextflow/running the pipeline. Happy to do that, but it might take me a little while to get a machine set up for that.

@harper357 harper357 marked this pull request as ready for review July 28, 2025 23:34
@charles-plessy
Copy link

Hello, is there a reason you do not give name and info information for the MultiQC modules you added? Most of the others appear to have it…

@harper357
Copy link
Contributor Author

@charles-plessy MultiQC appears to pull that data from the MultiQC module itself, so you only need to have it in the config if you want to more/different text.

Copy link

@charles-plessy charles-plessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@jfy133
Copy link
Member

jfy133 commented Jul 29, 2025

Thanks for this @harper357 !

I think this is ready for review. There is one caveat though, it turns out that when I made the MultiQC modules, I didn't have them add their columns to the general stats table (and BUSCO is the same way), so I am not sure if which is the best path for that:

A) don't worry about it B) update the modules, then update this config again when MultiQC is updated C) update this config to be work with an updated set of modules, then update the modules

I would go for B or C, although I think in the case C would be best (as then you technically don't have to come back to the pipeline - we will just get the updates when we pull the next container) - unless MultiQC complains (in which case go with B)
I think the core info is still too important for A, as in an ideal world, I would love to have the general stats table to have sufficient info that a user can evaluate the data against MIMAG criteria with a (relatively) quick glance.

Also, I tested it directly with MultiQC instead of nextflow/running the pipeline. Happy to do that, but it might take me a little while to get a machine set up for that.

You should just be able to run it in GitPod, then download the MultiQC report to check (except for GTDBTK currently)

@prototaxites
Copy link
Contributor

Question, since I don't immediately see this in the main workflow: do you also need to merge in the required outputs from [checkm, checkm2, gtdbtk] into the MultiQC input channel?

@jfy133
Copy link
Member

jfy133 commented Jul 29, 2025

Question, since I don't immediately see this in the main workflow: do you also need to merge in the required outputs from [checkm, checkm2, gtdbtk] into the MultiQC input channel?

Good point, I don't think these are being mixed in already indeed

@harper357
Copy link
Contributor Author

@prototaxites Whoops, I was even looking at that subworkflow and didn't notice that. I will fix that later today.

@jfy133 Ok, I will do option C. I think it is just adding a single command for each of the modules in MultiQC, so it should be easy enough.

@harper357
Copy link
Contributor Author

Sorry this is taking so long. I had some trouble figuring out why non-BUSCO outputs weren't being seen by MultiQC, and then GTDB database is taking forever to download so I can't run the last test run yet. I should have it all wrapped up sometime tomorrow, though.

@jfy133
Copy link
Member

jfy133 commented Jul 31, 2025

Sorry this is taking so long. I had some trouble figuring out why non-BUSCO outputs weren't being seen by MultiQC, and then GTDB database is taking forever to download so I can't run the last test run yet. I should have it all wrapped up sometime tomorrow, though.

Absolutely no rush, most of us are on holiday and/or conferencing!

For GTDB we have a mini database you can maybe try, however with our current test data files it doesn't result in any hits.

But if you want I can point you to it so at least you can try.

Alternatively I'm back from my conference on Saturday so could also run with my copy of GTDB if it hasn't finished downloading for you then

@harper357
Copy link
Contributor Author

Ok, I ran the pipeline with -profile bin_refinement,docker and changed params.binqc_tool to checkm and then checkm2. Then after the database finished downloading, I ran it with params.skip_gtdbtk set to false.

All three worked as expected, adding the respective outputs to the MultiQC Report.

@jfy133 I tried to use the mock GTDB-Tk database on nf-core/test-datasets (after manually updating the module to 2.4.1) but it ended up giving me errors about not finding a file

@dialvarezs
Copy link
Member

@nf-core-bot fix linting

Copy link
Member

@dialvarezs dialvarezs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good for me. Ran a full test with GTDBTk and CheckM2, and it's working as expected.
image

The only issue I see (but this is for the MultiQC module, not for this PR) is that displaying only the species can leave a lot of empty values. Maybe using the last non-empty taxon would be better.

@harper357 remember to update the changelog and add your name the contributors in the readme. With that ready, it would be ready to merge.

@harper357 harper357 merged commit 49de2bd into nf-core:dev Aug 6, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants