Skip to content

Conversation

@jeffe107
Copy link
Member

Hello,

I apologize for creating a new PR, I couldn't force the previous one to take the changes from dev branch. I include again my message, and in the next comment I will post your comments from the previous PR.

According to issue [#859 ] to produce the file that is used as input for BIgMAG I have included the following:

I didn't want to mess with the current state of the pipeline, so I included a subworkflow that handles the execution in this manner:

  • If BUSCO (default tool) is selected as QC tool, it will trigger CheckM2. If the user selects CheckM or CheckM2, BUSCO will be executed then.
  • It runs GUNC by default. It is aware that GUNC can be run within the subworkflow BIN_QC, so it adjusted to that situation.
  • It takes as input the global bin_summary file.
  • Two modules were added: concat_bigmag.nf and bigmag_summary.nf. The first performs the merge of the TSV files just like CSVTK_CONCAT. I had to create it because if I call the same module, it would overwrite the summary file in the GenomeBinning folder, but in essence they are the same module. The second one joins all of the files (bin_summary, alternative_tool and gunc_summary). bigmag_summary.nf executes the script bigmag_summary.py found in the folder bin.

PR checklist

  • [ ✓] This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • [ ✓] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • [ ✓] Usage Documentation in docs/usage.md is updated.
  • [ ✓] Output Documentation in docs/output.md is updated.
  • [ ✓] CHANGELOG.md is updated.
  • [ ✓] README.md is updated (including new tool citations and authors/contributors).

@jeffe107
Copy link
Member Author

@harper357 I think your PR #842 might be relevant for this

No pressure (just trying to work out priorities), but I was wondering how far off you think your PR would be?

If so we could maybe get that in first and it might make @jeffe107 's life easier?

What do you think? If you think it'll take you a while still then we can just proceed with Jeferyd's system here and later on once your PR is merged in change the logic

@jfy133 I am hoping to have it done this week as long as life stuff doesn't eat up all my time. I will try to look at BIgMAG to make sure I am not doing anything that will make it harder to add it.

@jfy133
Copy link
Member

jfy133 commented Sep 17, 2025

@nf-core-bot fix linting

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise

First pass for checking adherence to nf-core guidelines/the mag conventions (i.e., without testing), but it looks really good for a first contribution, thank you @jeffe107 👏

@@ -0,0 +1,55 @@
process CONCAT_BIGMAG {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question(if-minor)
Is this a duplicate of modules/nf-core/csvtk/concat? If so, you don't need to write it again, you can just import the module in your subworkflow and rename it with the as alias (see the subworkflows/local/assembly_shortread` for how we do that to import SPADES as METASPADES

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. It does exactly the same, the difference relies on that modules/nf-core/csvtk/concat takes binqc_tool as parameter to generate the output file. If I call the same module on bigmag subworkflow, it would overwrite the file generated during BIN_QC subworkflow. As a result, I created CONCAT_BIGMAG to rename the binqc_tool and create the file for the tool that bigmag subworkflow is executing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would it re-write the file? You mean in --outdir? If that's the case we can just change the resulting file's name using publishDir for an aliased CSVTK module in modules.config :)

Unless I misunderstand?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, it would overwrite the file in --outdir. So, great, I agree with the solution you propose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffe107 will you then remove this module, if it still necessary? It does not appear to be usd anywhere anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other reviewers: this is generally redundant, but we will wait for @harper357 's PR before we simplify this subworkflow

@prototaxites
Copy link
Contributor

prototaxites commented Sep 19, 2025

Hi all, a couple of initial thoughts on a quick passs-through:

If BUSCO (default tool) is selected as QC tool, it will trigger CheckM2. If the user selects CheckM or CheckM2, BUSCO will be executed then.

It runs GUNC by default. It is aware that GUNC can be run within the subworkflow BIN_QC, so it adjusted to that situation.

I would prefer if we just transparently refused to run the subworkflow if the required tools are not enabled. We've gone to some effort to have most of the tools be configurable with flags, and I wouldn't want to pre-empt the user's choices in this matter. We should just print a warning if the BigMag subworkflow can't run.

Two modules were added: concat_bigmag.nf and bigmag_summary.nf. The first performs the merge of the TSV files just like CSVTK_CONCAT. I had to create it because if I call the same module, it would overwrite the summary file in the GenomeBinning folder, but in essence they are the same module. The second one joins all of the files (bin_summary, alternative_tool and gunc_summary). bigmag_summary.nf executes the script bigmag_summary.py found in the folder bin.

Unless I'm missing something, as James says, you can definitely just re-use CSVTK_CONCAT. You just import it:

include { CSVTK_CONCAT as CSVTK_CONCAT_BIGMAG } from '../../../modules/nf-core/csvtk/concat/main'

And then you can reference CSVTK_CONCAT_BIGMAG in the config (and it shouldn't pick up the config from other CSVTK_CONCAT processes in the pipeline).

@jfy133
Copy link
Member

jfy133 commented Oct 3, 2025

OK I'm back from the nf-core core team retreat and a conference, sorry for the delay and thanks for the patience @jeffe107 !

Looking through this PR again, I do think we should work off of @harper357 's PR. I've offered to take that over so we can get that in and have something easier for you to work with. So I'll focus on that next week and then we can re-jig this PR based on those changes :)

And I agree with @prototaxites that I (once Greg's PR is in), that we should just error on pipeline initialisation if BIgMAG is requested but the necessary QC tools are not activated.

@jeffe107
Copy link
Member Author

jeffe107 commented Oct 3, 2025

I agree with @jfy133 PR #842 can easily solve the logic to generate the file for BIgMAG. Let's wait then to solve that one to proceed with this.

@dialvarezs
Copy link
Member

@nf-core-bot fix linting

nf-core-bot and others added 2 commits November 28, 2025 15:17
@dialvarezs
Copy link
Member

Sorry @jeffe107, I merged the pending commits from dev in an attempt to fix the snapshot mismatch

@jeffe107
Copy link
Member Author

jeffe107 commented Nov 28, 2025

Hello,

I applied the requested changes. @jfy133, to fix the problem with the nf-test, do I have to run each test where the md5 of the bin_summary.tsv is checked and then modify such md5 value in the *.snap files?

@dialvarezs
Copy link
Member

dialvarezs commented Nov 28, 2025

Yes! I see that you added Genomize_size to the bin_summary.tsv, so that should be the difference between the snapshots.
Update the nf-test snapshot of tests/test_alternatives.nf.test (that's the only one that runs CheckM2) and with that the issue should be resolved!

EDIT: I updated the snapshot for you.

@jeffe107
Copy link
Member Author

Hello,

Thank you @dialvarezs. I noticed that you already updated the md5 for the test_alternatives which was the one generating the conflict (I was doing it myself when you updated it 😔). With this, I think we are ready to proceed @jfy133.

Copy link
Member

@dialvarezs dialvarezs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks!
I did a quick run to test the new functionality and didn’t find any issues. The file was generated correctly.
I didn’t test running BigMAG, but I’m assuming you have that well covered.

@jfy133
Copy link
Member

jfy133 commented Dec 1, 2025

Just running running the test, (and I'll try loading into bigMag) and then I think we are good to go!

@jfy133
Copy link
Member

jfy133 commented Dec 1, 2025

@jeffe107 I do get a file, but if I follow the instructions on the BIgMAG instructions, I get the following error

(BIgMAG) james@bionb103:~/git/scratch $ BIgMAG/app_lite.py /home/james/git/scratch/mag/testing/results/GenomeBinning/BIgMAG/bigmag_summary.tsv -o tst.html
/home/james/bin/miniforge3/envs/BIgMAG/lib/python3.11/site-packages/pingouin/parametric.py:1360: RuntimeWarning:

divide by zero encountered in double_scalars

Traceback (most recent call last):
  File "/home/james/bin/miniforge3/envs/BIgMAG/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 155, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index_class_helper.pxi", line 19, in pandas._libs.index.Float64Engine._check_type
KeyError: True

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/james/git/scratch/BIgMAG/app_lite.py", line 954, in <module>
    figure=figure_heatmap(),
           ^^^^^^^^^^^^^^^^
  File "/home/james/git/scratch/BIgMAG/app_lite.py", line 433, in figure_heatmap
    pass_GUNC = temp_df['pass.GUNC'].value_counts()[True]/len(temp_df)
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/james/bin/miniforge3/envs/BIgMAG/lib/python3.11/site-packages/pandas/core/series.py", line 1007, in __getitem__
    return self._get_value(key)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/james/bin/miniforge3/envs/BIgMAG/lib/python3.11/site-packages/pandas/core/series.py", line 1116, in _get_value
    loc = self.index.get_loc(label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/james/bin/miniforge3/envs/BIgMAG/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: True

Is that expected? Installed BIgMAG following teh conda instructions

bigmag_summary.tsv

@jeffe107
Copy link
Member Author

jeffe107 commented Dec 1, 2025

Hello,

Thank you both for your help! Yes, it happens from the GUNC test, the column where there should be values regarding whether they pass the test or not is empty. If you use the regular version of the dashboard, you should see some of the plots, although with errors to build the heatmap and some of them empty because there is no enough data.

@jfy133
Copy link
Member

jfy133 commented Dec 1, 2025

If you use the regular version of the dashboard, you should see some of the plots, although with errors to build the heatmap and some o

Which regular version do you mean? Or if you can guide me to see if I can load the nf-core/mag generated file ?

@jeffe107
Copy link
Member Author

jeffe107 commented Dec 1, 2025

Yes, you are using the lite version that allows to generate the 'static html'. But if you run the 'live version', it would work: BIgMAG/app.py -p 8050 /home/james/git/scratch/mag/testing/results/GenomeBinning/BIgMAG/bigmag_summary.tsv

You can change the port (-p), 8050 is the default.

@jfy133
Copy link
Member

jfy133 commented Dec 1, 2025

BIgMAG/app.py -p 8050 /home/james/git/scratch/mag/testing/results/GenomeBinning/BIgMAG/bigmag_summary.tsv

Ah understood!

OK yes that works! I will approve and you can merge :D Great work @jeffe107 thank youy very much!

@jfy133 jfy133 self-requested a review December 1, 2025 09:15
@jeffe107 jeffe107 merged commit 06c5ab7 into nf-core:dev Dec 1, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants