Skip to content

Conversation

hoangtran9122
Copy link
Contributor

Important

The Update branch button must only be pressed in very rare occassions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Fix illegal CUDA memory error when evaluating using the decoding methods with NGPU-LM while training ASR models.

Collection: asr

Changelog

  • Add WithOptionalCudaGraphs to the new decoding classes with ngpu-lm and add the required methods to enable/disable CUDA graph while training.

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Copy link
Contributor

github-actions bot commented Jul 9, 2025

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Jul 9, 2025
@ahmadki
Copy link
Member

ahmadki commented Jul 9, 2025

@nithinraok The PR looks good, can you take a look if this is something we want to merge ?

@hoangtran9122: Do you mind signing the commit and fixing the linting errors ?

@nithinraok nithinraok requested a review from artbataev July 9, 2025 16:41
@github-actions github-actions bot removed the stale label Jul 10, 2025
@artbataev
Copy link
Collaborator

@hoangtran9122 please update manually to the latest main branch (to fix "isort and black formatting") and sign off the commit

artbataev
artbataev previously approved these changes Jul 10, 2025
Copy link
Collaborator

@artbataev artbataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!
Please fix the issues with CI.

@hoangtran9122
Copy link
Contributor Author

Yes sure, I'll sign fix it

@hoangtran9122 hoangtran9122 force-pushed the fix-decoding-with-ngpu-lm-on-training branch from 4ecde2d to c7be903 Compare July 11, 2025 06:45
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jul 11, 2025
update the latest main branch

Signed-off-by: Hoang Tran <[email protected]>
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jul 11, 2025
@hoangtran9122 hoangtran9122 force-pushed the fix-decoding-with-ngpu-lm-on-training branch from 7087c91 to 5c143f5 Compare July 11, 2025 08:04
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jul 11, 2025
@hoangtran9122 hoangtran9122 force-pushed the fix-decoding-with-ngpu-lm-on-training branch from 5c143f5 to 3879d54 Compare July 11, 2025 08:10
@github-actions github-actions bot added the CI label Jul 11, 2025
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jul 11, 2025
@hoangtran9122
Copy link
Contributor Author

I think have some difficulties to set the NEMO_REFORMAT_TOKEN for the commit @artbataev

@artbataev
Copy link
Collaborator

artbataev commented Jul 11, 2025

@hoangtran9122 The problem is with the PR permissions. After reformatting the code, the changes should be pushed back to your (PR) branch, but you disallowed edits.

There are 2 options how to fix it:

  1. allow edits by maintainers (in PR settings - usually enabled by default when you create the PR) https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork#enabling-repository-maintainer-permissions-on-existing-pull-requests

  2. Install isort and black from requirements_test.txt, reformat each file manually (black . will reformat nothing, need to explicitly pass each file), then push to your branch ("isort & black" task will reformat nothing - no need to push changes)

@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jul 14, 2025
Signed-off-by: Hoang Tran <[email protected]>
@hoangtran9122 hoangtran9122 force-pushed the fix-decoding-with-ngpu-lm-on-training branch from ff9f69c to cb86955 Compare July 14, 2025 04:25
@github-actions github-actions bot removed the CI label Jul 14, 2025
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jul 14, 2025
@artbataev artbataev merged commit 826b9f7 into NVIDIA-NeMo:main Jul 15, 2025
130 checks passed
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Jul 23, 2025
…eMo#13995)

* Fix decoding with ngpu-lm when training (NVIDIA-NeMo#13994)

Signed-off-by: Hoang Tran <[email protected]>

* code_format

Signed-off-by: Hoang Tran <[email protected]>

---------

Signed-off-by: Hoang Tran <[email protected]>
Signed-off-by: Amir Hussein <[email protected]>
monica-sekoyan pushed a commit that referenced this pull request Aug 4, 2025
* Fix decoding with ngpu-lm when training (#13994)

Signed-off-by: Hoang Tran <[email protected]>

* code_format

Signed-off-by: Hoang Tran <[email protected]>

---------

Signed-off-by: Hoang Tran <[email protected]>
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Aug 5, 2025
…eMo#13995)

* Fix decoding with ngpu-lm when training (NVIDIA-NeMo#13994)

Signed-off-by: Hoang Tran <[email protected]>

* code_format

Signed-off-by: Hoang Tran <[email protected]>

---------

Signed-off-by: Hoang Tran <[email protected]>
Signed-off-by: Amir Hussein <[email protected]>
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Aug 5, 2025
…eMo#13995)

* Fix decoding with ngpu-lm when training (NVIDIA-NeMo#13994)

Signed-off-by: Hoang Tran <[email protected]>

* code_format

Signed-off-by: Hoang Tran <[email protected]>

---------

Signed-off-by: Hoang Tran <[email protected]>
Signed-off-by: Amir Hussein <[email protected]>
nasretdinovr pushed a commit to nasretdinovr/NeMo that referenced this pull request Aug 8, 2025
…eMo#13995)

* Fix decoding with ngpu-lm when training (NVIDIA-NeMo#13994)

Signed-off-by: Hoang Tran <[email protected]>

* code_format

Signed-off-by: Hoang Tran <[email protected]>

---------

Signed-off-by: Hoang Tran <[email protected]>
guyueh1 pushed a commit to guyueh1/NeMo that referenced this pull request Aug 25, 2025
…eMo#13995)

* Fix decoding with ngpu-lm when training (NVIDIA-NeMo#13994)

Signed-off-by: Hoang Tran <[email protected]>

* code_format

Signed-off-by: Hoang Tran <[email protected]>

---------

Signed-off-by: Hoang Tran <[email protected]>
Signed-off-by: Guyue Huang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants