@divyegala

The kernel, in order:

  1. Starts a loop for every thread
  2. Writes to shared memory using all threads, then syncs
  3. Uses thread 0 to reduce in shared memory
  4. Continues forward with the loop

A sync was necessary between steps 3 and 4 so that the other threads do not move on to the next loop iteration and overwrite shared memory while thread 0 is still reading it to perform the reduction.
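The four steps above can be sketched as a minimal, self-contained CUDA program. This is an illustration of the pattern only, not the kernel changed in this PR: the kernel name `block_sum_per_iter`, the constant `kBlockSize`, the host helper `run_example`, and the input data are all hypothetical, and the sketch assumes the sync in question is a block-level `__syncthreads()` barrier.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; the kernel below assumes blockDim.x == kBlockSize.
constexpr int kBlockSize = 128;

// Hypothetical kernel: in each iteration every thread stages one value in
// shared memory, then thread 0 sums the staged values.
__global__ void block_sum_per_iter(const float* in, float* out, int n_iters)
{
  __shared__ float smem[kBlockSize];

  // Step 1: every thread runs the same loop (uniform trip count).
  for (int iter = 0; iter < n_iters; ++iter) {
    // Step 2: all threads write to shared memory, then sync.
    smem[threadIdx.x] = in[iter * kBlockSize + threadIdx.x];
    __syncthreads();

    // Step 3: thread 0 alone reduces the shared buffer.
    if (threadIdx.x == 0) {
      float acc = 0.0f;
      for (int i = 0; i < kBlockSize; ++i) { acc += smem[i]; }
      out[iter] = acc;
    }

    // Barrier between steps 3 and 4: without it, the other threads could
    // start the next iteration and overwrite smem while thread 0 still reads it.
    __syncthreads();

    // Step 4: continue with the next loop iteration.
  }
}

// Hypothetical host-side check: iteration `it` is filled with the value it + 1,
// so the expected block sum for that iteration is kBlockSize * (it + 1).
bool run_example()
{
  const int n_iters = 4;
  std::vector<float> h_in(n_iters * kBlockSize);
  for (int it = 0; it < n_iters; ++it) {
    for (int t = 0; t < kBlockSize; ++t) { h_in[it * kBlockSize + t] = float(it + 1); }
  }

  float* d_in  = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, h_in.size() * sizeof(float)) != cudaSuccess) { return false; }
  if (cudaMalloc(&d_out, n_iters * sizeof(float)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);

  block_sum_per_iter<<<1, kBlockSize>>>(d_in, d_out, n_iters);

  // The blocking copy also surfaces any error from the kernel launch.
  std::vector<float> h_out(n_iters);
  cudaError_t err =
    cudaMemcpy(h_out.data(), d_out, n_iters * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) { return false; }

  for (int it = 0; it < n_iters; ++it) {
    if (h_out[it] != float(kBlockSize * (it + 1))) { return false; }
  }
  return true;
}

int main()
{
  const bool ok = run_example();
  std::printf("%s\n", ok ? "all block sums match" : "mismatch or CUDA error");
  return ok ? 0 : 1;
}
```

Removing the second `__syncthreads()` from this sketch introduces a data race on `smem`. A race like this may not produce wrong results on every run, so a tool such as `compute-sanitizer --tool racecheck` is a more reliable way to detect it than checking the output.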

@divyegala divyegala self-assigned this Apr 23, 2025
@divyegala divyegala requested a review from a team as a code owner April 23, 2025 18:54
@divyegala divyegala requested review from lowener and teju85 April 23, 2025 18:54
@divyegala divyegala added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Apr 23, 2025

@lowener lowener left a comment

LGTM

@divyegala

/merge

@rapids-bot rapids-bot bot merged commit ee74673 into rapidsai:branch-25.06 Apr 26, 2025
77 of 78 checks passed
Ofek-Haim pushed a commit to Ofek-Haim/cuml that referenced this pull request May 13, 2025
Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Micka (https://github.com/lowener)

URL: rapidsai#6578

Labels: bug (Something isn't working), CUDA/C++, non-breaking (Non-breaking change)
