Description
I'm not sure that this one is worth fixing, but I want to document it.
If you compile your CUDA kernels with nvcc's "-G" flag to build them in debug mode, device-code optimizations are turned off. Turning off optimizations is normally not expected to change what your code does, but unfortunately with CUDA code it can.
The best way to turn on debug information is to add string(APPEND CMAKE_CUDA_FLAGS_DEBUG " -G")
right here:
Line 902 in 2247aa6
Turning off optimizations increases the number of registers each thread uses. That can rule out block sizes that work in an optimized build: if a block of the requested size needs more registers than an SM has available, the launch fails with cudaErrorLaunchOutOfResources.
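To make the arithmetic concrete, here is a rough sketch of the register budget check. The numbers are assumptions for illustration, not measurements from PyTorch: a limit of 65536 registers per block and 1024 threads per block (common on recent NVIDIA GPUs, but device dependent), and hypothetical per-thread register counts of 32 (optimized) and 80 (debug). The helper names are made up, and the model ignores the hardware's register allocation granularity.

```python
WARP_SIZE = 32

def launch_fits(block_size, regs_per_thread, regs_per_block=65536):
    """Simplified check: does one block fit in the register budget?

    regs_per_block=65536 is an assumed limit; query the real device for the actual value.
    """
    return block_size * regs_per_thread <= regs_per_block

def max_launchable_threads(regs_per_thread, regs_per_block=65536,
                           max_threads_per_block=1024):
    """Largest warp-multiple block size that fits the register budget."""
    by_regs = regs_per_block // regs_per_thread
    by_regs -= by_regs % WARP_SIZE  # round down to a whole number of warps
    return min(by_regs, max_threads_per_block)

# Hypothetical optimized build: 32 registers per thread.
print(launch_fits(1024, 32), max_launchable_threads(32))
# Hypothetical -G build: 80 registers per thread.
print(launch_fits(1024, 80), max_launchable_threads(80))
```

Under these assumptions a 1024-thread block fits in the optimized build but not in the debug build, where the largest block that fits is 800 threads. A kernel launched with a hard-coded block size of 1024 would then fail only when compiled with -G.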
Specifically, I have seen torch._C._nn.cross_entropy_loss fail this way, though I did not record the sizes and dtypes that trigger the failure.
I'm not sure this is worth fixing, since it is hard to fix and -G is a rarely used option. But I do think it is worth documenting.