@sraikund16 commented Feb 12, 2025

Summary: We induce a buffer request denial by running a basic resnet50 training script on this branch. The denial then causes a segfault in libcupti.so.

Crash stack

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
==== backtrace (tid: 455245) ====
 0  <test_path>/libucm_ucs_dev.so(+0x581aa) [0x7f10159d61aa]
 1  <test_path>/libucm_ucs_dev.so(+0x57ba1) [0x7f10159d5ba1]
 2  <test_path>/libfolly_fibers_guard_page_allocator.so(+0x6c46) [0x7f11359b0c46]
 3  <usr_lib_path>/libc.so.6(+0x44560) [0x7f1148044560]
 4  <usr_lib_path>/cuda-no-rpath-12.4/libcupti.so.2024.1.1(+0x122790) [0x7f100a322790]
 5  <usr_lib_path>/cuda-no-rpath-12.4/libcupti.so.2024.1.1(+0x141769) [0x7f100a341769]
 6  <usr_lib_path>/cuda-no-rpath-12.4/libcupti.so.2024.1.1(+0x10b10a) [0x7f100a30b10a]
 7  <usr_lib_path>/cuda-no-rpath-12.4/libcupti.so.2024.1.1(+0x10d3a8) [0x7f100a30d3a8]
 8  <usr_lib_path>/libcuda.so.1(+0x445b06) [0x7f1008845b06]
 9  <usr_lib_path>/cudnn-no-rpath-8.9.3/libcudnn_ops_train.so.8.9.3(+0x14cfed) [0x7f0271b4cfed]
10  <usr_lib_path>/cudnn-no-rpath-8.9.3/libcudnn_ops_train.so.8.9.3(+0x4fffb) [0x7f0271a4fffb]
11  <usr_lib_path>/cudnn-no-rpath-8.9.3/libcudnn_ops_train.so.8.9.3(cudnnBatchNormalizationForwardTrainingEx+0x4a5) [0x7f0271a597b5]
...
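
For readers unfamiliar with the term, a "buffer request denial" can be sketched as follows. This is a minimal illustration, not the code on this branch: the description does not show how the denial is triggered, so the cap `kMaxOutstandingBuffers`, the buffer size, and the helper `releaseBuffer` are all assumptions. The callback takes the same parameters as CUPTI's buffer-requested callback (`CUpti_BuffersCallbackRequestFunc`), but is written without `cupti.h` so that it compiles on its own.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical limits, chosen only for illustration (assumptions).
constexpr int kMaxOutstandingBuffers = 2;
constexpr std::size_t kBufferSize = 1 << 20;  // 1 MiB

// Number of buffers handed out and not yet released.
static std::atomic<int> outstandingBuffers{0};

// Same parameter list as CUPTI's buffer-requested callback.
// A denial is expressed by returning a null buffer of size 0.
void bufferRequested(std::uint8_t** buffer, std::size_t* size,
                     std::size_t* maxNumRecords) {
  *maxNumRecords = 0;  // 0 means "as many records as fit in the buffer"

  // Reserve a slot first; back out if the cap is already reached.
  if (outstandingBuffers.fetch_add(1) >= kMaxOutstandingBuffers) {
    outstandingBuffers.fetch_sub(1);
    *buffer = nullptr;  // deny the request
    *size = 0;
    return;
  }

  *buffer = static_cast<std::uint8_t*>(std::malloc(kBufferSize));
  if (*buffer == nullptr) {
    outstandingBuffers.fetch_sub(1);  // allocation failed: also a denial
    *size = 0;
    return;
  }
  *size = kBufferSize;
}

// Hypothetical stand-in for the buffer-completed side: frees a buffer
// and makes its slot available again. The real CUPTI completion
// callback takes additional arguments (context, stream ID, sizes).
void releaseBuffer(std::uint8_t* buffer) {
  if (buffer == nullptr) {
    return;
  }
  std::free(buffer);
  outstandingBuffers.fetch_sub(1);
}
```

In a real profiler, a callback of this shape would be registered through `cuptiActivityRegisterCallbacks` together with a completion callback. In the sketch, a third request issued while two buffers are still outstanding comes back with a null buffer, which is the kind of denial the summary refers to.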

Differential Revision: D69558264

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D69558264

@sraikund16 changed the title from "CUPTI Segfault Reproducer for Denying Buffer Request" to "[DO NOT LAND] CUPTI Segfault Reproducer for Denying Buffer Request" on Feb 12, 2025