PTXAS compilation error: '.tile::gather4 with destination state space as .shared::cluster' not supported on target 'sm_121a'

### Describe the bug

When JIT-compiling a Triton kernel (specifically `matmul_ogs` from `triton_kernels`), the compiler generates PTX that uses the `.tile::gather4` instruction with `.shared::cluster` as the destination state space.

The NVIDIA `ptxas` assembler fails to compile this PTX, reporting that the feature is not supported on the target architecture `sm_121a`. This suggests that Triton's code generation for this architecture emits an instruction form that `ptxas` rejects for this target.

The issue occurs during a call to the `matmul_ogs` kernel. The full trace, including the PTX code generated by Triton, is in the attached log file, which may help in debugging.

[triton.log](https://github.com/user-attachments/files/22629368/triton.log)

Summary:

```
Traceback (most recent call last):
  File "/REDACTED.py", line 316, in REDACTED
    REDACTED = matmul_ogs(
               ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton_kernels/matmul_ogs.py", line 601, in matmul_ogs
    (kernels._p_matmul_ogs if opt_flags.is_persistent else kernels._matmul_ogs)[(grid,)](
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 419, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 733, in run
    kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 861, in _do_compile
    kernel = self.compile(src, target=target, options=options.__dict__)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 320, in compile
    next_module = compile_ir(module, metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/compiler.py", line 520, in <lambda>
    stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.target.arch)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/compiler.py", line 503, in make_cubin
    raise PTXASError(error)
triton.runtime.errors.PTXASError: PTXAS error: Internal Triton PTX codegen error
`ptxas` stderr:
ptxas /tmp/tmpda2tgdg3.ptx, line 4253; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4258; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4262; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4266; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4270; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4274; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4278; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4282; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4286; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4290; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4294; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4298; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4302; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4306; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4310; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4314; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas fatal   : Ptx assembly aborted due to errors

Repro command: /usr/local/cuda/bin/ptxas -lineinfo -v --gpu-name=sm_121a /tmp/tmpda2tgdg3.ptx -o /tmp/tmpda2tgdg3.ptx.o
```
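For triage, the repeated `ptxas` diagnostics above can be collapsed into one entry per unsupported feature. The following is a minimal sketch, not part of Triton or `ptxas`: the helper name `summarize_ptxas_errors` and the regular expression are my own assumptions, written only against the message format shown in this log.

```python
import re
from collections import defaultdict

# Assumed diagnostic shape, taken from the log in this issue:
#   ptxas <file>, line <N>; error   : Feature '<feature>' not supported on .target '<arch>'
_PTXAS_FEATURE_ERROR = re.compile(
    r"^ptxas\s+(?P<file>\S+),\s+line\s+(?P<line>\d+);\s+error\s*:\s+"
    r"Feature\s+'(?P<feature>[^']+)'\s+not supported on \.target\s+'(?P<target>[^']+)'"
)


def summarize_ptxas_errors(stderr_text):
    """Map (feature, target) to the sorted PTX line numbers that triggered it."""
    grouped = defaultdict(list)
    for raw in stderr_text.splitlines():
        match = _PTXAS_FEATURE_ERROR.match(raw.strip())
        if match:  # lines such as "ptxas fatal : ..." are skipped
            key = (match["feature"], match["target"])
            grouped[key].append(int(match["line"]))
    return {key: sorted(lines) for key, lines in grouped.items()}


# Two diagnostics copied from the log above, plus the trailing fatal line.
SAMPLE = """\
ptxas /tmp/tmpda2tgdg3.ptx, line 4253; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas /tmp/tmpda2tgdg3.ptx, line 4258; error   : Feature '.tile::gather4 with destination state space as .shared::cluster' not supported on .target 'sm_121a'
ptxas fatal   : Ptx assembly aborted due to errors
"""

summary = summarize_ptxas_errors(SAMPLE)
for (feature, target), lines in summary.items():
    print(f"{target}: {feature!r} at PTX lines {lines}")
```

Run against the full stderr, this should show that all 16 errors share a single feature string, which points at one code path in the generated kernel rather than several unrelated problems.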

### Environment details

* **Triton version**: `3.5.0`, as bundled with PyTorch nightly.
* **GPU**: DGX Spark, GB10, `sm_121a`.
* **CUDA Toolkit**: CUDA 13.0.1
