🚀 The feature, motivation and pitch
This RFC proposes to add the following capabilities to PyTorch:
1. Allocate symmetric tensors that allow remote direct access (see the sketch after this list).
   Reference: PyTorch SymmetricMemory: Harnessing NVLink Programmability with Ease
2. Provide accelerated and/or nontraditional collectives that leverage direct access to meet new model demands (also covered in the sketch below).
   Example 1: intra-node one-shot/two-shot low-latency kernels for fp32
   Example 2: DeepSeek-style all-to-all (a2a) communication with metadata kept on the GPU
3. Provide programming support for developers to author custom multi-GPU kernels.
   Example of community demand and practice: Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
4. Provide compute-communication fusion kernels that accelerate prevalent code patterns (see the async-TP sketch below).
   Example: continued support for Async TP and generalization to other patterns.
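
To make (1)-(3) concrete, here is a minimal sketch of what this could look like from Python, assuming the prototype `torch.distributed._symmetric_memory` module and `torch.ops.symm_mem` ops described in the referenced post. Every name and signature used here (`empty`, `rendezvous`, `get_buffer`, `barrier`, `one_shot_all_reduce`) is taken from that prototype and should be treated as subject to change, not a committed API.

```python
# Minimal sketch of capabilities (1)-(3); prototype API, subject to change.
# Launch with: torchrun --nproc-per-node=<num_gpus> this_script.py
import os

import torch
import torch.distributed as dist
import torch.distributed._symmetric_memory as symm_mem

rank = int(os.environ["RANK"])
torch.cuda.set_device(rank)
dist.init_process_group("nccl")

# (1) Every rank allocates a symmetric tensor of the same size and joins a
# rendezvous so peers can map each other's buffers for remote direct access.
t = symm_mem.empty(4096, dtype=torch.float32, device=f"cuda:{rank}")
t.fill_(float(rank))
hdl = symm_mem.rendezvous(t, group=dist.group.WORLD)

# (2) Low-latency collective on the symmetric buffer, e.g. a one-shot
# all-reduce for fp32 (op name assumed from the prototype).
group_name = dist.group.WORLD.group_name  # assumed accessor for the group name
reduced = torch.ops.symm_mem.one_shot_all_reduce(t, "sum", group_name)

# (3) Building block for custom multi-GPU kernels: obtain a direct view of a
# peer's buffer (peer rank 0 here, for illustration) and read it over NVLink.
hdl.barrier()                                   # all ranks done writing
peer_view = hdl.get_buffer(0, t.shape, t.dtype)
peer_copy = peer_view.clone()
hdl.barrier()                                   # all ranks done reading

dist.destroy_process_group()
```

The same rendezvous handle is what a custom multi-GPU kernel (e.g. one authored in Triton or CUDA) would build on to address peers' buffers directly instead of going through a collective.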
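For (4), a hedged sketch of how a user might opt into async-TP style compute/communication fusion via `torch.compile`. The `torch._inductor.config._micro_pipeline_tp` flag is the knob used by current prototypes and is an assumption here rather than a stable interface; in practice `model` would be a tensor-parallel (DTensor-sharded) module rather than this stand-in.

```python
# Hedged sketch of (4): enabling async-TP compute/communication fusion.
import torch
import torch.nn as nn

torch._inductor.config._micro_pipeline_tp = True  # assumed prototype flag

model = nn.Linear(1024, 1024, device="cuda")      # stand-in for a TP-sharded module
compiled = torch.compile(model)
out = compiled(torch.randn(8, 1024, device="cuda"))
```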
Alternatives
No response
Additional context
No response