
[RFC] Support Symmetric Memory programming #163666

@kwen2501


🚀 The feature, motivation and pitch

This RFC proposes to add the following capabilities to PyTorch:

1. Allocate symmetric tensors that allow remote direct access

Reference: PyTorch SymmetricMemory: Harnessing NVLink Programmability with Ease
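The idea in capability 1 can be illustrated with a conceptual, CPU-only model: every rank allocates a buffer of identical size, and after a rendezvous any rank can obtain a direct view of a peer's buffer instead of going through send/recv. This is a hypothetical single-process sketch, not the PyTorch API; names like `SymmetricHeap` and `get_buffer` are illustrative only (see the referenced post for the real interface).

```python
# Single-process, CPU-only model of symmetric-memory semantics.
# All names here (SymmetricHeap, empty, get_buffer) are hypothetical.

class SymmetricHeap:
    """Registry mapping rank -> buffer, standing in for the memory
    handles that a real symmetric-memory rendezvous would exchange."""

    def __init__(self, world_size: int):
        self.world_size = world_size
        self.buffers = {}

    def empty(self, rank: int, numel: int) -> list:
        # Each rank allocates a buffer of identical size ("symmetric").
        buf = [0.0] * numel
        self.buffers[rank] = buf
        return buf

    def get_buffer(self, peer: int) -> list:
        # Direct, aliasing access to a peer's buffer -- models NVLink
        # load/store access without an explicit send/recv.
        return self.buffers[peer]


heap = SymmetricHeap(world_size=4)
for rank in range(4):
    local = heap.empty(rank, numel=2)
    local[0] = float(rank)      # each rank writes into its own buffer

# Rank 0 reads rank 3's buffer directly, with no communication call.
peer_view = heap.get_buffer(3)
print(peer_view[0])
```

The key property being modeled is that `get_buffer` returns a live view of the peer's memory, not a copy, which is what makes remote direct access possible.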

2. Provide accelerated and/or nontraditional collectives that leverage direct access to meet new model demands

Example 1: Intra-node one-shot/two-shot low-latency kernels for fp32
Example 2: DeepSeek-style all-to-all (a2a) communication with metadata kept on the GPU
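The one-shot and two-shot patterns from Example 1 can be sketched as data-movement schemes over directly accessible peer buffers. In the one-shot form, a rank pulls every peer's full input and reduces it locally in a single kernel; in the two-shot form, ranks first reduce-scatter (each rank reduces one shard) and then all-gather the reduced shards. The real kernels are CUDA over NVLink-mapped symmetric buffers; this CPU simulation only models the access pattern.

```python
# CPU simulation of one-shot vs two-shot all-reduce over peer-visible
# buffers. Illustrative only; real kernels run on the GPU.

WORLD = 4
SHARD = 2  # elements owned per rank; total length = WORLD * SHARD

# inputs[r] is rank r's local buffer, directly readable by all peers.
inputs = [[float(r * 10 + i) for i in range(WORLD * SHARD)] for r in range(WORLD)]

def one_shot_allreduce(rank):
    # One shot: this rank pulls every peer's full buffer and reduces locally.
    out = [0.0] * (WORLD * SHARD)
    for peer in range(WORLD):
        for i, v in enumerate(inputs[peer]):
            out[i] += v
    return out

def two_shot_allreduce():
    # Shot 1 (reduce-scatter): rank r reduces only its shard across peers.
    shards = []
    for r in range(WORLD):
        lo, hi = r * SHARD, (r + 1) * SHARD
        shards.append([sum(inputs[p][i] for p in range(WORLD)) for i in range(lo, hi)])
    # Shot 2 (all-gather): every rank pulls the reduced shards from peers.
    flat = [v for shard in shards for v in shard]
    return [list(flat) for _ in range(WORLD)]

expected = [sum(inputs[p][i] for p in range(WORLD)) for i in range(WORLD * SHARD)]
assert one_shot_allreduce(0) == expected
assert all(out == expected for out in two_shot_allreduce())
```

One-shot minimizes latency for small messages (one kernel, no intermediate synchronization); two-shot trades an extra step for less redundant reduction work on larger messages.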

3. Provide programming support for developers to author customized multi-GPU kernels

Example ask/practice in the community: Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
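The overlap pattern such kernels target can be sketched in plain Python: split the work into chunks and issue the transfer of chunk i+1 before computing on chunk i. Here "comm" and "compute" are stand-in functions and the overlap is modeled only by the recorded issue order; a real kernel would issue these on separate streams or with asynchronous remote copies. The `pipeline` helper is hypothetical.

```python
# Sketch of comm/compute overlap via chunked pipelining. The schedule
# list records issue order; in a real kernel these would be concurrent.

def pipeline(chunks):
    schedule = []           # issue order of ("comm"/"compute", chunk index)
    fetched = {}
    # Prefetch the first chunk before any compute starts.
    schedule.append(("comm", 0))
    fetched[0] = chunks[0]
    results = []
    for i in range(len(chunks)):
        # Issue the next transfer before computing on the current chunk,
        # so comm(i+1) overlaps compute(i).
        if i + 1 < len(chunks):
            schedule.append(("comm", i + 1))
            fetched[i + 1] = chunks[i + 1]
        schedule.append(("compute", i))
        results.append(sum(fetched[i]))     # stand-in "compute"
    return results, schedule

results, schedule = pipeline([[1, 2], [3, 4], [5, 6]])
print(results)
# comm(1) is issued before compute(0): transfers lead the compute.
assert schedule.index(("comm", 1)) < schedule.index(("compute", 0))
```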

4. Provide compute-comm fusion kernels that accelerate prevalent code patterns.

Continued support for Async TP and its generalization to other patterns.
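The decomposition that makes Async TP's all-gather + matmul fusion possible can be shown with tiny pure-Python matrices: instead of waiting for the full all-gather of a row-sharded A before computing A @ B, each gathered shard can be multiplied as it arrives, so the matmul on shard r overlaps the transfer of shard r+1. This is a conceptual sketch of the algebraic identity, not the fused kernel itself.

```python
# Sketch of the decomposition behind all-gather + matmul fusion:
# rows of (all_gather(A) @ B) == concatenation of (A_shard_r @ B).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A is row-sharded across 2 "ranks"; B is replicated on every rank.
shards = [[[1.0, 2.0]], [[3.0, 4.0]]]     # each shard: 1x2
B = [[1.0, 0.0], [0.0, 1.0]]              # 2x2 identity for easy checking

# Baseline: all-gather first, then one big matmul.
full_A = [row for shard in shards for row in shard]
baseline = matmul(full_A, B)

# Overlapped form: multiply each shard as it "arrives" and concatenate
# the partial outputs. In a real kernel, iterations overlap the comm.
overlapped = []
for shard in shards:
    overlapped.extend(matmul(shard, B))

assert overlapped == baseline   # the decomposition is exact
print(overlapped)
```

Because the decomposition is exact, fusing it changes only the schedule (comm hidden behind compute), not the numerics of the result.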

Alternatives

No response

Additional context

No response

Metadata


Labels

release-feature-request: This tag marks a feature tracked for PyTorch OSS releases.
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.
