🚀 The feature, motivation and pitch
This RFC proposes to add the following capabilities to PyTorch:
1. Allocate symmetric tensors that allow remote direct access (see the sketch after this list).
   Reference: PyTorch SymmetricMemory: Harnessing NVLink Programmability with Ease
2. Provide accelerated and/or nontraditional collectives that leverage direct access to meet new model demands (also covered in the sketch below).
   Example 1: intra-node one-shot/two-shot low-latency kernels for fp32
   Example 2: DeepSeek-style all-to-all (a2a) communication with metadata kept on the GPU
3. Provide programming support for developers to author custom multi-GPU kernels.
   Example of community demand and practice: Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
4. Provide compute-communication fusion kernels that accelerate prevalent code patterns (see the async-TP sketch below).
   Example: continued support for Async TP and generalization to other patterns.
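
To make (1)-(3) concrete, here is a minimal sketch of what this could look like from Python, assuming the prototype `torch.distributed._symmetric_memory` module and `torch.ops.symm_mem` ops described in the referenced post. Every name and signature used here (`empty`, `rendezvous`, `get_buffer`, `barrier`, `one_shot_all_reduce`) is taken from that prototype and should be treated as subject to change, not a committed API.

```python
# Minimal sketch of capabilities (1)-(3); prototype API, subject to change.
# Launch with: torchrun --nproc-per-node=<num_gpus> this_script.py
import os

import torch
import torch.distributed as dist
import torch.distributed._symmetric_memory as symm_mem

rank = int(os.environ["RANK"])
torch.cuda.set_device(rank)
dist.init_process_group("nccl")

# (1) Every rank allocates a symmetric tensor of the same size and joins a
# rendezvous so peers can map each other's buffers for remote direct access.
t = symm_mem.empty(4096, dtype=torch.float32, device=f"cuda:{rank}")
t.fill_(float(rank))
hdl = symm_mem.rendezvous(t, group=dist.group.WORLD)

# (2) Low-latency collective on the symmetric buffer, e.g. a one-shot
# all-reduce for fp32 (op name assumed from the prototype).
group_name = dist.group.WORLD.group_name  # assumed accessor for the group name
reduced = torch.ops.symm_mem.one_shot_all_reduce(t, "sum", group_name)

# (3) Building block for custom multi-GPU kernels: obtain a direct view of a
# peer's buffer (peer rank 0 here, for illustration) and read it over NVLink.
hdl.barrier()                                   # all ranks done writing
peer_view = hdl.get_buffer(0, t.shape, t.dtype)
peer_copy = peer_view.clone()
hdl.barrier()                                   # all ranks done reading

dist.destroy_process_group()
```

The same rendezvous handle is what a custom multi-GPU kernel (e.g. one authored in Triton or CUDA) would build on to address peers' buffers directly instead of going through a collective.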
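For (4), a hedged sketch of how a user might opt into async-TP style compute/communication fusion via `torch.compile`. The `torch._inductor.config._micro_pipeline_tp` flag is the knob used by current prototypes and is an assumption here rather than a stable interface; in practice `model` would be a tensor-parallel (DTensor-sharded) module rather than this stand-in.

```python
# Hedged sketch of (4): enabling async-TP compute/communication fusion.
import torch
import torch.nn as nn

torch._inductor.config._micro_pipeline_tp = True  # assumed prototype flag

model = nn.Linear(1024, 1024, device="cuda")      # stand-in for a TP-sharded module
compiled = torch.compile(model)
out = compiled(torch.randn(8, 1024, device="cuda"))
```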
Alternatives
No response
Additional context
No response