Is your feature request related to a problem? Please describe.
On a system with ROCm 6.4.1 and PyTorch 2.5.1, I have both an iGPU and a dGPU available:
- GPU[0]: Radeon RX 7900 XTX (Device ID: 0x744c, recognized as cuda:0)
- GPU[1]: AMD Radeon Graphics iGPU on Ryzen 9 9900X (Device ID: 0x13c0, recognized as cuda:1)
My goal is to use both GPUs together (discrete + integrated) for PyTorch computations.
When running a matrix multiplication on the dGPU (cuda:0), everything works fine. But on the iGPU (cuda:1), I hit the following error:
It seems gfx1036 is not included in the TensileLibrary shipped with rocBLAS. As a result, PyTorch cannot run BLAS operations on the iGPU, and the process aborts.
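For reference, this is roughly the reproduction I'm running (a minimal sketch; the device indices assume the enumeration listed above, and the loop skips devices that aren't present):

```python
import torch

def matmul_on(device: str) -> torch.Tensor:
    # A small GEMM on the given device. On ROCm this dispatches to
    # rocBLAS/Tensile, which is the call that aborts on gfx1036.
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    return a @ b

# cuda:0 (RX 7900 XTX): works
# cuda:1 (Ryzen 9900X iGPU, gfx1036): rocBLAS aborts the process
for i in range(torch.cuda.device_count()):
    dev = f"cuda:{i}"
    print(dev, torch.cuda.get_device_name(i), matmul_on(dev).shape)
```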
Describe the solution you'd like
I'm looking for a way to use gfx1036 (AMD iGPU) in PyTorch without getting errors.
I think this is an extension of a previous issue: #1346.
I'm wondering whether the rocBLAS team is currently working on enabling simultaneous use of an iGPU and a dGPU, or if such development is being considered for the future.
Ideally, I want to be able to use both the iGPU and dGPU simultaneously in PyTorch for distributed or split workloads.
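To make the use case concrete, here is a hypothetical sketch of the kind of split workload I have in mind: shard the rows of one operand across whatever devices enumerate, run the matmul per shard, and gather the results. The `split_matmul` helper is my own illustration, not an existing API, and it presumes both devices can actually run BLAS, which is exactly what fails today on gfx1036:

```python
import torch

def split_matmul(a: torch.Tensor, b: torch.Tensor, devices: list[str]) -> torch.Tensor:
    # Shard rows of `a` across devices, multiply each shard by `b`,
    # then gather the partial results back on the first device.
    chunks = torch.chunk(a, len(devices), dim=0)
    outs = [(chunk.to(dev) @ b.to(dev)).to(devices[0])
            for chunk, dev in zip(chunks, devices)]
    return torch.cat(outs, dim=0)

# Use every visible GPU (cuda:0 = dGPU, cuda:1 = iGPU on my system),
# falling back to CPU so the sketch runs anywhere.
devs = [f"cuda:{i}" for i in range(torch.cuda.device_count())] or ["cpu"]
a = torch.randn(256, 64)
b = torch.randn(64, 32)
print(split_matmul(a, b, devs).shape)  # torch.Size([256, 32])
```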