
[Issue]: pytorch: consistent HIPBLAS_STATUS_ALLOC_FAILED on 7900 XTX #1708


Description

@benrichard-amd

Problem Description

On recent TheRock + torch nightly releases, I am encountering HIPBLAS_STATUS_ALLOC_FAILED whenever I attempt to use torch for matrix operations, even on small matrices.

Torch version: 2.10.0a0+rocm7.9.0rc20251007

Operating System

Ubuntu 24.04.3 LTS (Noble Numbat)

CPU

Intel(R) Core(TM) i9-14900K

GPU

AMD Radeon RX 7900 XTX

ROCm Version

7.9.0

ROCm Component

No response

Steps to Reproduce

  1. Install ROCm + torch following the directions here: https://github.com/ROCm/TheRock/blob/main/RELEASES.md#index-page-listing

  2. Run sample reproducer:

import torch

device = torch.device("cuda")

# Create two random matrices on the GPU
A = torch.randn(128, 128, device=device)
B = torch.randn(128, 128, device=device)

# Perform matrix multiplication
C = torch.matmul(A, B)
Running the script produces:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    C = torch.matmul(A, B)
        ^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
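For reference, here is a minimal diagnostic sketch (not part of the original report; it assumes the default device index 0 and uses only standard torch APIs). Plain allocations and element-wise ops do not go through hipBLAS, so if they succeed while matmul fails, the failure is isolated to hipblasCreate() rather than to GPU memory allocation in general.

import torch

# Hypothetical diagnostic (not from the original report): confirm the ROCm
# device is visible and report the build/runtime versions involved.
print("torch:", torch.__version__)
print("HIP runtime:", torch.version.hip)
print("device available:", torch.cuda.is_available())
print("device name:", torch.cuda.get_device_name(0))

# Element-wise ops and allocations bypass hipBLAS, so a success here with a
# matmul failure points at hipblasCreate() itself.
x = torch.randn(128, 128, device="cuda")
y = x + x
torch.cuda.synchronize()
print("element-wise op OK, bytes allocated:", torch.cuda.memory_allocated())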

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support


Additional Information

No response


Labels

bug (Something isn't working), ecosystem: PyTorch (Issue pertains to PyTorch and related libraries), status: triage (Indicates an issue has been assigned for investigation)
