Only enumerate ROCm-capable AMD GPUs #1500


Merged
merged 4 commits into from
Jun 10, 2025

Conversation

alaviss
Contributor

@alaviss alaviss commented Jun 10, 2025

Discover AMD graphics devices using AMDKFD topology instead of enumerating the PCIe bus. This interface exposes a lot more information about potential devices, allowing RamaLama to filter out unsupported devices.

Currently, devices older than GFX9 are filtered, as they are no longer supported by ROCm.

Ref: #1482

Summary by Sourcery

Use the AMDKFD sysfs topology interface to detect and select ROCm-capable AMD GPUs by architecture and VRAM capacity.

New Features:

  • Introduce an amdkfd module to parse KFD properties and enumerate GPUs via /sys/devices/virtual/kfd
  • Replace PCI-based enumeration with AMDKFD for ROCm GPU detection
  • Filter out GPUs older than GFX9 (gfx_target_version < 90000)
  • Aggregate VRAM across memory banks and require a minimum of 1 GiB (MIN_VRAM_BYTES)

Signed-off-by: Leorize <[email protected]>
@alaviss alaviss force-pushed the push-pwxuznmnqptr branch from 7bf6748 to fab8765 Compare June 10, 2025 19:54
@alaviss
Contributor Author

alaviss commented Jun 10, 2025

At the moment, RamaLama will fall back to CPU inference for unsupported AMD GPUs, since -ngl is not passed when RamaLama doesn't detect an accelerator. I'm not sure how to detect whether a Vulkan-capable accelerator is available in order to force passing -ngl.

@alaviss
Contributor Author

alaviss commented Jun 10, 2025

Ideally this should be tested on AMD APU systems. I have only verified this implementation with my dGPU.

@ericcurtin ericcurtin requested a review from Copilot June 10, 2025 20:02
@ericcurtin
Member

Not a big fan of inner functions; they encourage indenting for little benefit. Can we just make them normal functions?

@ericcurtin
Member

Thanks for completing this. The logic seems to make sense, though I haven't tested it.

@ericcurtin
Member

As regards "I'm not sure how to detect if a Vulkan-capable accelerator is available to force passing -ngl":

We recently integrated this change:

ggml-org/llama.cpp#14099

Maybe we want to default to --ngl 999 across the board and see how that goes. My concern would be machines with limited VRAM: would they play nicely with that default?

We could parse this just before we execute llama-server also potentially:

$ vulkaninfo 2>&1 | grep deviceType
	deviceType        = PHYSICAL_DEVICE_TYPE_CPU

@alaviss
Contributor Author

alaviss commented Jun 10, 2025

Maybe we want to default to --ngl 999 all round and see how that goes

I actually tested that, and it tries to do repacks for CPU. I'm not sure if that's a good or a bad thing. But either way I think that change should be done separately.

We could parse this just before we execute llama-server also potentially:

$ vulkaninfo 2>&1 | grep deviceType
	deviceType        = PHYSICAL_DEVICE_TYPE_CPU

vulkaninfo is not installed by default anywhere afaict.

@ericcurtin
Member

Maybe we want to default to --ngl 999 all round and see how that goes

I actually tested that, and it tries to do repacks for CPU. I'm not sure if that's a good or a bad thing. But either way I think that change should be done separately.

Yeah... There's a patch from llama.cpp merged hours ago that might change this behaviour though. Did you try with that patch?

We could parse this just before we execute llama-server also potentially:

$ vulkaninfo 2>&1 | grep deviceType
	deviceType        = PHYSICAL_DEVICE_TYPE_CPU

vulkaninfo is not installed by default anywhere afaict.

It's installed in all our container images with Vulkan, so if it's done just before llama-server is executed it should be fine in the containerized case.
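A sketch (not RamaLama's actual code) of the check floated above: run vulkaninfo, pull out the deviceType lines, and treat any device that is not PHYSICAL_DEVICE_TYPE_CPU as an accelerator. The function names here are hypothetical:

```python
import shutil
import subprocess

def parse_device_types(output):
    """Extract deviceType values from vulkaninfo output lines like
    'deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU'."""
    return [line.split('=')[-1].strip()
            for line in output.splitlines()
            if 'deviceType' in line]

def has_non_cpu_device(output):
    """True if any reported Vulkan device is not a CPU implementation."""
    types = parse_device_types(output)
    return any(t and t != 'PHYSICAL_DEVICE_TYPE_CPU' for t in types)

def vulkan_accelerator_available():
    """Best-effort probe: False when vulkaninfo is missing or fails."""
    if shutil.which('vulkaninfo') is None:
        return False
    try:
        proc = subprocess.run(['vulkaninfo'], capture_output=True,
                              text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return False
    return has_non_cpu_device(proc.stdout + proc.stderr)
```

As noted above, this would only be reliable inside the containers where vulkaninfo is known to be installed; on a bare host the probe simply returns False.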

@ericcurtin ericcurtin requested a review from Copilot June 10, 2025 21:22
@Copilot Copilot AI left a comment

Pull Request Overview

This PR replaces PCIe-based enumeration with AMDKFD topology to detect ROCm-capable AMD GPUs, filters out unsupported older architectures, and accounts for VRAM across memory banks.

  • Add ramalama.amdkfd module for parsing KFD properties and listing GPU nodes
  • Update check_rocm_amd() to use KFD topology, filter gfx versions <90000, and sum VRAM from public/private heaps
  • Introduce MIN_VRAM_BYTES constant (1 GiB) for minimum VRAM threshold
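The selection logic in those bullets can be sketched as follows. `MIN_VRAM_BYTES` and the 90000 cutoff come from the PR description; the function shape and the `(index, props, vram_bytes)` input format are assumptions for illustration, and the VRAM comparison is indented the way the review below says it should be:

```python
MIN_VRAM_BYTES = 1 << 30  # 1 GiB minimum to consider a GPU usable

def best_rocm_gpu(gpus):
    """Pick the supported GPU with the most VRAM.

    gpus: iterable of (index, props, vram_bytes) tuples.
    Returns the winning index, or None if nothing qualifies.
    """
    best = None
    best_vram = MIN_VRAM_BYTES
    for i, props, vram in gpus:
        # ROCm dropped architectures older than GFX9
        if int(props['gfx_target_version']) < 90000:
            continue
        if vram > best_vram:
            best_vram = vram
            best = i  # only updated when a larger-VRAM GPU is found
    return best
```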

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
ramalama/common.py Import new module, add VRAM threshold constant, overhaul GPU scan
ramalama/amdkfd.py New utilities for reading KFD sysfs properties and listing GPUs
Comments suppressed due to low confidence (3)

ramalama/common.py:453

  • [nitpick] The variable name np is ambiguous and commonly associated with NumPy; consider renaming to node_path or similar for clarity.
for i, (np, props) in enumerate(amdkfd.gpus()):

ramalama/common.py:470

  • This assignment is outside the if mem_bytes > ... block, so gpu_num will be set on every iteration rather than only when a larger VRAM GPU is found. It should be indented inside the if.
gpu_num = i

ramalama/amdkfd.py:12

  • The new gpus() function and parse_props() logic lack tests; consider adding unit tests or mocks for sysfs reads to verify correct parsing and filtering.
def gpus():

mem_banks_count = int(props['mem_banks_count'])
mem_bytes = 0
for bank in range(mem_banks_count):
    bank_props = amdkfd.parse_props(np + f'/mem_banks/{bank}/properties')
Copilot AI Jun 10, 2025

[nitpick] Path construction via string concatenation can be fragile; prefer os.path.join(np, 'mem_banks', str(bank), 'properties').

Suggested change
bank_props = amdkfd.parse_props(np + f'/mem_banks/{bank}/properties')
bank_props = amdkfd.parse_props(os.path.join(np, 'mem_banks', str(bank), 'properties'))


# See /usr/include/linux/kfd_sysfs.h for possible heap types
#
# Count public and private framebuffer memory as VRAM
if bank_props['heap_type'] in [1, 2]:
Copilot AI Jun 10, 2025

[nitpick] Magic numbers 1 and 2 for heap types are unclear; consider defining named constants or an enum for readability and future maintenance.

Suggested change
if bank_props['heap_type'] in [1, 2]:
if bank_props['heap_type'] in [HEAP_TYPE_PUBLIC, HEAP_TYPE_PRIVATE]:


Member

If this is correct we should try and take this in another PR, @alaviss. I have no idea what 1 and 2 mean.
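For reference, the code comment above points at /usr/include/linux/kfd_sysfs.h, where the heap types are enumerated; per that comment, 1 and 2 are the public (CPU-visible) and private (GPU-only) framebuffer heaps, i.e. the two kinds of VRAM. A sketch of the named-constant suggestion, with the constant names chosen here for illustration:

```python
# Heap types as enumerated in linux/kfd_sysfs.h; names are illustrative
HEAP_TYPE_SYSTEM = 0      # system memory, not VRAM
HEAP_TYPE_FB_PUBLIC = 1   # CPU-visible framebuffer memory (VRAM)
HEAP_TYPE_FB_PRIVATE = 2  # GPU-only framebuffer memory (VRAM)

def is_vram_bank(bank_props):
    """True when a KFD memory bank's heap is framebuffer memory."""
    return int(bank_props['heap_type']) in (HEAP_TYPE_FB_PUBLIC,
                                            HEAP_TYPE_FB_PRIVATE)
```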

@ericcurtin ericcurtin merged commit 4808a49 into containers:main Jun 10, 2025
14 of 16 checks passed
@alaviss alaviss deleted the push-pwxuznmnqptr branch June 10, 2025 22:21