Support for fabric-attached GPUs

The DRA KEP lists devices dynamically attached over a fabric among its use cases and goals. Does NVIDIA have any plan to support fabric-attached GPUs in this DRA driver?

Allocating fabric-attached resources involves both infrastructure-provider work (configuring the fabric) and device-vendor-specific work (setting up the environment, configuring devices...), so I am unsure who should implement the corresponding driver, and how.
Should the infrastructure provider create a custom driver that does both kinds of work, or should the device vendor create a driver that handles the device-specific work and enable it to talk to a remote fabric manager to request fabric-attached devices? For example, this DRA driver could add a gRPC client that asks a remote server to filter out nodes unsuitable for attaching GPUs, to attach GPUs to a specific node, and to detach GPUs from a specific node, so that the driver can use both local and fabric-attached GPUs.
Maybe a common API between a DRA driver and a fabric controller component should be discussed.

I would like to know NVIDIA's thoughts on this.

Support for fabric-attached GPUs #2
