Description
The KEP lists using devices dynamically attached from the fabric among the use cases and goals of DRA. Does NVIDIA have any plan to support fabric-attached GPUs in this DRA driver?
Allocating fabric-attached resources involves both infrastructure-provider work (configuring the fabric)
and device-vendor-specific work (setting up the environment, configuring devices, ...), so I am unsure who should implement the related driver, and how.
Should the infrastructure provider create a custom driver that does both kinds of work, or should a device vendor create a driver that handles the device-specific work and enable it to talk to a remote fabric manager to request fabric-attached devices? For example, this DRA driver could add a gRPC client that asks a remote server to filter out nodes unsuitable for attaching GPUs, to attach GPUs to a specific node, and to detach GPUs from a specific node, so that the driver can use both local and fabric-attached GPUs.
Maybe a common API between a DRA driver and a fabric controller component should be discussed.
I would like to know NVIDIA's thoughts on this.