Actively handling resource fragmentation #111

@xiaowu0162

Description

Is your feature request related to a problem? Please describe:
Fragmentation can occur when matrix is used to serve models that require different numbers of GPUs. For example, model A requires 1 GPU per replica and model B requires 2 GPUs per replica. When we deploy an application with 14 replicas of model A and 1 replica of model B on 2 nodes, matrix tends to place 7 replicas of A on each node, so no single node has room left for model B, even if the cluster as a whole has enough free GPUs.
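To make the example concrete, here is a minimal simulation of the situation. It is only a sketch: the 8 GPUs per node and the "place each replica on the node with the most free GPUs" policy are assumptions used to reproduce the 7/7 split, not matrix's actual scheduler, and `place_spread` is a hypothetical helper.

```python
def place_spread(replicas, free):
    """Place replicas in order, each on the node with the most free GPUs.

    replicas: list of (model_name, gpus_needed) tuples.
    free: free GPUs per node (assumed values, not read from a real cluster).
    Returns (placements, unplaced, remaining_free).
    """
    free = list(free)  # work on a copy
    placements, unplaced = [], []
    for name, need in replicas:
        # Ties go to the lowest node index.
        node = max(range(len(free)), key=lambda i: free[i])
        if free[node] >= need:
            free[node] -= need
            placements.append((name, node))
        else:
            unplaced.append(name)
    return placements, unplaced, free


# 14 replicas of model A (1 GPU each) followed by 1 replica of model B (2 GPUs).
replicas = [("A", 1)] * 14 + [("B", 2)]
placements, unplaced, free = place_spread(replicas, [8, 8])
print(free, unplaced)  # [1, 1] ['B']
```

Two GPUs are free in total, but they sit on different nodes, so the 2-GPU replica cannot be scheduled.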

Describe the solution you would like:
It would be great if matrix could actively detect these bubbles and resolve them by moving replicas between nodes.
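A rough sketch of what such detection and rebalancing could look like is below. It assumes the scheduler knows the free GPU count per node and how many 1-GPU replicas on each node can be moved; `is_fragmented` and `plan_moves` are hypothetical names, not existing matrix functions, and the greedy strategy is only one possible choice.

```python
def is_fragmented(free, need):
    """True if enough GPUs are free in total but no single node can fit `need`."""
    return sum(free) >= need and max(free) < need


def plan_moves(free, movable, need):
    """Greedily plan moves of 1-GPU replicas to open `need` GPUs on one node.

    free: free GPUs per node; movable: movable 1-GPU replicas per node.
    Returns a list of (src_node, dst_node) moves, or None if no plan is found.
    """
    free, movable = list(free), list(movable)
    # Consolidate free space on the node that already has the most of it.
    target = max(range(len(free)), key=lambda i: free[i])
    moves = []
    for dst in range(len(free)):
        if dst == target:
            continue
        while free[target] < need and free[dst] >= 1 and movable[target] >= 1:
            free[dst] -= 1
            free[target] += 1
            movable[target] -= 1
            moves.append((target, dst))
    return moves if free[target] >= need else None


# State from the example: 1 free GPU per node, 7 movable replicas of A per node.
print(is_fragmented([1, 1], need=2))        # True
print(plan_moves([1, 1], [7, 7], need=2))   # [(0, 1)]
```

In the example, moving a single replica of A from node 0 to node 1 leaves 2 free GPUs on node 0, which is enough for model B.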

Describe the alternatives you have considered:
Right now, to work around the example above, I can first add 1 node to the cluster and deploy model A, and then add another node and deploy both model A and model B.
Update: it seems that listing the model that takes up the most GPUs per replica first in the list of applications passed into matrix_deploy can solve the issue as well.
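The ordering workaround from the update can be sketched as a small helper that sorts the application list before it is handed to the deploy call. The dictionary keys (`name`, `gpus_per_replica`, `replicas`) are assumptions made for illustration and may not match the format matrix_deploy actually expects.

```python
def largest_first(applications):
    """Return applications sorted by GPUs per replica, largest first.

    `gpus_per_replica` is a hypothetical key used for this sketch.
    """
    return sorted(
        applications,
        key=lambda app: app["gpus_per_replica"],
        reverse=True,
    )


apps = [
    {"name": "model-A", "gpus_per_replica": 1, "replicas": 14},
    {"name": "model-B", "gpus_per_replica": 2, "replicas": 1},
]
ordered = largest_first(apps)
print([app["name"] for app in ordered])  # ['model-B', 'model-A']
```

Placing the largest replicas first means they claim contiguous space on a node before the 1-GPU replicas fill the remaining gaps.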

Metadata

Labels: enhancement (New feature or request)
