Description
Is your feature request related to a problem? Please describe:
Fragmentation might happen if matrix is used to serve models that require different numbers of GPUs. For example, model A requires 1 GPU and model B requires 2 GPUs. When we deploy an application of 14 replicas of model A and 1 replica of model B on 2 nodes, matrix tends to put 7 replicas of A on each node, leaving only 1 free GPU per node, so the 2-GPU replica of model B cannot be scheduled anywhere.
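A minimal sketch of the fragmentation above. The node size (8 GPUs) and the spread-style placement policy are assumptions for illustration, not matrix's actual scheduler:

```python
def place(replicas, nodes_free):
    """Greedily place each replica on the node with the most free GPUs."""
    placements = []
    for need in replicas:
        node = max(range(len(nodes_free)), key=lambda i: nodes_free[i])
        if nodes_free[node] < need:
            return placements, need  # this replica cannot be scheduled
        nodes_free[node] -= need
        placements.append((need, node))
    return placements, None

nodes = [8, 8]             # 2 nodes x 8 GPUs (assumed)
replicas = [1] * 14 + [2]  # 14x model A (1 GPU), then 1x model B (2 GPUs)
placed, stuck = place(replicas, nodes)
print(len(placed), stuck, nodes)  # 14 placed, the 2-GPU replica is stuck: 14 2 [1, 1]
```

All 14 replicas of A land 7-and-7, and model B's 2-GPU replica finds only 1-GPU bubbles on each node.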
Describe the solution you would like:
It would be great if matrix could actively detect these bubbles and handle them by moving replicas around.
Describe the alternatives you have considered:
Right now, to solve the example above, I can add 1 node to the cluster first, deploy model A, then add another node and deploy both model A and model B.
Update: it seems that putting the model that takes up the most GPUs per replica first in the list of applications passed into matrix_deploy can solve the issue as well.
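That workaround amounts to first-fit-decreasing bin packing: placing the largest-per-replica model first leaves no stranded bubbles. Again a sketch under the same assumptions (8 GPUs per node, most-free-node placement), not matrix's actual implementation:

```python
def place_decreasing(replicas, nodes_free):
    """Place replicas largest-first on the node with the most free GPUs."""
    unplaced = []
    for need in sorted(replicas, reverse=True):  # biggest GPU requirement first
        node = max(range(len(nodes_free)), key=lambda i: nodes_free[i])
        if nodes_free[node] >= need:
            nodes_free[node] -= need
        else:
            unplaced.append(need)
    return unplaced

nodes = [8, 8]             # 2 nodes x 8 GPUs (assumed)
replicas = [1] * 14 + [2]  # 14x model A (1 GPU), 1x model B (2 GPUs)
print(place_decreasing(replicas, nodes), nodes)  # everything fits: [] [0, 0]
```

With model B's 2-GPU replica placed first, the 1-GPU replicas of A fill in around it and both nodes end up fully packed.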