🚀 The feature, motivation and pitch
```bash
python -m vllm.entrypoints.openai.api_server \
    --model /workspace/meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
```
The `/v1/models` response from the above setup cannot expose the lineage between the LoRA adapters and the base model. In the example below, `root` always points to the base model.
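For reference, here is a minimal sketch of fetching this list, assuming the server started above is listening on localhost:8000 (vLLM's default port):

```python
# Minimal sketch: query the OpenAI-compatible /v1/models endpoint.
# Assumes the server started above is running on localhost:8000.
import requests

resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    # Today "root" is the base model path for both entries, so the LoRA
    # adapter's lineage to its base model is not recoverable from here.
    print(model["id"], "->", model["root"], "parent:", model["parent"])
```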
Current Status
- The base model id comes from either `--model` or `--served-model-name`. If the user passes a local path, then the `id` and `root` fields will not be a model id as in OpenAI.
- The LoRA model card information comes from `LoRARequest`, which does not carry `base_model` at the moment. Technically, we can assume they are all adapters of the base model, but this may break later once the engine supports multiple models.
```json
{
  "object": "list",
  "data": [
    {
      "id": "/workspace/meta-llama/Llama-2-7b-hf",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "/workspace/meta-llama/Llama-2-7b-hf",
      "parent": null,
      "permission": [
        {
          .....
        }
      ]
    },
    {
      "id": "sql-lora",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "/workspace/meta-llama/Llama-2-7b-hf",
      "parent": null,
      "permission": [
        {
          ....
        }
      ]
    }
  ]
}
```
Expected
We can use `root` to represent the model path and `parent` to indicate the base model for LoRA adapters. Since these fields do not seem to be part of the OpenAI protocol, we should be able to make this change.
```json
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/Llama-2-7b-hf",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/",
      "parent": null,
      "permission": [
        {
          .....
        }
      ]
    },
    {
      "id": "sql-lora",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/snapshots/0dfa347e8877a4d4ed19ee56c140fa518470028c/",
      "parent": "meta-llama/Llama-2-7b-hf",
      "permission": [
        {
          ....
        }
      ]
    }
  ]
}
```
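For discussion, here is a minimal sketch of how the server could populate these fields; the class and helper below are illustrative stand-ins, not the actual vLLM implementation:

```python
# Illustrative sketch only, not the real vLLM classes: the base model card
# keeps parent=None, while each LoRA adapter card points root at its own
# adapter path and parent at the base model's id.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ModelCard:
    id: str
    root: str                     # local path the weights were resolved to
    parent: Optional[str] = None  # base model id for adapters, None otherwise

def build_model_cards(base_id: str, base_path: str,
                      lora_modules: Dict[str, str]) -> List[ModelCard]:
    cards = [ModelCard(id=base_id, root=base_path)]
    for name, path in lora_modules.items():
        # Assumes every adapter descends from the single served base model;
        # this would need revisiting once multiple base models are supported.
        cards.append(ModelCard(id=name, root=path, parent=base_id))
    return cards
```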
I am drafting a PR to address this issue; please help review whether the above looks good.
Alternatives
No response
Additional context
No response