Open
Labels: bug (Something isn't working)
Describe the Bug
Hi, I deployed a VLM model in Dynamo from a Hugging Face model, adding the argument `--served-model-name`.
All pods deployed successfully. However, when I call the API, it fails. It seems that the frontend pod tries to fetch the model from Hugging Face using the value of `--served-model-name`.
Does the frontend pod need to download a model at all? Even if so, I would expect it to use the `--model` value, not the `--served-model-name` value.
Thanks in advance!
Steps to Reproduce
- Apply the yaml file below, e.g. `kubectl apply -f dynamo/examples/backends/vllm/deploy/vlm_agg.yaml -n dynamo-system`
- Wait until all pods are READY
- Get the svc name and port-forward, e.g. `kubectl port-forward svc/llm-vllm-agg-llmfrontend 8000:8000 -n dynamo-system`
- Query the model list: `curl localhost:8000/v1/models`
- Get the log of the frontend pod, e.g. `kubectl logs <frontend pod> -n dynamo-system`
### yaml file for kubectl apply
```yaml
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vlm-vllm-agg
spec:
  services:
    VLMFrontend:
      dynamoNamespace: vlm-vllm-agg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
    VLMVllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: vlm-vllm-agg
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
          workingDir: /workspace/examples/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model=nvidia/Cosmos-Reason1-7B
            - --served-model-name=cosmos-reason1-7b
          startupProbe:
            httpGet:
              path: /health
              port: 9090
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 60  # 32 minutes total (120s + 60*30s)
          livenessProbe:
            httpGet:
              path: /live
              port: 9090
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10
          readinessProbe:
            httpGet:
              path: /live
              port: 9090
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10
```
Expected Behavior
- When `curl http://localhost:8000/v1/models` is called, it returns the `--served-model-name` value as `data[].id`:
  `{"object":"list","data":[{"id":"cosmos-reason1-7b","object":"object","created":1764046339,"owned_by":"nvidia"}]}`
- The frontend pod doesn't download a model, or it uses the `--model` arg value to download one.
Actual Behavior
- When `curl http://localhost:8000/v1/models` is called, it returns nothing in `data`:
  `{"object":"list","data":[{}]}`
- The frontend pod uses the `--served-model-name` arg value to download a model, and it fails:

```
# kubectl logs <frontend pod> -n dynamo-system
2025-12-02T02:18:52.808639Z  WARN dynamo_llm::hub: ModelExpress download failed for model 'cosmos-reason1-7b': Failed to fetch model 'cosmos-reason1-7b' from HuggingFace. Is this a valid HuggingFace ID? Error: request error: HTTP status client error (401 Unauthorized) for url (https://huggingface.co/api/models/cosmos-reason1-7b/revision/main)
2025-12-02T02:18:52.808665Z ERROR dynamo_llm::discovery::watcher: Error adding model from discovery model_name="cosmos-reason1-7b" namespace="vlm-vllm-agg" error="Failed to fetch model 'cosmos-reason1-7b' from HuggingFace. Is this a valid HuggingFace ID? Error: request error: HTTP status client error (401 Unauthorized) for url (https://huggingface.co/api/models/cosmos-reason1-7b/revision/main)"
```
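The 401 is consistent with `cosmos-reason1-7b` not being a valid Hub repo id: Hugging Face model ids take the form `namespace/name`, and the log shows the frontend hitting `api/models/cosmos-reason1-7b` (no namespace). A quick offline sanity check, using a hypothetical helper that is not part of Dynamo:

```python
import re

# Hugging Face Hub model ids are "namespace/name"; a bare alias such as
# "cosmos-reason1-7b" has no namespace and is rejected by the hub API.
# This pattern is a simplified approximation, not the hub's exact grammar.
HF_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def looks_like_hf_repo_id(name: str) -> bool:
    """Rough check that a string is shaped like a Hub repo id."""
    return bool(HF_REPO_ID.match(name))


print(looks_like_hf_repo_id("nvidia/Cosmos-Reason1-7B"))  # True: org/name id
print(looks_like_hf_repo_id("cosmos-reason1-7b"))         # False: alias only
```

This supports the report's reading: the frontend is passing the `--served-model-name` alias where a `--model` repo id is required.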
Environment
- Ubuntu 22.04
- Dynamo 0.7.0
- Kubernetes v1.31
Additional Context
No response
Screenshots
No response