[BUG]: Frontend pod tries to download a model via huggingface with the value of arg --served-model-name (and it fails) #4748

@MisakiNakagawan

Describe the Bug

Hi, I deployed a VLM from Hugging Face in Dynamo and added the "--served-model-name" argument.
All pods deployed successfully. However, API requests fail. It seems that the frontend pod tries to fetch the model from Hugging Face using the value of --served-model-name.

Does the frontend pod need to download a model at all? Even if it does, I would expect it to use the "--model" value, not the "--served-model-name" value.

Thanks in advance!

Steps to Reproduce

  1. Apply the yaml file below, e.g. kubectl apply -f dynamo/examples/backends/vllm/deploy/vlm_agg.yaml -n dynamo-system
  2. Wait until all pods are READY
  3. Get the svc name and port-forward, e.g. kubectl port-forward svc/llm-vllm-agg-llmfrontend 8000:8000 -n dynamo-system
  4. Run curl localhost:8000/v1/models
  5. Get the logs of the frontend pod, e.g. kubectl logs <frontend pod> -n dynamo-system
### yaml file for kubectl apply
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vlm-vllm-agg
spec:
  services:
    VLMFrontend:
      dynamoNamespace: vlm-vllm-agg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
    VLMVllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: vlm-vllm-agg
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
          workingDir: /workspace/examples/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model=nvidia/Cosmos-Reason1-7B
            - --served-model-name=cosmos-reason1-7b
          startupProbe:
            httpGet:
              path: /health
              port: 9090
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 60  # 32 minutes total (120s + 60*30s)
          livenessProbe:
            httpGet:
              path: /live
              port: 9090
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10
          readinessProbe:
            httpGet:
              path: /live
              port: 9090
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10

Expected Behavior

  • curl http://localhost:8000/v1/models returns the "--served-model-name" value as data.id:
    {"object":"list","data":[{"id":"cosmos-reason1-7b","object":"object","created":1764046339,"owned_by":"nvidia"}]}
  • The frontend pod either doesn't download a model, or it uses the "--model" value to download one.
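
To make the expected result easy to check in a script, here is a minimal Python sketch that extracts the model ids from a /v1/models response body. The helper name served_model_ids is hypothetical and not part of Dynamo; the sample payload is the expected response quoted above.

```python
import json


def served_model_ids(models_response: str) -> list:
    """Return the model ids listed in a /v1/models JSON response body."""
    payload = json.loads(models_response)
    # Entries without an "id" key (for example an empty object) are skipped.
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Expected response body, copied from this report.
expected_body = (
    '{"object":"list","data":[{"id":"cosmos-reason1-7b","object":"object",'
    '"created":1764046339,"owned_by":"nvidia"}]}'
)

print(served_model_ids(expected_body))
```

With a healthy deployment the returned list should contain cosmos-reason1-7b; a body whose data holds only an empty object yields an empty list.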

Actual Behavior

  • curl http://localhost:8000/v1/models returns an empty entry in data:
    {"object":"list","data":[{}]}
  • The frontend pod uses the "--served-model-name" value to download a model, and the download fails:
# kubectl logs <frontend pod> -n dynamo-system
2025-12-02T02:18:52.808639Z  WARN dynamo_llm::hub: ModelExpress download failed for model 'cosmos-reason1-7b': Failed to fetch model 'cosmos-reason1-7b' from HuggingFace. Is this a valid HuggingFace ID? Error: request error: HTTP status client error (401 Unauthorized) for url (https://huggingface.co/api/models/cosmos-reason1-7b/revision/main)
2025-12-02T02:18:52.808665Z ERROR dynamo_llm::discovery::watcher: Error adding model from discovery model_name="cosmos-reason1-7b" namespace="vlm-vllm-agg" error="Failed to fetch model 'cosmos-reason1-7b' from HuggingFace. Is this a valid HuggingFace ID? Error: request error: HTTP status client error (401 Unauthorized) for url (https://huggingface.co/api/models/cosmos-reason1-7b/revision/main)"
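
The URL in the log suggests the lookup is keyed on the served name instead of the Hugging Face repo id. The sketch below is not Dynamo's implementation; it only reproduces the URL pattern visible in the log, using a hypothetical helper hf_revision_url, to show which URL each argument value would produce.

```python
def hf_revision_url(model_id: str, revision: str = "main") -> str:
    """Build a model-info URL following the pattern seen in the frontend log."""
    return f"https://huggingface.co/api/models/{model_id}/revision/{revision}"


# Value of --served-model-name: matches the failing URL in the log.
print(hf_revision_url("cosmos-reason1-7b"))

# Value of --model: the repo id that I would expect to be looked up instead.
print(hf_revision_url("nvidia/Cosmos-Reason1-7B"))
```

The first URL has no organization prefix, so it does not point at the nvidia/Cosmos-Reason1-7B repo, which would be consistent with the request failing.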

Environment

ubuntu 22.04
dynamo 0.7.0
kubernetes v1.31

Additional Context

No response

Screenshots

No response
