Open
Labels: bug (Something isn't working)
Describe the Bug
Hi, I deployed a VLM model in Dynamo from a Hugging Face model, adding the argument `--served-model-name`.
All pods deployed successfully. However, when I call the API, it fails. It seems that the frontend pod tries to fetch the model from Hugging Face using the value of `--served-model-name`.
Does the frontend pod need to download a model at all? Even if so, I would expect it to use the `--model` value, not the `--served-model-name` value.
Thanks in advance!
Steps to Reproduce
- Apply the yaml file below, e.g. `kubectl apply -f dynamo/examples/backends/vllm/deploy/vlm_agg.yaml -n dynamo-system`
- Wait until all pods are READY
- Get the svc name and port-forward, e.g. `kubectl port-forward svc/llm-vllm-agg-llmfrontend 8000:8000 -n dynamo-system`
- Query the model list: `curl localhost:8000/v1/models`
- Get the log of the frontend pod, e.g. `kubectl logs <frontend pod> -n dynamo-system`
### yaml file for kubectl apply
```yaml
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vlm-vllm-agg
spec:
  services:
    VLMFrontend:
      dynamoNamespace: vlm-vllm-agg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
    VLMVllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: vlm-vllm-agg
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
          workingDir: /workspace/examples/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model=nvidia/Cosmos-Reason1-7B
            - --served-model-name=cosmos-reason1-7b
          startupProbe:
            httpGet:
              path: /health
              port: 9090
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 60  # 32 minutes total (120s + 60*30s)
          livenessProbe:
            httpGet:
              path: /live
              port: 9090
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10
          readinessProbe:
            httpGet:
              path: /live
              port: 9090
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10
```
Expected Behavior
- When `curl http://localhost:8000/v1/models` is called, it returns the `--served-model-name` value as `data[].id`:
  `{"object":"list","data":[{"id":"cosmos-reason1-7b","object":"object","created":1764046339,"owned_by":"nvidia"}]}`
- The frontend pod doesn't download a model, or it uses the `--model` arg value to download one.
Actual Behavior
- When `curl http://localhost:8000/v1/models` is called, it returns nothing in `data`:
  `{"object":"list","data":[{}]}`
- The frontend pod uses the `--served-model-name` arg value to download a model, and it fails:

```
# kubectl logs <frontend pod> -n dynamo-system
2025-12-02T02:18:52.808639Z  WARN dynamo_llm::hub: ModelExpress download failed for model 'cosmos-reason1-7b': Failed to fetch model 'cosmos-reason1-7b' from HuggingFace. Is this a valid HuggingFace ID? Error: request error: HTTP status client error (401 Unauthorized) for url (https://huggingface.co/api/models/cosmos-reason1-7b/revision/main)
2025-12-02T02:18:52.808665Z ERROR dynamo_llm::discovery::watcher: Error adding model from discovery model_name="cosmos-reason1-7b" namespace="vlm-vllm-agg" error="Failed to fetch model 'cosmos-reason1-7b' from HuggingFace. Is this a valid HuggingFace ID? Error: request error: HTTP status client error (401 Unauthorized) for url (https://huggingface.co/api/models/cosmos-reason1-7b/revision/main)"
```
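The 401 is consistent with `cosmos-reason1-7b` not being a valid Hub repo id: Hugging Face model ids take the form `namespace/name`, and the log shows the frontend hitting `api/models/cosmos-reason1-7b` (no namespace). A quick offline sanity check, using a hypothetical helper that is not part of Dynamo:

```python
import re

# Hugging Face Hub model ids are "namespace/name"; a bare alias such as
# "cosmos-reason1-7b" has no namespace and is rejected by the hub API.
# This pattern is a simplified approximation, not the hub's exact grammar.
HF_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def looks_like_hf_repo_id(name: str) -> bool:
    """Rough check that a string is shaped like a Hub repo id."""
    return bool(HF_REPO_ID.match(name))


print(looks_like_hf_repo_id("nvidia/Cosmos-Reason1-7B"))  # True: org/name id
print(looks_like_hf_repo_id("cosmos-reason1-7b"))         # False: alias only
```

This supports the report's reading: the frontend is passing the `--served-model-name` alias where a `--model` repo id is required.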
Environment
- Ubuntu 22.04
- Dynamo 0.7.0
- Kubernetes v1.31
Additional Context
No response
Screenshots
No response