[BUG] Unable to set max-local-prefill-length on VllmDecodeWorker ("unknown field") #4750

### Describe the Bug

I’m trying to run a disaggregated service with the vLLM backend.

According to the docs [Disaggregation and Performance Tuning (section “Set the local prefill length”)](https://docs.nvidia.com/dynamo/latest/performance/tuning.html), `max-local-prefill-length` should be configurable on the vLLM decode worker so that short prompts stay local and longer prompts are sent to prefill engines.

However, when I add `max-local-prefill-length` under `spec.services.VllmDecodeWorker` in my `DynamoGraphDeployment` YAML, CRD validation fails with:
```
Error from server (BadRequest): error when creating "disagg.yaml": DynamoGraphDeployment in version "v1alpha1" cannot be handled as a DynamoGraphDeployment: strict decoding error: unknown field "spec.services.VllmDecodeWorker.max-local-prefill-length"
```

So it looks like the CRD schema does not accept this field, even though the docs say it should be supported.

I’m not sure if:
- this is a bug in the CRD / operator,
- or if `max-local-prefill-length` is now supposed to be set somewhere else (for example in a nested config, an env var, or JSON),
- or if the option was renamed in newer versions.

### Steps to Reproduce

Add `max-local-prefill-length: 1500` under `VllmDecodeWorker` in the `disagg.yaml` file:

```yaml
spec:
  services:
    VllmDecodeWorker:
      max-local-prefill-length: 1500
```

Then run `kubectl apply -f disagg.yaml`.
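
For reference, a fuller manifest that should trigger the same error might look like the sketch below. The `apiVersion` group (`nvidia.com`) and the `metadata.name` are assumptions, since only the `spec` fragment and the `v1alpha1` version appear in this report; the rest of the worker configuration is omitted.

```yaml
# Minimal reproducer sketch; values marked "assumed" are not taken from the report.
apiVersion: nvidia.com/v1alpha1      # assumed API group; the error only confirms "v1alpha1"
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg                  # assumed, hypothetical name
spec:
  services:
    VllmDecodeWorker:
      # This key is what strict decoding rejects as an unknown field.
      max-local-prefill-length: 1500
```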

### Expected Behavior

Based on the docs, I expected one of the following:

1. `max-local-prefill-length` to be a valid field under `spec.services.VllmDecodeWorker` in the `DynamoGraphDeployment` CRD, or

2. Clear documentation showing the correct place / key to configure `max-local-prefill-length` in the DGD spec (for example under some config/params section that is passed to `dynamo.vllm`).
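
To illustrate what the second option could look like, here is a purely hypothetical sketch in which the value is passed as a command-line flag to the worker process rather than as a CRD field. The `extraPodSpec.mainContainer` path and the `--max-local-prefill-length` flag are both assumptions on my side; whether `dynamo.vllm` accepts such a flag in 0.6.1 is exactly what this issue is asking, and `<model>` is a placeholder.

```yaml
# Hypothetical sketch only, not a confirmed configuration.
spec:
  services:
    VllmDecodeWorker:
      extraPodSpec:                  # assumed field name
        mainContainer:               # assumed field name
          command:
            - /bin/sh
            - -c
          args:
            # The flag below is unverified for dynamo.vllm 0.6.1.
            - "python3 -m dynamo.vllm --model <model> --max-local-prefill-length 1500"
```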

### Actual Behavior

CRD validation rejects `max-local-prefill-length` as an unknown field on `VllmDecodeWorker`, so the docs currently give no obvious way to configure this option via the DGD YAML.

### Environment

- Dynamo version: 0.6.1
- Backend: vLLM (nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1)
- Deployment: Kubernetes, using DynamoGraphDeployment CRD
- Mode: disaggregated (VllmDecodeWorker + VllmPrefillWorker)

### Additional Context

1. Is `max-local-prefill-length` still a supported option in the current Dynamo + vLLM integration?
2. If yes, what is the correct way to configure it for `VllmDecodeWorker` in a `DynamoGraphDeployment`?
3. If it has been renamed / moved, could the docs be updated to reflect the new parameter name and location?

Thanks!

### Screenshots

_No response_
