[BUG] Unable to set max-local-prefill-length on VllmDecodeWorker ("unknown field") #4750

@PoWeiShen

Describe the Bug

I’m trying to run a disaggregated service with the vLLM backend.

According to the “Disaggregation and Performance Tuning” docs (section “Set the local prefill length”), max-local-prefill-length should be configurable on the vLLM decode worker, so that short prompts are prefilled locally and longer prompts are sent to the prefill engines.
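
To make the expected behavior concrete, here is a minimal sketch of that threshold rule as described above. The function name `route_prefill` and the strict `>` comparison at the boundary are assumptions made for illustration; this is not Dynamo's actual implementation, which may take other factors into account.

```python
def route_prefill(prompt_tokens: int, max_local_prefill_length: int) -> str:
    """Decide where a prompt is prefilled under a simple length threshold.

    Hypothetical helper for illustration only: prompts at or below the
    threshold stay on the decode worker ("local"); longer prompts are
    handed to a prefill engine ("remote").
    """
    if prompt_tokens < 0 or max_local_prefill_length < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: the boundary value itself is still handled locally.
    if prompt_tokens > max_local_prefill_length:
        return "remote"
    return "local"


if __name__ == "__main__":
    # 1500 is the threshold value used in this report's disagg.yaml.
    for n in (200, 1500, 4000):
        print(n, route_prefill(n, 1500))
```

With the value 1500 used in this report, a 200-token prompt would be prefilled on the decode worker and a 4000-token prompt would be sent to a prefill worker.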

However, when I add max-local-prefill-length under spec.services.VllmDecodeWorker in my DynamoGraphDeployment YAML, the CRD validation fails with:

Error from server (BadRequest): error when creating "disagg.yaml": DynamoGraphDeployment in version "v1alpha1" cannot be handled as a DynamoGraphDeployment: strict decoding error: unknown field "spec.services.VllmDecodeWorker.max-local-prefill-length"

So it looks like the CRD schema does not accept this field, even though the docs say it should be supported.

I’m not sure whether:

  • this is a bug in the CRD / operator,
  • max-local-prefill-length is now supposed to be set in a different place (for example in a nested config, an env var, or JSON), or
  • the option was renamed in newer versions.

Steps to Reproduce

  1. Add max-local-prefill-length: 1500 under spec.services.VllmDecodeWorker in disagg.yaml:

spec:
  services:
    VllmDecodeWorker:
      max-local-prefill-length: 1500

  2. Apply the manifest:

kubectl apply -f disagg.yaml

Expected Behavior

Based on the docs, I expected one of the following:

  1. max-local-prefill-length to be a valid field under spec.services.VllmDecodeWorker in the DynamoGraphDeployment CRD, or

  2. Clear documentation showing the correct place / key to configure max-local-prefill-length in the DGD spec (for example under some config/params section that is passed to dynamo.vllm).

Actual Behavior

The CRD validation rejects max-local-prefill-length as an unknown field on VllmDecodeWorker, so there is currently no obvious way (from the docs) to configure this option via the DGD YAML.

Environment

  • Dynamo version: 0.6.1
  • Backend: vLLM (nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1)
  • Deployment: Kubernetes, using DynamoGraphDeployment CRD
  • Mode: disaggregated (VllmDecodeWorker + VllmPrefillWorker)

Additional Context

  1. Is max-local-prefill-length still a supported option in the current Dynamo + vLLM integration?
  2. If yes, what is the correct way to configure it for VllmDecodeWorker in a DynamoGraphDeployment?
  3. If it has been renamed / moved, could the docs be updated to reflect the new parameter name and location?

Thanks!

Screenshots

No response
