Description
Describe the Bug
I’m trying to run a disaggregated service with the vLLM backend.
According to the “Disaggregation and Performance Tuning” docs (section “Set the local prefill length”), max-local-prefill-length should be configurable on the vLLM decode worker so that short prompts are prefilled locally and longer prompts are sent to prefill engines.
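To make the behavior I expect concrete, here is a minimal sketch of the threshold decision as I understand it from the docs. The function name `route_prefill` and the inclusive comparison at the boundary are my assumptions, not Dynamo's actual implementation, and the real router may take other factors into account.

```python
def route_prefill(prompt_tokens: int, max_local_prefill_length: int) -> str:
    """Decide where a prompt's prefill would run under a simple length threshold.

    Hypothetical helper for illustration only; it is not part of Dynamo or vLLM.
    Assumption: a prompt whose length equals the threshold still stays local.
    """
    if prompt_tokens < 0 or max_local_prefill_length < 0:
        raise ValueError("token counts must be non-negative")
    # Short prompts are prefilled on the decode worker itself;
    # longer prompts are handed off to a prefill engine.
    if prompt_tokens <= max_local_prefill_length:
        return "local"
    return "remote"


if __name__ == "__main__":
    # 1500 is the threshold value used in the reproduction steps of this issue.
    for tokens in (200, 1500, 4000):
        print(tokens, route_prefill(tokens, 1500))
```

With max-local-prefill-length set to 1500, I would expect a 200-token prompt to stay on the decode worker and a 4000-token prompt to go to a prefill worker.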
However, when I add max-local-prefill-length under spec.services.VllmDecodeWorker in my DynamoGraphDeployment YAML, the CRD validation fails with:
Error from server (BadRequest): error when creating "disagg.yaml": DynamoGraphDeployment in version "v1alpha1" cannot be handled as a DynamoGraphDeployment: strict decoding error: unknown field "spec.services.VllmDecodeWorker.max-local-prefill-length"
So it looks like the CRD schema does not accept this field, even though the docs say it should be supported.
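My reading of the error is that strict decoding rejects any key the schema does not declare. A rough sketch of that kind of check follows; `ALLOWED_FIELDS` is a made-up allow-list for illustration and not the real DynamoGraphDeployment schema, and `unknown_fields` is a hypothetical helper, not operator code.

```python
# Hypothetical allow-list for illustration; the real CRD schema declares many more fields.
ALLOWED_FIELDS = {"replicas", "resources"}


def unknown_fields(service_name: str, service_spec: dict, allowed=ALLOWED_FIELDS) -> list:
    """Return the dotted paths of keys in a service spec that the allow-list does not declare."""
    return [
        f"spec.services.{service_name}.{key}"
        for key in sorted(service_spec)
        if key not in allowed
    ]


if __name__ == "__main__":
    spec = {"replicas": 1, "max-local-prefill-length": 1500}
    for path in unknown_fields("VllmDecodeWorker", spec):
        print(f'unknown field "{path}"')
```

If the check works this way, the field would be rejected at admission time no matter what the worker itself supports, which is why I cannot tell from the error alone whether the option still exists.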
I’m not sure if:
- this is a bug in the CRD / operator,
- or if max-local-prefill-length is now supposed to be set in a different place (for example in a nested config / env var / JSON),
- or if the option was renamed in newer versions.
Steps to Reproduce
Add max-local-prefill-length: 1500 to the disagg.yaml file:
spec:
  services:
    VllmDecodeWorker:
      max-local-prefill-length: 1500
Then run:
kubectl apply -f disagg.yaml
Expected Behavior
Based on the docs, I expected one of the following:
- max-local-prefill-length to be a valid field under spec.services.VllmDecodeWorker in the DynamoGraphDeployment CRD, or
- Clear documentation showing the correct place / key to configure max-local-prefill-length in the DGD spec (for example under some config/params section that is passed to dynamo.vllm).
Actual Behavior
The CRD validation rejects max-local-prefill-length as an unknown field on VllmDecodeWorker, so there is currently no obvious way (from the docs) to configure this option via the DGD YAML.
Environment
- Dynamo version: 0.6.1
- Backend: vLLM (nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1)
- Deployment: Kubernetes, using DynamoGraphDeployment CRD
- Mode: disaggregated (VllmDecodeWorker + VllmPrefillWorker)
Additional Context
- Is max-local-prefill-length still a supported option in the current Dynamo + vLLM integration?
- If yes, what is the correct way to configure it for VllmDecodeWorker in a DynamoGraphDeployment?
- If it has been renamed / moved, could the docs be updated to reflect the new parameter name and location?
Thanks!