Hi Dynamo developers!
Many apologies for the delay. It is conference season, and we have unfortunately been swamped with talk preparations. v0.6.1 has already gone out, but we wanted to provide visibility into the Dynamo v0.7.0 release. Please refer to the long-term H2 roadmap here.
As earlier in H2, we are continuing to make progress on the five major focus areas:
- Performance
- Fault tolerance
- K8s deployment
- KV cache management and transfer
- Scheduling with smart router and planner
Additionally, the 0.7.0 release will ship with CUDA 13 and introduce two new artifacts for increased modularity:
- KVBM pip wheel
- End Point Picker (EPP) container image
This release will emphasize:
- Maximum performance via composability (KV-aware routing + KV offloading + disaggregated serving)
- Seamless production-grade serving, from configuration (AIConfigurator & Planner) to production (Grove & granular fault tolerance for LLMs)
📅 Timeline
The target date for the v0.7.0 release is 11/19 (Thu)
Dynamo v0.7.0 Features
1. Performance
- Consolidated exemplar showcasing composability of disaggregated serving, KV-aware routing, and KV offloading with KVBM.
2. Fault Tolerance & Observability
Fault Tolerance
- ETCD lease keep-alive resilience.
- ETCD watcher resilience.
- Fault tolerance CI harness.
- Request cancellation test cases.
Observability
- Achieve parity with SGLang, TRT-LLM, and vLLM metrics
- Add engine (component/backend) metrics
- Extend the K8s metric collection guide with CPU metrics
- Publish Nsight integration example
3. K8s Deployment
- Remove ETCD dependency
- Multi-LoRA support
- SLA profiler and AIConfigurator integration
4. KV Cache Management & Transfer
KV Block Manager
Note: G1 = HBM, G2 = Host memory, G3 = Local disk, G4 = Remote storage
- Enable KV event sharing with router
- Pip wheel for KVBM
- Performant G4 offloading
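To make the G1 to G4 notation concrete, here is a minimal Python sketch that models the tier hierarchy as an ordered enum and walks the offload chain from GPU memory outward. `KvTier`, `TIER_MEDIUM`, and `next_offload_tier` are hypothetical names for illustration only; they are not part of the KVBM API, and the strict one-tier-at-a-time ordering is an assumption, not a description of how KVBM actually schedules offloads.

```python
from enum import IntEnum
from typing import Optional


class KvTier(IntEnum):
    """Hypothetical model of the KV cache tiers (lower value = faster, smaller)."""
    G1 = 1  # HBM (GPU memory)
    G2 = 2  # Host memory
    G3 = 3  # Local disk
    G4 = 4  # Remote storage


TIER_MEDIUM = {
    KvTier.G1: "HBM",
    KvTier.G2: "Host memory",
    KvTier.G3: "Local disk",
    KvTier.G4: "Remote storage",
}


def next_offload_tier(tier: KvTier) -> Optional[KvTier]:
    """Return the next slower tier a block could move to, or None at G4 (assumed ordering)."""
    if tier == KvTier.G4:
        return None
    return KvTier(tier + 1)


# Walk the full offload chain starting from GPU memory.
tier: Optional[KvTier] = KvTier.G1
chain = []
while tier is not None:
    chain.append(TIER_MEDIUM[tier])
    tier = next_offload_tier(tier)

print(" -> ".join(chain))
```

Using an `IntEnum` keeps the tiers naturally ordered, so "faster than" comparisons such as `KvTier.G1 < KvTier.G3` work without a separate lookup table.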
5. Planning & Routing
Router
- Enable composability of KV-aware routing + KVBM + disaggregated serving
Planner
- SLA planner MoE scaling support
- Seamless UX for using AIConfigurator (AIC) + Planner + Grove for multinode deployments
- Extend Planner support to aggregated deployments
If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into the Dynamo GA release in December 🙏.