
Dynamo v0.7.0.post1



dagil-nvidia released this on 06 Dec at 04:17 (commit 41a72ab)

Dynamo v0.7.0.post1 - Release Notes

Summary

Dynamo 0.7.0.post1 is a minor release focusing on a TensorRT-LLM version upgrade, observability enhancements, and critical bug fixes. This release upgrades TensorRT-LLM to version 1.2.0rc3 with updated KV cache transfer defaults, adds comprehensive KServe health check endpoints for production monitoring, and resolves metrics visibility issues in Kubernetes deployments.

Base Branch: release/0.7.0

Full Changelog

Performance and Framework Support

  • TensorRT-LLM 1.2.0rc3: Upgraded the TensorRT-LLM dependency to version 1.2.0rc3 (#4645). The upgrade also changes the default KV cache transfer configuration from UCX-only to NIXL with a UCX backend for improved memory transfer performance; UCX-only KV cache transfer is now opt-in rather than the default.
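Deployments that relied on the previous UCX-only behavior now have to opt in explicitly. The following is a minimal POSIX shell sketch, assuming that the transfer backend is selected through the TensorRT-LLM LLM API option `cache_transceiver_config.backend` (which accepts values such as `DEFAULT`, `UCX`, and `NIXL`) in the engine configuration YAML your worker loads. The helper name is hypothetical, and the exact knob should be confirmed against the TensorRT-LLM 1.2.0rc3 documentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dynamo): append a KV cache transfer backend
# selection to a TensorRT-LLM engine config YAML.
# ASSUMPTION: cache_transceiver_config.backend selects the transport
# (verify for 1.2.0rc3). Set it to UCX to opt back into UCX-only transfer.
set_kv_transfer_backend() {
  file=$1
  backend=$2   # e.g. UCX, NIXL, or DEFAULT
  # Refuse to append a second top-level key; duplicate YAML keys are ambiguous.
  if grep -q '^cache_transceiver_config:' "$file"; then
    echo "cache_transceiver_config already present in $file; edit it by hand" >&2
    return 1
  fi
  # Assumes the file ends with a newline so the new key starts on its own line.
  printf 'cache_transceiver_config:\n  backend: %s\n' "$backend" >> "$file"
}
```

For example, `set_kv_transfer_backend engine.yaml UCX` appends the override once and refuses to run a second time, so an existing setting is never silently shadowed.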

Fault Tolerance & Observability

  • KServe gRPC Health Endpoints: Added the KServe gRPC health check endpoints ServerLive, ServerReady, and ModelReady (#4708), which report server liveness, overall system readiness, and per-model availability respectively. These let Kubernetes liveness and readiness probes verify the server's state.
  • KServe HTTP Metrics Endpoint: Added a configurable HTTP metrics endpoint to the KServe gRPC service (#4400). The HTTP metrics server runs concurrently with the gRPC server and accepts custom host and port parameters, improving observability in production deployments.
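ServerLive, ServerReady, and ModelReady are RPCs of the KServe v2 inference protocol. The POSIX shell sketch below shows how a probe script might call them and scrape the new metrics endpoint. It assumes that `grpcurl` and `curl` are installed, that the server exposes gRPC reflection, that the service is named `inference.GRPCInferenceService` as in the KServe v2 protocol definition, and that metrics are served at `/metrics`. None of these details come from the release notes, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of exec-style health checks against a KServe v2 gRPC frontend.
# ASSUMPTIONS: grpcurl/curl installed, gRPC reflection enabled, service name
# inference.GRPCInferenceService, metrics path /metrics. Addresses are host:port.

# Succeed if a grpcurl JSON response reports "live" or "ready" as true.
# proto3 JSON omits false booleans by default, so "{}" counts as not ready.
response_is_true() {
  printf '%s\n' "$1" | grep -Eq '"(live|ready)"[[:space:]]*:[[:space:]]*true'
}

# Call ServerLive or ServerReady, which take no request fields.
# Usage: server_check ADDR ServerLive
server_check() {
  out=$(grpcurl -plaintext -d '{}' "$1" "inference.GRPCInferenceService/$2") || return 1
  response_is_true "$out"
}

# Call ModelReady for a single model. Usage: model_check ADDR MODEL_NAME
model_check() {
  out=$(grpcurl -plaintext -d "{\"name\": \"$2\"}" "$1" \
    "inference.GRPCInferenceService/ModelReady") || return 1
  response_is_true "$out"
}

# Fetch Prometheus text from the HTTP metrics endpoint (#4400).
# Usage: scrape_metrics METRICS_HOST:METRICS_PORT
scrape_metrics() {
  curl -fsS "http://$1/metrics"
}
```

`server_check ADDR ServerReady` exits 0 only when the server reports ready, so a script like this could back a Kubernetes `exec` probe. Kubernetes' built-in `grpc` probe type speaks the standard `grpc.health.v1.Health` protocol rather than these KServe RPCs, so it would not call ServerLive or ServerReady directly.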

Documentation

  • TensorRT-LLM Multimodal EPD: Updated the TensorRT-LLM commit reference in the multimodal EPD documentation from v1.2.0rc2 to v1.2.0rc3 (#4713), so that users build the Encode-Prefill-Decode (EPD) feature against the tested version that matches this release's upgraded TensorRT-LLM dependency.

Bug Fixes

  • LMCache Prometheus Metrics: Fixed LMCache metrics visibility in Kubernetes deployments where PROMETHEUS_MULTIPROC_DIR is explicitly set (#4654). A dual-registry approach resolves the Prometheus registry conflict, so lmcache:* metrics are now properly exposed in production environments.
  • KvEventPublisher Signature: Fixed the KvEventPublisher method signature (#4754), resolving compatibility issues with the KV block manager's event publishing system and preventing runtime errors in disaggregated deployments.
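One way to confirm the LMCache metrics fix in a cluster is to scrape the worker's metrics and look for `lmcache:` series. The POSIX shell sketch below assumes the metrics are exposed as Prometheus text over HTTP. The function name, the `/metrics` path, and the address in the usage line are placeholders rather than documented Dynamo values.

```shell
#!/bin/sh
# Hypothetical check: read Prometheus text exposition on stdin and succeed if
# it contains at least one lmcache:* sample line. "# HELP" and "# TYPE"
# comment lines start with "#" and are therefore ignored.
has_lmcache_metrics() {
  grep -Eq '^lmcache:[A-Za-z0-9_:]+'
}
```

For example: `curl -fsS "http://METRICS_HOST:METRICS_PORT/metrics" | has_lmcache_metrics && echo "lmcache metrics exposed"`.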

Known Issues

Helm Chart Image Tag Mismatch

There is a version format inconsistency between the Helm chart's appVersion and the Docker image tags in this release:

| Resource | Format | Value |
|---|---|---|
| Git tag | PEP 440 | v0.7.0.post1 |
| Helm chart | SemVer | dynamo-platform-0.7.0-post1.tgz |
| Docker image | PEP 440 | kubernetes-operator:0.7.0.post1 |

Impact: Deploying the dynamo-platform Helm chart without overriding the image tag will fail with ImagePullBackOff. This occurs because the Helm chart's appVersion uses SemVer format (0.7.0-post1 with a hyphen), but the actual Docker images on nvcr.io use PEP 440 format (0.7.0.post1 with a dot).

Workaround: Explicitly override the image tag during Helm installation:

```shell
helm install dynamo-platform dynamo-platform-0.7.0-post1.tgz \
  --set "dynamo-operator.controllerManager.manager.image.tag=0.7.0.post1" \
  # ... other options
```

Or in a values file:

```yaml
dynamo-operator:
  controllerManager:
    manager:
      image:
        tag: "0.7.0.post1"
```
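The mismatch is mechanical: the SemVer chart version uses `-post1` where the PEP 440 image tag uses `.post1`. The POSIX shell sketch below derives the override value instead of hard-coding it. The function names are hypothetical helpers, not part of Dynamo or Helm, and only a trailing `-postN` suffix is rewritten.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Dynamo) for the Helm/Docker tag mismatch.

# Convert a SemVer post-release version such as "0.7.0-post1" into the
# PEP 440 form "0.7.0.post1" used by the nvcr.io image tags.
# Versions without a trailing "-postN" suffix are printed unchanged.
semver_to_pep440() {
  printf '%s\n' "$1" | sed 's/-post\([0-9]*\)$/.post\1/'
}

# Build the value for the workaround's --set flag from a chart version.
image_tag_override() {
  printf 'dynamo-operator.controllerManager.manager.image.tag=%s\n' \
    "$(semver_to_pep440 "$1")"
}
```

For example, `helm install dynamo-platform dynamo-platform-0.7.0-post1.tgz --set "$(image_tag_override 0.7.0-post1)"` passes the same tag as the explicit override above.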