Releases: envoyproxy/ai-gateway

v0.3.0

21 Aug 10:28
da86fc6


Release Announcement

Check out the v0.3.0 release notes to learn more about the release.

Envoy AI Gateway v0.3.x

This release series introduces intelligent inference routing with the Endpoint Picker Provider, enhanced observability features, Google Vertex AI support, and broader provider integrations.

v0.3.0

August 21, 2025
Envoy AI Gateway v0.3.0 introduces intelligent inference routing, expanded provider support (including Google Vertex AI and Anthropic), and enhanced observability with OpenInference tracing and configurable metrics. Key features include Endpoint Picker Provider with InferencePool for dynamic load balancing, model name virtualization, and seamless Gateway API Inference Extension integration.

✨ New Features

Endpoint Picker Provider (EPP) Integration

  • Gateway API Inference Extension Support
    • Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.
  • Dual Integration Approaches
    • Support for both HTTPRoute + InferencePool and AIGatewayRoute + InferencePool integration patterns, providing flexibility for different use cases from simple to advanced AI routing scenarios (a route sketch follows this list).
  • Dynamic Load Balancing
    • Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.
  • Extensible Architecture
    • Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.
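
The following is a minimal sketch of the AIGatewayRoute + InferencePool pattern: a route whose backend is an InferencePool, so the endpoint picker selects the serving endpoint for each request. The resource names, apiVersion values, and exact field layout are assumptions for illustration; check the Gateway API Inference Extension and AI Gateway reference docs for the authoritative schema.

# Hedged sketch: routing a model to an InferencePool via AIGatewayRoute.
# Names and field layout are assumed; verify against the v0.3 reference docs.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-route
spec:
  parentRefs:
    - name: my-ai-gateway                  # assumed Gateway name
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model          # model-based routing header
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: llama-pool                 # assumed InferencePool name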

Expanded Provider Ecosystem

  • Google Vertex AI Production Support
    • Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
  • Anthropic on Vertex AI Integration
    • Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.
  • Enhanced Gemini Capabilities
    • Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.
  • Strengthened OpenAI-Compatible Ecosystem
    • Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape (a backend configuration sketch follows this list).
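
As a rough illustration of wiring up an OpenAI-compatible provider, the sketch below pairs an AIServiceBackend declaring the OpenAI API schema with an Envoy Gateway Backend that points at the provider endpoint. The hostname, resource names, and field layout are assumptions, not a canonical configuration.

# Hedged sketch: an OpenAI-compatible provider behind an AIServiceBackend.
# Endpoint, names, and field layout assumed for illustration.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: groq-backend
spec:
  schema:
    name: OpenAI                           # provider speaks the OpenAI API
  backendRef:
    group: gateway.envoyproxy.io
    kind: Backend
    name: groq
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: groq
spec:
  endpoints:
    - fqdn:
        hostname: api.groq.com             # assumed provider endpoint
        port: 443

In practice a TLS policy (for example a Gateway API BackendTLSPolicy) would usually accompany a port-443 endpoint; it is omitted here for brevity.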

Observability Enhancements

  • OpenInference Tracing Support
    • Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix. View the documentation →
  • Configurable Metrics Labels
    • Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers such as user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities (an illustrative sketch follows this list).
  • Embeddings Metrics Support
    • Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.
  • Enhanced GenAI Metrics
    • Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →
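
To make the configurable metrics labels concrete, here is a purely illustrative sketch of mapping HTTP request headers to extra metric labels in the AI Gateway filter configuration. The metricsRequestHeaderLabels key and its shape are hypothetical placeholders; consult the v0.3 observability documentation for the actual configuration surface.

# Purely illustrative: map request headers to additional metric labels.
# The field name and structure below are hypothetical; check the v0.3 docs.
filterConfig:
  metricsRequestHeaderLabels:              # hypothetical key name
    - headerName: x-team-id
      labelName: team_id
    - headerName: x-api-version
      labelName: api_version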

Infrastructure and Configuration

  • Model Name Virtualization

    • Added a new modelNameOverride field in the backendRef of AIGatewayRoute, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations (a route sketch follows this list). View the documentation →
  • Unified Gateway Support

    • Enhanced Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute to be attached to the same Gateway object. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
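
The sketch below shows how modelNameOverride might look on AIGatewayRoute backend references: clients always request the virtual name chat-default, while each backend receives its provider-specific model name. Resource names, the weight split, and the match syntax are assumptions for illustration; see the linked documentation for the authoritative schema.

# Hedged sketch: one virtual model name, two providers with provider-specific
# model names (names, weights, and match syntax assumed).
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: unified-chat
spec:
  parentRefs:
    - name: my-ai-gateway                  # assumed Gateway name
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: chat-default          # virtual model name seen by clients
      backendRefs:
        - name: openai-backend             # assumed AIServiceBackend
          modelNameOverride: gpt-4o-mini
          weight: 90
        - name: vertex-backend             # assumed AIServiceBackend
          modelNameOverride: gemini-1.5-flash
          weight: 10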

🔗 API Updates

  • BackendSecurityPolicy TargetRefs: Added a targetRefs field to the BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns (a sketch follows this list).
  • Gateway API Inference Extension: An InferencePool resource from Gateway API Inference Extension v0.5.1 can now be specified as a backend reference in AIGatewayRoute for intelligent endpoint selection.
  • modelNameOverride in the backend reference of AIGatewayRoute: Added a modelNameOverride field in the backend reference of AIGatewayRoute, allowing flexible model name rewriting for routing purposes.
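
A minimal sketch of the new targetRefs attachment follows, with an API-key policy attached directly to an AIServiceBackend. The credential layout and names are assumptions intended to show the attachment pattern rather than the full schema.

# Hedged sketch: BackendSecurityPolicy attached via targetRefs.
# Credential layout and names assumed for illustration.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-api-key
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai-backend                 # assumed backend name
  type: APIKey
  apiKey:
    secretRef:
      name: openai-api-key-secret          # assumed Secret containing the key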

Deprecations

  • backendSecurityPolicyRef Pattern: The old pattern of AIServiceBackend referencing BackendSecurityPolicy is deprecated in favor of the new targetRefs approach. Existing configurations will continue to work but should be migrated before v0.4.
  • AIGatewayRoute's targetRefs Pattern: The targetRefs pattern for attaching an AIGatewayRoute to a Gateway is deprecated. Existing configurations will continue to work but should be migrated to parentRefs before v0.4.
  • AIGatewayRoute's schema Field: The schema field is no longer needed for AIGatewayRoute. Existing configurations will continue to work but should be removed before v0.4.
  • controller.envoyGatewayNamespace helm value is no longer necessary: This value is no longer needed; setting it is redundant.
  • controller.podEnv helm value will be removed: Use controller.extraEnvVars instead. The controller.podEnv value will be removed in v0.4.

📖 Upgrade Guidance

For users upgrading from v0.2.x to v0.3.0:

1. Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.

2. Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings shown below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file, as described in the getting started guide.

--- a/manifests/envoy-gateway-config/config.yaml
+++ b/manifests/envoy-gateway-config/config.yaml
@@ -43,9 +43,19 @@ data:
     extensionManager:
       hooks:
         xdsTranslator:
+          translation:
+            listener:
+              includeAll: true
+            route:
+              includeAll: true
+            cluster:
+              includeAll: true
+            secret:
+              includeAll: true
           post:
-          - VirtualHost
           - Translation
+          - Cluster
+          - Route

3. Upgrade Envoy AI Gateway to v0.3.0

4. Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after the upgrade to v0.3.0 (see the sketch after step 6).

5. Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after the upgrade to v0.3.0.

6. Remove AIGatewayRoute.schema - Remove the schema field from AIGatewayRoute resources after the upgrade to v0.3.0, as it is no longer used.
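
As a rough before/after for step 4, the gateway attachment moves from the deprecated targetRefs to parentRefs on the AIGatewayRoute spec. The field layout shown is an assumption based on the Gateway API parentRefs convention.

# Hedged sketch of step 4 (field layout assumed).
# Before (deprecated):
#   spec:
#     targetRefs:
#       - group: gateway.networking.k8s.io
#         kind: Gateway
#         name: my-ai-gateway
# After (v0.3.0 onward):
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: my-ai-gateway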

📦 Dependencies Versions

  • Go 1.24.6
    • Updated to latest Go version for improved performance and security.
  • Envoy Gateway v1.5
    • Built on Envoy Gateway for proven data plane capabilities.
  • Envoy v1.35
    • Leveraging Envoy Proxy's battle-tested networking capabilities.
  • Gateway API v1.3.1
    • Support for latest Gateway API specifications.
  • Gateway API Inference Extension v0.5.1
    • Integration with Gateway API Inference Extension for intelligent endpoint selection.

🙏 Acknowledgements

This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.

The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.


v0.3.0-rc2

21 Aug 10:01
da86fc6


v0.3.0-rc2 Pre-release

Release candidate

v0.3.0-rc1

15 Aug 04:15
e33a5f3


v0.3.0-rc1 Pre-release

Release candidate for v0.3.0!

helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.3.0-rc1 --namespace envoy-ai-gateway-system --create-namespace

v0.2.1

10 Jun 04:01
36fc248


Overview

This patch release v0.2.1 includes a fix for an AWS request-signing issue with large request bodies.

Commits

  • backport: do not sign content-length for AWS (#697)

v0.2.0

05 Jun 21:20
4b5ebbd


Envoy AI Gateway v0.2.x

June 5, 2025

Envoy AI Gateway v0.2.0 builds upon the solid foundation of v0.1.0 with a focus on expanding provider ecosystem support, improving reliability and performance through architectural changes, and adding enterprise-grade authentication support for Azure OpenAI.

Highlights: Azure OpenAI Integration, Sidecar Architecture, Performance Improvements, CLI Tools, Model Failover and Retry, Certificate Manager Integration

✨ New Features

Azure OpenAI Integration

  • Full Azure OpenAI Support
    • Complete integration with Azure OpenAI services, with request/response transformation for the unified OpenAI-compatible completions API.
  • Upstream Authentication for Azure Enterprise Integration
    • Support for accessing Azure via OIDC tokens and Entra ID, providing enterprise-grade, secure, and compliant upstream authentication.
  • Enterprise Proxy URL Support for Azure Authentication
    • Enhanced Azure authentication with proxy URL configuration options for enterprise proxy support.
  • Flexible Token Providers
    • Generalized token provider architecture supporting both client secret and federated token flows.

Architecture Improvements

  • Sidecar and UDS External Processor
    • Switched to a sidecar deployment model with Unix Domain Sockets for improved performance and resource efficiency.
  • Enhanced ExtProc Buffer Limits
    • Increased external processor buffer limits from 32 KiB to 50 MiB for larger AI requests. Users can now configure CPU and memory resource limits via filterConfig.externalProcessor.resources for better resource management (a sketch follows this list).
  • Multiple AIGatewayRoute Support
    • Support for multiple AIGatewayRoute resources per gateway, removing the previous single-route limitation. This enables better organization, scalability, and management of complex routing configurations across teams.
  • Certificate Manager Integration
    • Integrated cert-manager for automated TLS certificate provisioning and rotation for the mutating webhook server that injects AI Gateway sidecar containers into Envoy Gateway pods. This enables enterprise-grade certificate management, eliminating manual certificate handling and improving security.
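
A minimal sketch of the resource knobs mentioned above, using the filterConfig.externalProcessor.resources value named in these notes; the request and limit numbers are arbitrary examples, not recommendations.

# Hedged sketch: extproc CPU/memory settings via the value named above.
# Numbers are arbitrary examples.
filterConfig:
  externalProcessor:
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi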

Cross-Backend Failover and Retry

  • Provider Fallback Logic
    • Priority-based failover system that automatically routes traffic to lower priority AI providers as higher priority endpoints become unhealthy, ensuring high availability and fault tolerance.
  • Backend Retry Support
    • Configurable retry policies for improved reliability and resilience against transient AI provider failures. Features include exponential backoff with jitter, configurable retry triggers (5xx errors, connection failures, rate limiting), customizable retry counts and timeouts, and integration with Envoy Gateway's BackendTrafficPolicy (a sketch follows this list).
  • Weight-Based Routing
    • Enhanced backend routing with weighted traffic distribution, enabling gradual rollouts, cost optimization, and A/B testing across multiple AI providers.
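
For the retry support mentioned above, here is a hedged sketch of an Envoy Gateway BackendTrafficPolicy with exponential backoff. The target reference (assumed here to be the HTTP route generated for the AIGatewayRoute) and the specific numbers are illustrative only.

# Hedged sketch: retries with exponential backoff via Envoy Gateway's
# BackendTrafficPolicy (target and numbers assumed for illustration).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ai-route-retries
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: my-ai-route                    # assumed generated route name
  retry:
    numRetries: 3
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 5s
      timeout: 30s
    retryOn:
      triggers: ["connect-failure", "retriable-status-codes"]
      httpStatusCodes: [429, 503]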

Enhanced CLI Tools

  • aigw run Command

    • New CLI command for local development and testing of Envoy AI Gateway resources.
  • Configuration Translation

    • aigw translate for translating Envoy AI Gateway resources into Envoy Gateway and Kubernetes resources.

🔗 API Updates

  • AIGatewayRoute Metadata: Added ownedBy and createdAt fields for better resource tracking.
  • Backend Configuration: Moved Backend configuration back to RouteRule for improved flexibility.
  • OIDC Field Types: Specific typing for OIDC-related configuration fields.
  • Weight Type Changes: Updated Weight field type to match Gateway API specifications.

Deprecations

  • AIServiceBackend.Timeouts: Deprecated in favor of more granular timeout configuration.

🐛 Bug Fixes

  • ExtProc Image Syncing: Fixed issue where external processor image wouldn't sync properly.
  • Router Weight Validation: Fixed negative weight validation in routing logic.
  • Content Body Handling: Fixed empty content body issues causing AWS validation errors.
  • First Match Routing: Fixed router logic to ensure first match wins as expected.

⚠️ Breaking Changes

  • Sidecar Architecture: The switch to sidecar and UDS model may require configuration updates for existing deployments.
  • API Field Changes: Some API fields have been moved or renamed; please review the migration guide for details.
  • Timeout Configuration: Deprecated timeout fields require migration to new configuration format.
  • Routing to Kubernetes Services: Routing to Kubernetes services is not supported in Envoy AI Gateway v0.2.0. This is a known limitation and will be addressed in a future release.

📖 Upgrade Guidance

For users upgrading from v0.1.x to v0.2.0:

  1. Review usage of any deprecated API fields (particularly AIServiceBackend.Timeouts).
  2. Update deployment configurations if using custom replica configurations - the replicas field in AIGatewayFilterConfigExternalProcessor is now deprecated due to the new sidecar architecture.
  3. Remove routing to Kubernetes services - currently, Envoy AI Gateway does not support routing to Kubernetes services. This is a known limitation and will be addressed in a future release.

📦 Dependencies Versions

  • Go 1.24.2 - Updated to latest Go version for improved performance and security.
  • Envoy Gateway v1.4 - Built on Envoy Gateway for proven data plane capabilities.
  • Envoy v1.34 - Leveraging Envoy Proxy's battle-tested networking capabilities.
  • Gateway API v1.3 - Support for latest Gateway API specifications.

🙏 Acknowledgements

This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Google, and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.

Many community members engage in conversations, provide feedback, and contribute to the project in ways other than code, and we appreciate them greatly. Ideas, suggestions, and feedback are always welcome.

🔮 What's Next (beyond v0.2)

We're already working on exciting features:

  • Google Gemini & Vertex Integration
  • Anthropic Integration
  • Support for the Gateway API Inference Extension
  • Endpoint picker support for Pod routing
  • What else do you want to see? Get involved and open an issue and let us know!

v0.2.0-rc3

05 Jun 21:12
4b5ebbd


v0.2.0-rc3 Pre-release

Release candidate

v0.2.0-rc1

02 Jun 22:27
0cd003e


v0.2.0-rc1 Pre-release

Release candidate

v0.1.5

04 Jul 17:22
30b4783


Overview

This patch release v0.1.5 introduces a fix to syncing the external processor (extproc) image.

Commits

  • extproc: set extProcImage before potentially returning (#515)

v0.1.4

20 Mar 20:37
d00bf8c


Overview

This patch release v0.1.4 introduces a fix for an AWS validation error when assistant content is empty.

Commits

  • translator: skip adding content if assistant content string is empty (#508)

v0.1.3

14 Mar 20:00
2cc198a


Overview

This patch release v0.1.3 includes fixes for chat completion streaming and the OpenAI assistant content type, and adds GenAI metrics.

Commits

  • extproc: add genai metrics to track token usage and latency (#459)
  • extproc: properly stream chat completions (#468)
  • feat: openai: allow assistant content to be string (#486)