
Conversation

mmabrouk
Member

Summary

  • Add production-ready Kubernetes deployment configurations for Agenta OSS
  • Include comprehensive Helm chart with parameterized templates
  • Support for multiple environments (dev, test, production)
  • Auto-detect cluster type for ingress controller setup

Features Added

Kubernetes Manifests (hosting/kubernetes/oss/)

  • Complete service deployments (API, web, worker, chat, completion)
  • Infrastructure components (PostgreSQL, Redis, RabbitMQ, SuperTokens)
  • Networking setup with Ingress configurations
  • Configuration management with ConfigMaps and Secrets
  • Deployment and validation scripts
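
For a plain-manifest install, the rough flow looks like this (a sketch only: the namespace name and apply order are assumptions, and the included deploy scripts may already handle them):

```bash
# Create a namespace and apply the manifests recursively.
# The deploy/validation scripts in hosting/kubernetes/oss/ may already
# wrap these steps; this is just the manual equivalent.
kubectl create namespace agenta-oss
kubectl apply -R -f hosting/kubernetes/oss/ -n agenta-oss
kubectl get pods -n agenta-oss -w
```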

Helm Chart (hosting/helm/oss/)

  • Parameterized deployment templates
  • Configurable resource limits and requests
  • Support for different environments
  • Comprehensive values schema and documentation
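
A minimal install with the chart could look like this (release and namespace names are only examples, and my-values.yaml stands for whatever overrides you need):

```bash
helm install agenta-oss hosting/helm/oss/ \
  --namespace agenta-oss --create-namespace \
  -f my-values.yaml   # optional overrides on top of the chart defaults
```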

Test Plan

  • Verify all Kubernetes manifests are syntactically valid
  • Confirm Helm chart templates render correctly
  • Test deployment on local k3s cluster
  • Test on cloud providers (EKS, GKE, AKS)
  • Validate ingress controller installation script
  • Performance testing with production resource limits
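
A sketch of the static checks in the first two items (paths assume the layout described above):

```bash
# Manifests: client-side validation without touching the cluster
kubectl apply --dry-run=client -R -f hosting/kubernetes/oss/

# Helm chart: lint and make sure every template renders
helm lint hosting/helm/oss/
helm template agenta-oss hosting/helm/oss/ > /dev/null
```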

All configurations use public OSS images and are designed for production use on any Kubernetes distribution.

🤖 Generated with Claude Code


adarshsingh2 commented Sep 12, 2025


@mmabrouk I tested Agenta OSS on an AKS cluster (1.32.5) using the default values.yaml and encountered two issues:

  1. Installation Error

    • On a fresh install, I received the following warning and error:

      coalesce.go:223: warning: destination for rabbitmq.ingress.tls is a table. Ignoring non-table value (false)
      Error: INSTALLATION FAILED: Operation cannot be fulfilled on resourcequotas "agenta-agenta-oss-resource-quota": the object has been modified; please apply your changes to the latest version and try again
      
    • To proceed, I had to disable resourceQuota (roughly as in the sketch at the end of this comment).

  2. Pod CrashLoopBackOff

    • Both agenta-oss-chat and agenta-oss-completion pods are failing with the following error logs:

      2025-09-12T03:55:39.928Z [INFO.] Agenta - SDK version: 0.51.6 [agenta.sdk.agenta_init] 
      2025-09-12T03:55:39.928Z [INFO.] Agenta - Host: https://agenta.genai-dev.sc.eng.hitachivantara.com [agenta.sdk.agenta_init] 
      2025-09-12T03:55:39.928Z [INFO.] Agenta - OLTP URL: https://agenta.genai-dev.sc.eng.hitachivantara.com/api/otlp/v1/traces [agenta.sdk.tracing.tracing] 
      INFO:     Will watch for changes in these directories: ['/app']
      ERROR:    [Errno 13] Permission denied
      

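In case it helps, roughly what I ran to get past the first issue, plus what I would check for the permission error (the resourceQuota key is from my values file and may not match the chart exactly):

```bash
# Issue 1 workaround: install with the resource quota disabled
helm install agenta-oss hosting/helm/oss/ -n agenta-oss --create-namespace \
  --set resourceQuota.enabled=false

# Issue 2: [Errno 13] usually points at the user/filesystem the process gets,
# so inspect the securityContext the chat/completion containers run with
kubectl get pod <chat-pod-name> -n agenta-oss \
  -o jsonpath='{.spec.containers[0].securityContext}'
```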

@adarshsingh2

I fixed the earlier issues on my local setup, but I’m now running into new ones:

New Issue:
While creating a new chat prompt, I encountered errors with the API:
https://agenta.genai-dev.sc.eng.dxc.com/services/chat/openapi.json

  • Initially, the request returned 503 from the chat service.
  • I discovered that the Agenta SDK requires the environment variable AGENTA_HOST to build the API endpoint, even though AGENTA_API_INTERNAL_URL was already configured for the oss-chat container.
  • After setting this, the error changed to 404.
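
For reference, this is roughly how I set it for a quick test (deployment and namespace names are taken from my cluster; a values-level override would be the cleaner fix):

```bash
kubectl set env deployment/agenta-oss-chat -n agenta-oss \
  AGENTA_HOST=http://agenta-oss-api.agenta-oss.svc.cluster.local:8000
```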

Logs from agenta-oss-chat-6c45d4c74b-84hg2:

2025-09-20T03:20:55.549Z [INFO.] Agenta - SDK version: 0.52.2 [agenta.sdk.agenta_init] 
2025-09-20T03:20:55.549Z [INFO.] Agenta - Host: http://agenta-oss-api.agenta-oss.svc.cluster.local:8000 [agenta.sdk.agenta_init] 
2025-09-20T03:20:55.549Z [INFO.] Agenta - OLTP URL: http://agenta-oss-api.agenta-oss.svc.cluster.local:8000/api/otlp/v1/traces [agenta.sdk.tracing.tracing] 
INFO:     Started server process [9]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     10.244.0.21:45712 - "GET /services/chat/openapi.json?project_id=01993bff-4973-7b32-bfbe-4b579e9b2ea1 HTTP/1.1" 404 Not Found

Current Status:
The SDK initializes correctly and resolves the host, but the /services/chat/openapi.json endpoint consistently returns 404 when queried with project_id.

@mmabrouk any suggestions on how to proceed further? I'll be happy to provide additional info if required.

@mmabrouk
Member Author

Hello @adarshsingh2

Thanks for all the info. That is very useful.

I understand that the chat/completion endpoints are not reachable.
First, we need to understand whether the issue is in routing the requests (nginx config) or in the chat/completion services not being able to reach the backend.

To do that, let's first check whether /services/completion/health is reachable. If it is, and it returns {"status":"ok"}, then routing works fine. If not, we have an issue with nginx. Maybe it's reachable at another path?

Now, if /services/completion/health is reachable but openapi.json is not, then very likely the issue is that the completion/chat services are not able to reach the backend.

I see in the logs that the backend is configured at agenta-oss-api.agenta-oss.svc.cluster.local:8000. Is the port correctly configured? Can the backend be reached there? Is the issue in the nginx config for the backend, or is the environment variable set incorrectly?
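
Something like the following should narrow it down (run the second command from inside the cluster; it assumes curl exists in the image, otherwise use a throwaway debug pod):

```bash
# 1) Does routing through nginx/ingress work at all?
curl -i https://<your-agenta-host>/services/completion/health

# 2) Is the backend service reachable on that name and port?
#    Any HTTP response here (even a 404) proves connectivity.
kubectl exec -n agenta-oss deploy/agenta-oss-chat -- \
  curl -i http://agenta-oss-api.agenta-oss.svc.cluster.local:8000/
```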

My guess is that the issue lies in this config in values.yaml:

internalUrls:
  api: ""  # Will be set to http://{{ release-name }}-agenta-oss-api:8000


CLAassistant commented Oct 7, 2025

CLA assistant check
All committers have signed the CLA.

