GitHub actions e2e testing #4986

chitalian · 2025-10-04T01:59:20Z

No description provided.

This commit adds comprehensive end-to-end testing infrastructure: 1. **Docker Image Enhancement**: - Added worker stage to Dockerfile to include Cloudflare Workers - Workers are now built and available in the Docker image 2. **GitHub Actions Workflow**: - Created `.github/workflows/e2e-full-stack-test.yml` - Workflow builds Docker image with all services - Starts PostgreSQL, ClickHouse, MinIO, Jawn, Web, and Workers - Makes test requests and verifies data storage - Includes comprehensive logging for debugging 3. **Test Coverage**: - Tests full request flow from worker → LLM provider - Verifies data logging to PostgreSQL and ClickHouse - Validates all services can communicate properly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

The worker package.json needs to be copied before yarn install to ensure proper dependency installation in the worker-stage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- Add worker-openai, worker-helicone, and worker-gateway as supervisord programs - Workers now start automatically with the container instead of manually - Disable ai-gateway (port conflict) and move to port 8794 - Fix missing env vars in jawn (CSB_API_KEY, TOGETHER_API_KEY, etc.) - Improve GitHub Actions workflow wait times and logging - Remove manual worker startup from GitHub Actions - Add comprehensive logging for all workers in GitHub Actions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- Add supervisorctl status checks at start and end of service wait - This will help debug service startup issues in GitHub Actions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Disable container support in wrangler for workers running in Docker to avoid Docker-in-Docker issues. Workers were failing with "The Docker CLI could not be launched" error when trying to prepare container images for Durable Objects. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

vercel · 2025-10-04T01:59:25Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
helicone	Ready	Preview	Comment	Oct 4, 2025 3:40am
helicone-bifrost	Ready	Preview	Comment	Oct 4, 2025 3:40am
helicone-eu	Canceled			Oct 4, 2025 3:40am

claude · 2025-10-04T01:59:37Z

Claude finished @chitalian's task —— View job

PR Review Complete ✅

Reviewing this E2E testing infrastructure PR for:

Critical bugs or security issues
Performance improvements and optimizations
Code quality and best practices
Potential refactoring opportunities
Breaking changes or API issues

Score: 4/10 - Several important issues need addressing before merge

Critical Issues Found:

🚨 BLOCKER: Database Verification Logic

Location: .github/workflows/e2e-full-stack-test.yml:130-135

# These commands ignore failures with '|| true'
docker exec helicone-test psql -U postgres -d helicone_test -c "SELECT ..." || true
docker exec helicone-test curl -s "http://localhost:8123/..." || true

Problem: The E2E test could pass even if no data is persisted to databases
Fix: Remove || true and add proper error handling to fail the workflow if verification fails

⚡ Performance: Inefficient Worker Waiting

Location: .github/workflows/e2e-full-stack-test.yml:80-82

echo "Waiting for Workers to start (giving them time to initialize)..."
sleep 30

Problem: Fixed 30-second delay regardless of actual readiness
Fix: Implement proper health check polling for workers like other services

🏗️ Code Quality Issues:

Hardcoded Test Branch Trigger
- Location: .github/workflows/e2e-full-stack-test.yml:6-9
- Issue: Only triggers on github-actions-e2e-testing branch
- Fix: Should trigger on all PRs for comprehensive testing
Missing Error Handling
- Location: Throughout workflow steps
- Issue: Many commands could fail silently
- Fix: Add proper error checking and early termination
Resource Management
- Location: Container port mapping (lines 44-61)
- Issue: Exposes many unnecessary ports
- Fix: Only expose ports needed for external testing

Positive Aspects:

✅ Security: Test credentials are properly marked as dummy values
✅ Comprehensive: Tests full stack including all services
✅ Documentation: Good logging and status checking
✅ Infrastructure: Proper Docker multi-stage build
✅ Type Safety: Autogenerated TypeScript files properly updated for novita provider

Suggestions Summary:

High Priority:

Fix database verification logic - Remove || true to ensure data persistence
Implement proper worker health checks - Replace fixed sleep with polling
Add error handling - Ensure failures stop the workflow appropriately

Medium Priority:

Generalize branch triggers - Enable on all PRs, not just test branch
Optimize port exposure - Only expose necessary ports
Add test result validation - Verify request actually processed successfully

Low Priority:

Add retry logic - For intermittent service startup issues
Improve logging - Structure logs for better debugging
Consider test data cleanup - For reproducible test runs

Overall Assessment: This is excellent E2E infrastructure work that will greatly improve testing confidence. The main blocker is the database verification logic that could lead to false positives. Once the critical fixes are implemented, this will be a valuable addition to the CI/CD pipeline.

greptile-apps

Greptile Overview

Summary

This PR introduces comprehensive end-to-end testing infrastructure for the Helicone platform through GitHub Actions. The changes implement a Docker-based full-stack testing environment that validates the complete system architecture including PostgreSQL, ClickHouse, Jawn backend API, Web frontend, MinIO storage, and multiple Cloudflare Workers running locally.

The core addition is a new GitHub Actions workflow (.github/workflows/e2e-full-stack-test.yml) that builds and orchestrates all Helicone services in a containerized environment. The workflow tests the full request flow from client through proxy workers to backend processing and database storage. To support this testing infrastructure, the supervisord configuration has been updated to manage three new Cloudflare Worker services (worker-openai, worker-helicone, worker-gateway) that simulate the production edge environment locally using wrangler dev.

The Dockerfile has been enhanced with a new worker-stage to properly build and install dependencies for the worker components. Several autogenerated TypeScript files have been updated to add support for a new 'novita' LLM provider, expanding Helicone's provider ecosystem. The .gitignore has been updated to exclude testing artifacts and utility files generated during the E2E testing process.

However, there are critical issues with two autogenerated TypeScript files (bifrost/lib/clients/jawnTypes/private.ts and web/lib/clients/jawnTypes/private.ts) that appear to have been corrupted during the build process, resulting in empty files instead of containing the expected type definitions.

Important Files Changed

Changed Files

Filename	Score	Overview
`.gitignore`	5/5	Added entries for E2E testing artifacts including Playwright MCP directory and prompt utility files
`bifrost/lib/clients/jawnTypes/private.ts`	0/5	CRITICAL: Autogenerated TypeScript types file is completely empty, will break bifrost application
`supervisord.conf`	4/5	Updated configuration to support E2E testing with dummy API keys and three new Cloudflare Worker services
`bifrost/lib/clients/jawnTypes/public.ts`	5/5	Added 'novita' provider to ModelProviderName enum in autogenerated types file
`web/lib/clients/jawnTypes/public.ts`	5/5	Added 'novita' provider to ModelProviderName enum in autogenerated types file
`web/lib/clients/jawnTypes/private.ts`	0/5	CRITICAL: Autogenerated TypeScript types file is completely empty, will break web application
`valhalla/jawn/src/tsoa-build/private/routes.ts`	4/5	Added 'novita' provider to TSOA-generated routes file
`valhalla/jawn/src/tsoa-build/private/swagger.json`	5/5	Added 'novita' provider to autogenerated OpenAPI specification
`Dockerfile`	4/5	Added worker-stage to build pipeline for E2E testing support with proper dependency management
`.github/workflows/e2e-full-stack-test.yml`	3/5	New comprehensive E2E testing workflow with potential issues around hardcoded values and error handling

Confidence score: 1/5

This PR has critical blocking issues that will prevent the application from building or running properly
Score reflects two completely empty autogenerated TypeScript files that will cause immediate compilation failures
Pay special attention to bifrost/lib/clients/jawnTypes/private.ts and web/lib/clients/jawnTypes/private.ts which need to be regenerated

Sequence Diagram

sequenceDiagram
    participant User
    participant GitHub
    participant Runner as "GitHub Runner"
    participant Docker as "Docker Engine"
    participant Container as "Helicone Container"
    participant PostgreSQL
    participant ClickHouse
    participant MinIO
    participant Jawn as "Jawn (Backend)"
    participant Web as "Web Frontend"
    participant Workers as "Worker Services"
    
    User->>GitHub: "Push to github-actions-e2e-testing branch"
    GitHub->>Runner: "Trigger E2E Full Stack Test workflow"
    
    Runner->>Runner: "Checkout code"
    Runner->>Docker: "Set up Docker Buildx"
    Runner->>Docker: "Build Docker image (helicone-test:latest)"
    
    Docker->>Docker: "Multi-stage build process"
    Note over Docker: "database-stage → jawn-stage → web-stage → worker-stage → minio-stage"
    
    Runner->>Container: "Run Helicone container with exposed ports"
    Note over Container: "Ports: 3000, 8585, 8123, 9080, 9001, 5432, 8787-8793"
    
    Container->>PostgreSQL: "Start PostgreSQL service"
    Container->>ClickHouse: "Start ClickHouse service"
    Container->>MinIO: "Start MinIO service"
    
    Runner->>Container: "Check supervisord status"
    
    Runner->>PostgreSQL: "Wait for PostgreSQL readiness"
    PostgreSQL-->>Runner: "Ready signal"
    
    Runner->>ClickHouse: "Wait for ClickHouse ping"
    ClickHouse-->>Runner: "Ready signal"
    
    Container->>PostgreSQL: "Run Flyway migrations"
    Container->>ClickHouse: "Run ClickHouse migrations"
    Container->>MinIO: "Setup MinIO buckets"
    
    Container->>Jawn: "Start Jawn backend service"
    Runner->>Jawn: "Wait for health check (port 8585)"
    Jawn-->>Runner: "Health check OK"
    
    Container->>Web: "Start Web frontend service"
    Runner->>Web: "Wait for service (port 3000)"
    Web-->>Runner: "Service ready"
    
    Container->>Workers: "Start Worker services"
    Note over Workers: "OpenAI Proxy (8787), Helicone API (8788), Gateway API (8789)"
    Runner->>Workers: "Wait for OpenAI Worker (port 8787)"
    Workers-->>Runner: "Worker ready"
    
    Runner->>Runner: "All services ready!"
    
    Runner->>Workers: "Make test request to OpenAI proxy"
    Note over Runner,Workers: "POST /v1/chat/completions with test data"
    Workers-->>Runner: "Request completed (may error as expected)"
    
    Runner->>PostgreSQL: "Verify request data in PostgreSQL"
    PostgreSQL-->>Runner: "Query results"
    
    Runner->>ClickHouse: "Verify request data in ClickHouse"
    ClickHouse-->>Runner: "Query results"
    
    alt Always execute
        Runner->>Container: "Check service logs"
        Container-->>Runner: "PostgreSQL, ClickHouse, Jawn, Web, MinIO, Worker logs"
        
        Runner->>Container: "Show error logs"
        Container-->>Runner: "Jawn and Worker error logs"
    end
    
    Runner->>Container: "Cleanup - Stop and remove container"
    Container-->>Runner: "Container stopped and removed"
    
    Runner->>GitHub: "Test results and logs"
    GitHub->>User: "Workflow completion notification"

Additional Comments (2)

bifrost/lib/clients/jawnTypes/private.ts, line 1 (link)

logic: File appears to be empty but should contain autogenerated TypeScript type definitions. This will break the bifrost application's ability to interact with Jawn API. Check the TSOA build process.
web/lib/clients/jawnTypes/private.ts, line 1 (link)

logic: This auto-generated file is completely empty but should contain TypeScript type definitions. This will cause compilation errors and break the frontend application.

Context used:

Context from dashboard - When naming jobs in GitHub Actions, prefer descriptive names that clearly indicate the job's purpose... (source)
Context from dashboard - If there are multiple variants of a service (e.g., US/EU specific docker images), consider using a m... (source)

_{10 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2025-10-04T02:01:13Z

.github/workflows/e2e-full-stack-test.yml

+          docker exec helicone-test psql -U postgres -d helicone_test -c "SELECT id, created_at FROM request ORDER BY created_at DESC LIMIT 5;" || true
+
+      - name: Verify data in ClickHouse
+        run: |
+          echo "Checking request data in ClickHouse..."
+          docker exec helicone-test curl -s "http://localhost:8123/?query=SELECT%20request_id%2C%20request_created_at%20FROM%20request_response_rmt%20ORDER%20BY%20request_created_at%20DESC%20LIMIT%205%20FORMAT%20Pretty" || true


logic: Database verification steps use '|| true' which ignores failures. This means the E2E test could pass even if data isn't properly persisted. Consider failing the workflow if no data is found.

Prompt To Fix With AI

This is a comment left during a code review. Path: .github/workflows/e2e-full-stack-test.yml Line: 113:118 Comment: **logic:** Database verification steps use '|| true' which ignores failures. This means the E2E test could pass even if data isn't properly persisted. Consider failing the workflow if no data is found. How can I resolve this? If you propose a fix, please make it concise.

- Increase timeout from 30m to 60m for large Docker builds - Add GitHub Actions cache for Docker layers to speed up builds - Use docker/build-push-action for better caching support 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Free up ~14GB of disk space by removing unused Android SDK, .NET, GHC, and Boost libraries before Docker build to prevent "no space left on device" errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Use 'docker exec' for worker health checks since workers run on localhost inside the container, not on the host network interface. This fixes the 180s timeout waiting for workers to be ready. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Replace curl-based worker health check with 30-second sleep. Workers start reliably within this time as shown in previous runs. This avoids issues with health check execution inside container. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

chitalian and others added 5 commits October 3, 2025 21:37

Add supervisord status checks to E2E workflow

f381b80

- Add supervisorctl status checks at start and end of service wait - This will help debug service startup issues in GitHub Actions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

greptile-apps bot reviewed Oct 4, 2025

View reviewed changes

vercel bot deployed to Preview – helicone October 4, 2025 02:17 View deployment

vercel bot deployed to Preview – helicone October 4, 2025 02:35 View deployment

vercel bot deployed to Preview – helicone-eu October 4, 2025 02:37 View deployment

vercel bot deployed to Preview – helicone-bifrost October 4, 2025 02:42 View deployment

chitalian and others added 2 commits October 4, 2025 02:49

vercel bot deployed to Preview – helicone-eu October 4, 2025 03:32 View deployment

vercel bot deployed to Preview – helicone-bifrost October 4, 2025 03:37 View deployment

vercel bot deployed to Preview – helicone October 4, 2025 03:40 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

GitHub actions e2e testing #4986

GitHub actions e2e testing #4986

Uh oh!

chitalian commented Oct 4, 2025

Uh oh!

vercel bot commented Oct 4, 2025 •

edited

Loading

Uh oh!

claude bot commented Oct 4, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment •

edited

Loading

Uh oh!

greptile-apps bot Oct 4, 2025

Uh oh!

Uh oh!

GitHub actions e2e testing #4986

Are you sure you want to change the base?

GitHub actions e2e testing #4986

Uh oh!

Conversation

chitalian commented Oct 4, 2025

Uh oh!

vercel bot commented Oct 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Oct 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Critical Issues Found:

🚨 BLOCKER: Database Verification Logic

⚡ Performance: Inefficient Worker Waiting

🏗️ Code Quality Issues:

Positive Aspects:

Suggestions Summary:

High Priority:

Medium Priority:

Low Priority:

Uh oh!

greptile-apps bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Greptile Overview

Summary

Important Files Changed

Confidence score: 1/5

Sequence Diagram

Additional Comments (2)

Uh oh!

greptile-apps bot Oct 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vercel bot commented Oct 4, 2025 •

edited

Loading

claude bot commented Oct 4, 2025 •

edited

Loading

greptile-apps bot left a comment •

edited

Loading