@amirejaz amirejaz commented Nov 6, 2025

Summary

This PR implements separate workload management interfaces for CLI (Docker/Podman) and Kubernetes environments, following the same architectural pattern established by groups.Manager. This enables platform-specific optimizations while maintaining consistent patterns across ToolHive's workload management system.

Motivation

Following the successful unification of group management, we now extend this pattern to workload management with platform-specific interfaces. This enables:

  • Platform-Specific Optimization: Each runtime can have an interface tailored to its capabilities
  • Type Safety: Kubernetes workloads use k8s.Workload (with MCPServerPhase) while CLI workloads use core.Workload (with WorkloadStatus)
  • Clear Separation: Avoids coupling Kubernetes CRDs to CLI container runtime concepts
  • Unified Discovery: Enables vmcp aggregator to discover backends from both CLI and Kubernetes workloads via separate discoverers
  • Future-Proof: Easier to extend each platform independently

Implementation

Separate Manager Interfaces

CLI Manager (workloads.Manager)

  • Manages Docker/Podman containers
  • Returns core.Workload domain model
  • Supports full lifecycle operations (run, stop, delete, restart, update)
  • Includes log retrieval (GetLogs, GetProxyLogs)
  • Uses filesystem-based storage (runconfig.json)

Kubernetes Manager (k8s.Manager)

  • Manages MCPServer CRDs via Kubernetes API
  • Returns k8s.Workload domain model (with MCPServerPhase)
  • Provides read operations and group management only
  • Lifecycle managed by ToolHive operator (CRDs)
  • Log retrieval not included (use kubectl logs directly)

Platform-Specific Domain Models

core.Workload (CLI)

  • Uses runtime.WorkloadStatus enum (Running, Stopped, etc.)
  • Includes container-specific fields
  • Supports remote workloads

k8s.Workload (Kubernetes)

  • Uses mcpv1alpha1.MCPServerPhase enum (Running, Pending, Failed, etc.)
  • Includes namespace and CRD-specific fields (GroupRef)
  • Maps MCPServer CRD status to workload representation

Separate Backend Discoverers

CLI Discoverer (cliBackendDiscoverer)

  • Uses workloads.Manager interface
  • Discovers workloads from Docker/Podman containers
  • Maps runtime.WorkloadStatus to vmcp health status

Kubernetes Discoverer (k8sBackendDiscoverer)

  • Uses k8s.Manager interface
  • Discovers workloads from MCPServer CRDs
  • Maps mcpv1alpha1.MCPServerPhase to vmcp health status

Factory Functions

CLI Manager

// CLI mode only - returns error in Kubernetes
manager, err := workloads.NewManager(ctx)

Kubernetes Manager

// Kubernetes mode only
k8sManager, err := k8s.NewManagerFromContext(ctx)

Runtime Detection

  • workloads.NewManager() detects Kubernetes runtime and returns error
  • Forces explicit use of k8s.NewManagerFromContext() for Kubernetes
  • Prevents accidental mixing of interfaces

Key Features

Group Integration

  • Both managers support ListWorkloadsInGroup and MoveToGroup
  • CLI: Updates runconfig.json files
  • Kubernetes: Updates MCPServer CRD GroupRef field
  • Seamless integration with groups.Manager

Unified Backend Discovery

  • vmcp aggregator automatically selects the right discoverer based on runtime
  • NewCLIBackendDiscoverer for CLI workloads
  • NewK8SBackendDiscoverer for Kubernetes workloads
  • Both discoverers implement the same BackendDiscoverer interface

Comprehensive Testing

  • Full unit test coverage for both implementations
  • Separate test files: cli_manager_test.go and k8s_test.go
  • Table-driven tests for all operations
  • Mock-based testing with platform-specific mocks
  • Edge case and error handling coverage

Files Added

CLI Manager:

  • pkg/workloads/cli_manager.go - CLI implementation
  • pkg/workloads/cli_manager_test.go - CLI tests (2214 lines)

Kubernetes Manager:

  • pkg/workloads/k8s/manager.go - Kubernetes interface definition
  • pkg/workloads/k8s/k8s.go - Kubernetes implementation
  • pkg/workloads/k8s/workload.go - Kubernetes domain model
  • pkg/workloads/k8s/k8s_test.go - Kubernetes tests (758 lines)
  • pkg/workloads/k8s/mocks/mock_manager.go - Generated mocks

Discoverers:

  • pkg/vmcp/aggregator/cli_discoverer.go - CLI backend discoverer
  • pkg/vmcp/aggregator/k8s_discoverer.go - Kubernetes backend discoverer
  • pkg/vmcp/aggregator/cli_discoverer_test.go - CLI discoverer tests
  • pkg/vmcp/aggregator/k8s_discoverer_test.go - Kubernetes discoverer tests

Files Modified

  • pkg/workloads/manager.go - Factory functions for CLI manager (returns error in Kubernetes)
  • pkg/workloads/manager_test.go - Factory function tests
  • cmd/vmcp/app/commands.go - Runtime-based discoverer selection

Benefits

  1. Type Safety: Platform-specific domain models prevent mixing CLI and Kubernetes concepts
  2. Separation of Concerns: Each manager interface is tailored to its platform's capabilities
  3. Maintainability: Clear boundaries between CLI and Kubernetes implementations
  4. Extensibility: Easy to add platform-specific features without affecting the other
  5. Testability: Each implementation can be tested independently with platform-specific mocks
  6. Integration: Enables unified features like vmcp backend discovery via separate discoverers

Testing

  • All unit tests pass
  • Linting passes
  • Verified CLI workload operations (run, stop, delete, restart, logs)
  • Verified Kubernetes MCPServer operations (get, list, group operations)
  • Tested vmcp discovery with both CLI and Kubernetes workloads
  • Verified group integration (ListWorkloadsInGroup, MoveToGroup)
  • Verified runtime detection and error handling

Design Decisions

Why Separate Interfaces?

  1. Different Capabilities: Kubernetes workloads are managed by the operator (CRDs), not directly by the manager
  2. Different Domain Models: CLI uses WorkloadStatus, Kubernetes uses MCPServerPhase
  3. Avoid Coupling: Prevents Kubernetes from depending on CLI container runtime concepts
  4. Future Flexibility: Allows each platform to evolve independently

Why No GetLogs/GetProxyLogs in K8s Manager?

  • Kubernetes log retrieval requires a clientset (not just controller-runtime client)
  • Users can use kubectl logs directly, which is more flexible
  • Can be added later when needed with proper implementation

Related

  • Follows the architectural pattern from groups.Manager (separate CLI and CRD managers)
  • Enables unified backend discovery in vmcp aggregator via separate discoverers
  • Prepares foundation for future workload management features
  • Integrates with ToolHive operator for Kubernetes workload lifecycle

Example Usage

CLI Mode:

// CLI manager (returns error in Kubernetes)
manager, err := workloads.NewManager(ctx)
if err != nil {
    return err
}

// Works with Docker/Podman containers
workloads, err := manager.ListWorkloadsInGroup(ctx, "engineering-team")

Kubernetes Mode:

// Kubernetes manager
k8sManager, err := k8s.NewManagerFromContext(ctx)
if err != nil {
    return err
}

// Works with MCPServer CRDs
workloads, err := k8sManager.ListWorkloadsInGroup(ctx, "engineering-team")

vmcp Discovery (Automatic Selection):

// Automatically selects CLI or K8s discoverer based on runtime
var discoverer aggregator.BackendDiscoverer
if rt.IsKubernetesRuntime() {
    k8sManager, _ := k8s.NewManagerFromContext(ctx)
    discoverer = aggregator.NewK8SBackendDiscoverer(k8sManager, groupsManager, authConfig)
} else {
    cliManager, _ := workloads.NewManager(ctx)
    discoverer = aggregator.NewCLIBackendDiscoverer(cliManager, groupsManager, authConfig)
}

backends, err := discoverer.Discover(ctx, "engineering-team")

@amirejaz amirejaz marked this pull request as draft November 6, 2025 16:45
codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 65.37983% with 278 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.82%. Comparing base (c61e36f) to head (dd7c0ed).

Files with missing lines                         | Patch %  | Lines
pkg/workloads/cli_manager.go                     | 59.28%   | 156 Missing and 72 partials ⚠️
pkg/workloads/k8s/k8s.go                         | 83.21%   | 20 Missing and 4 partials ⚠️
cmd/vmcp/app/commands.go                         | 0.00%    | 13 Missing ⚠️
pkg/workloads/manager.go                         | 40.00%   | 6 Missing ⚠️
pkg/vmcp/aggregator/k8s_discoverer.go            | 94.59%   | 3 Missing and 1 partial ⚠️
cmd/thv-operator/api/v1alpha1/mcpserver_types.go | 0.00%    | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2487      +/-   ##
==========================================
+ Coverage   55.36%   55.82%   +0.45%     
==========================================
  Files         307      309       +2     
  Lines       29375    29580     +205     
==========================================
+ Hits        16264    16512     +248     
+ Misses      11669    11617      -52     
- Partials     1442     1451       +9     


@dmjb dmjb (Member) left a comment

Marking this as "request changes" until we figure out how to split up this interface.

@amirejaz amirejaz marked this pull request as ready for review November 12, 2025 15:09
@amirejaz amirejaz requested review from dmjb and jhrozek November 12, 2025 15:09

// Default to 8080 if no port is specified (matches GetProxyPort behavior)
// This is needed for HTTP-based transports (SSE, streamable-http) which require a target port
return 8080
Collaborator:
This particular change could go in a dedicated PR.

k8sClient, err := client.New(cfg, client.Options{Scheme: scheme})
if err != nil {
return nil, fmt.Errorf("failed to create Kubernetes client: %w", err)
}
Collaborator:
Note that we already have pkg/container/kubernetes/client.go. We should probably move that to a dedicated common package so we could reuse it around. Let me do that.

Collaborator:
done, here it is #2560

// NewManager creates a new CLI workload manager.
// Returns Manager interface (existing behavior, unchanged).
// IMPORTANT: This function only works in CLI mode. For Kubernetes, use k8s.NewManagerFromContext() directly.
Collaborator:
I see, so this is kept here so you wouldn't need to make that many changes in the codebase from the CLI point of view, right? Wouldn't this be an issue for the proxyrunner, which assumes it'll be running on k8s? Wouldn't we need to change that? It used to be the case that it would try to do the right thing depending on whether it was running on k8s or not. Why not keep that functionality?

Contributor Author:
Yes, that’s correct. We can create a separate PR to update the references; I’d prefer not to include too many of those changes here, since this PR has already grown quite large.

Also, it would make more sense to move the CLI manager into the cli package.

As for proxyrunner, the NewManagerFromRuntime function uses the Kubernetes runtime, and the status manager then checks the environment to determine whether it’s running on Kubernetes and picks the appropriate status manager accordingly.

Does that address your concern?

