Add EmbeddingServer CRD for deploying HuggingFace embedding models #3323

ptelang · 2026-01-16T18:00:51Z

Summary

This PR introduces the EmbeddingServer Custom Resource Definition (CRD) for the ToolHive Kubernetes Operator, enabling deployment and management of HuggingFace Text Embeddings Inference (TEI) servers.

Key Features

New CRD: EmbeddingServer for deploying HuggingFace Text Embeddings Inference servers in Kubernetes
Model Management: Supports downloading and caching embedding models from HuggingFace
Persistent Storage: Optional PVC support for model caching to speed up restarts after initial download
Resource Configuration: Full control over deployment resources, environment variables, and pod templates
Group Integration: Can be associated with MCPGroups for organization and management
Status Management: Comprehensive condition-based status tracking:
- ModelReady: Indicates whether the embedding model is downloaded and ready
- VolumeReady: Indicates whether the PVC for model caching is ready
- ImageValidated: Validates the container image
- PodTemplateValid: Validates the pod template configuration

Components Added

CRD Definition: cmd/thv-operator/api/v1alpha1/embeddingserver_types.go
Controller: cmd/thv-operator/controllers/embeddingserver_controller.go
Unit Tests: Comprehensive test coverage for the controller
Example Configurations:
- basic-embedding.yaml: Minimal configuration with default settings
- embedding-with-cache.yaml: Configuration with persistent model cache (10Gi PVC)
- embedding-advanced.yaml: Advanced setup with custom resources and environment variables
Documentation: README with detailed examples and usage instructions

Test plan

Unit tests for EmbeddingServer controller
Example YAML configurations validated
E2E operator tests (Chainsaw) for EmbeddingServer CRD
Manual testing of embedding server deployment in Kubernetes cluster
Verification of model caching with persistent volumes

Large PR Justification

Multiple related changes that would break if separated

Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

github-actions

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.

This review will be automatically dismissed once you add the justification section.

Large PR justification has been provided. Thank you!

github-actions · 2026-01-16T18:09:06Z

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

jerm-dro

This mostly looks good, but I have a few blocking changes:

Let's use a Secret to plumb huggingface tokens into the EmbeddingServer
Let's treat the server as a StatefulSet rather than attaching PVCs to a deployment

cmd/thv-operator/api/v1alpha1/embeddingserver_types.go

cmd/thv-operator/controllers/embeddingserver_controller.go

...insaw/operator/single-tenancy/test-scenarios/embeddingserver/with-cache/embeddingserver.yaml

...hainsaw/operator/single-tenancy/test-scenarios/embeddingserver/with-cache/chainsaw-test.yaml

cmd/thv-operator/controllers/embeddingserver_controller.go

ChrisJBurns · 2026-01-19T20:57:12Z

@ptelang Thanks for this. Is there any way we can split this into multiple PRs? 4k diffs is quite a large change and I feel folks will miss stuff in the cracks due to trying to review the entire thing. I've also noticed that the controller has 1k lines of code but only 300 lines of unit tests. Is there a way we can increase the test coverage to ensure the code does what we want?

ChrisJBurns · 2026-01-19T20:59:06Z

@claude Can you please review this PR using the necessary subagents in this repository. Leverage your knowledge of software engineering best practices (such as SOLID / DRY etc) and Kubernetes controller in order to support your review.

claude · 2026-01-19T20:59:42Z

Claude finished @ChrisJBurns's task in 5m 24s —— View job

PR Review Complete

I've conducted a comprehensive review of the EmbeddingServer CRD implementation using the kubernetes-expert and code-reviewer specialized agents. Here are the findings:

Tasks

Review CRD design and Kubernetes patterns (kubernetes-expert)
Review code quality and ToolHive best practices (code-reviewer)
Analyze test coverage and quality
Consolidate findings and provide comprehensive feedback

Critical Issues (Blocking)

1. Security: EnvVar Type Lacks Secret Support ⛔

Location: cmd/thv-operator/api/v1alpha1/mcpserver_types.go:225-234

The custom EnvVar type only supports plain string values:

type EnvVar struct {
    Name  string `json:"name"`
    Value string `json:"value"`  // ⚠️ No support for secrets!
}

Problem: HuggingFace models often require authentication tokens (especially for private models or to avoid rate limits). Storing these as plain text environment variables is a security vulnerability - tokens will be exposed in Git commits, logs, and kubectl describe output.

This aligns with @jerm-dro's blocking review comment about using Secrets for HF tokens.

Recommendation: Extend the EnvVar type to support ValueFrom.SecretKeyRef like Kubernetes native envvars. Your codebase already has the SecretKeyRef type in MCPExternalAuthConfig that can be reused:

type EnvVar struct {
    Name string `json:"name"`
    // Value is the plain text value (mutually exclusive with ValueFrom)
    // +optional
    Value string `json:"value,omitempty"`
    // ValueFrom is a reference to a secret (mutually exclusive with Value)
    // +optional
    ValueFrom *EnvVarSource `json:"valueFrom,omitempty"`
}

type EnvVarSource struct {
    // SecretKeyRef selects a key from a secret
    // +optional
    SecretKeyRef *SecretKeyRef `json:"secretKeyRef,omitempty"`
}

Then update the controller to validate secret references exist and add RBAC markers:

//+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch

2. Kubernetes: Deployment + PVC Scaling Issue ⛔

Location: cmd/thv-operator/controllers/embeddingserver_controller.go:740-752

The controller creates a Deployment with a single PVC that all replicas try to mount:

podTemplate.Spec.Volumes = []corev1.Volume{
    {
        Name: "model-cache",
        VolumeSource: corev1.VolumeSource{
            PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
                ClaimName: pvcName,  // ⚠️ Single PVC for all replicas
            },
        },
    },
}

This validates @jerm-dro's blocking review comment about converting to StatefulSet.

Problem: When replicas > 1:

With ReadWriteOnce (default): Only the first pod can mount the PVC. Additional pods get stuck in Pending state with Multi-Attach error
With ReadWriteMany: Works but requires specific storage classes (NFS, CephFS) that aren't available in all environments

Evidence: The CRD defaults to ReadWriteOnce (cmd/thv-operator/api/v1alpha1/embeddingserver_types.go:119), which will fail when users try to scale.

Why StatefulSet Solves This:

Uses volumeClaimTemplates - each pod gets its own PVC automatically
Each replica (embedding-0, embedding-1, embedding-2) gets its own storage (model-cache-embedding-0, model-cache-embedding-1, etc.)
Works perfectly with ReadWriteOnce (the most common storage type)
Provides stable pod identity and ordered operations

Recommendation: Use StatefulSet when modelCache.enabled=true and replicas > 1. Your MCPServer controller already uses StatefulSet (see cmd/thv-operator/controllers/mcpserver_controller.go:1384-1395), so this would align with existing patterns.

Trade-off: Each replica will download and cache the model separately (N × storage cost), but this is the correct architecture for truly independent replicas. If shared caching is critical, you'd need a more complex sidecar pattern.

3. Missing Secret Reference Validation ⛔

Location: cmd/thv-operator/controllers/embeddingserver_controller.go:649-664

After fixing issue #1 above, the controller will need to validate that referenced secrets actually exist before creating the Deployment. Otherwise, pods will fail with CreateContainerConfigError.

See cmd/thv-operator/controllers/virtualmcpserver_deployment.go:599-611 for the validation pattern used in VirtualMCPServer.

Major Issues

4. Code Quality: Silent Error Drops

Location: cmd/thv-operator/controllers/embeddingserver_controller.go:554-556, 593-595, 884-886

Multiple places silently drop errors:

if err := ctrl.SetControllerReference(embedding, pvc, r.Scheme); err != nil {
    return nil  // ❌ Error silently dropped
}

Impact: If SetControllerReference fails, the PVC/Deployment/Service is created without an owner reference, breaking cascade deletion when the EmbeddingServer is deleted.

Recommendation: At minimum, log the error. Better yet, return it to the caller.

5. Architecture: Multiple Status Updates Per Reconciliation

Location: Throughout embeddingserver_controller.go (lines 329, 369, 402-404, 997)

The controller calls r.Status().Update() multiple times per reconciliation loop, which:

Creates race conditions when status changes between reads
Increases API server load
Can cause reconciliation thrashing

Recommendation: Adopt the StatusCollector pattern used in MCPRegistry controller (see cmd/thv-operator/controllers/mcpregistry_controller.go:162-164):

statusCollector := embeddingserverstatus.NewCollector(embedding)
// Make all status changes through collector
statusCollector.SetPhase(...)
statusCollector.SetCondition(...)
// Single batch update at end
statusCollector.Apply(ctx, r.Client)

6. Test Coverage: Missing Critical Path Tests

Metrics:

Controller code: 1,031 lines
Test code: 316 lines (30.7% ratio)
Tests cover: Only helper methods (GetPort(), GetReplicas(), labelsForEmbedding())
NOT tested: Reconciliation logic, image validation, PVC creation, deployment drift detection, finalizer cleanup

This addresses @ChrisJBurns's comment about 1k lines of controller code with only 300 lines of tests.

Critical paths lacking unit tests:

❌ Reconcile() method
❌ validateImage() - what happens when image is invalid?
❌ ensurePVC() - PVC creation and binding checks
❌ deploymentNeedsUpdate() - drift detection logic
❌ handleDeletion() - finalizer cleanup

E2E tests exist (basic, with-cache, lifecycle, multi-tenancy), which is good, but unit tests are needed for:

Fast feedback during development
Testing error conditions that are hard to reproduce in e2e
Validating edge cases

Recommendation: Add unit tests using mocked Kubernetes client. Example:

func TestEmbeddingServerReconciler_ValidateImage_Invalid(t *testing.T) {
    // Test invalid image rejection and status update
}

func TestEmbeddingServerReconciler_DeploymentNeedsUpdate_ImageChange(t *testing.T) {
    // Test deployment drift detection
}

Code Quality Issues

7. DRY Violations

PVC Name Calculation (lines 452, 512, 741):

pvcName := fmt.Sprintf("%s-model-cache", embedding.Name)

Repeated 3 times. Extract to method: getPVCName(embedding).

Environment Variable Logic (lines 650-663 vs 935-942):
Building env vars is duplicated in buildEnvVars() and deploymentNeedsUpdate(). Extract to shared method.

Status Update Pattern (lines 369-372, 402-404, 417-419, etc.):
The pattern if statusErr := r.Status().Update(...) { log.Error(...) } appears 5+ times in validateImage() alone.

8. CLAUDE.md Violation: Method Organization

Issue: Public methods should be at the top of the file, private methods at the bottom (per /home/runner/work/toolhive/toolhive/CLAUDE.md).

Current order:

Line 62: Reconcile (public) ✓
Lines 127-1022: Private helper methods
Line 1024: SetupWithManager (public) ❌ Should be near top

Recommendation: Move SetupWithManager to line ~123 (right after Reconcile).

9. Inconsistent ObservedGeneration

Some conditions set ObservedGeneration (line 465), others don't (lines 372, 404, etc.). This field should always be set to indicate which spec version the condition applies to.

Minor Issues

10. ModelCache Enabled Field Design

Location: cmd/thv-operator/api/v1alpha1/embeddingserver_types.go:98-120

type ModelCacheConfig struct {
    Enabled bool `json:"enabled,omitempty"`  // ⚠️ omitempty with bool
    // +kubebuilder:default=true

Using omitempty with booleans means false values are omitted from JSON, which can cause confusion when the default is true. Users can't explicitly disable caching without removing the entire modelCache section.

Recommendation: Either remove omitempty or use *bool for tri-state logic.

11. Unused Context Parameters

Lines 562, 851 use _ context.Context for unused parameters. Either remove them or add a comment explaining they're reserved for future use.

Positive Observations ✅

Despite the issues, there are many good practices:

✅ Early validation with fail-fast pattern
✅ Proper use of finalizers for cleanup
✅ Structured reconciliation with helper methods
✅ Image validation integration
✅ PodTemplateSpec validation using shared builder pattern
✅ Excellent E2E test coverage (basic, with-cache, lifecycle, multi-tenancy scenarios)
✅ Proper label management with standard Kubernetes labels
✅ Conflict resolution logic (updateDeploymentWithRetry)
✅ Resource ownership with SetControllerReference
✅ Phase-based status for user visibility

Summary and Recommendations

Addressing Existing Review Comments

@jerm-dro's blocking comments:

✅ Use Secret for HF tokens: Valid and critical - see Issue fix(typo): corrects readme #1 above
✅ Convert to StatefulSet: Valid and blocking - see Issue Do we want the container monitor? #2 above
✅ Log ignored errors: Valid - see embeddingserver_controller.go:137

@ChrisJBurns's questions:

Splitting the PR: This is feasible but would require careful sequencing:
- PR1: CRD types only (270 lines)
- PR2: Controller (1031 lines) + tests
- PR3: E2E tests and examples
However, given the blocking issues identified above, I'd recommend fixing the critical issues first before splitting, as the StatefulSet change will touch both CRD and controller.
Test coverage: Confirmed - controller has 1k lines but only 300 lines of tests covering trivial helpers. Needs unit tests for reconciliation logic.

Priority Recommendations

P0 (Must Fix Before Merge):

Add SecretKeyRef support to EnvVar type for secure token handling
Convert to StatefulSet when modelCache.enabled=true and replicas > 1
Add secret reference validation in controller
Fix silent error drops (log at minimum)
Add RBAC markers for secret access

P1 (Should Fix Soon):
6. Add unit tests for critical reconciliation paths
7. Implement StatusCollector pattern for atomic status updates
8. Fix method organization (public methods at top)
9. Extract duplicated code (PVC name, env var logic)

P2 (Nice to Have):
10. Fix ModelCache.Enabled omitempty issue
11. Make ObservedGeneration setting consistent
12. Remove unused context parameters

Overall Assessment

Functionality: ✅ The controller works for single-replica scenarios and includes excellent E2E tests.

Production Readiness: ⚠️ Blocked by critical issues:

Security vulnerability (plain text tokens)
Scaling failure (Deployment + PVC)
Missing unit tests for critical paths

The implementation shows good understanding of Kubernetes operator patterns and follows most ToolHive conventions. With the recommended fixes, this will be a solid addition to the operator.

ptelang · 2026-01-20T04:27:11Z

@ChrisJBurns

Is there any way we can split this into multiple PRs? 4k diffs is quite a large change and I feel folks will miss stuff in the cracks due to trying to review the entire thing.

I will split the PRs tomorrow.

I've also noticed that the controller has 1k lines of code but only 300 lines of unit tests. Is there a way we can increase the test coverage to ensure the code does what we want?

Good point. I added more unit tests.

Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>

…ction

github-actions bot previously requested changes Jan 16, 2026

View reviewed changes

github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Jan 16, 2026

ptelang requested a review from jerm-dro January 16, 2026 18:02

ptelang force-pushed the add-embedding-engine branch from 97bc58c to 8d13462 Compare January 16, 2026 18:04

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 16, 2026

ptelang force-pushed the add-embedding-engine branch from 8d13462 to 6203f70 Compare January 16, 2026 18:08

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 16, 2026

jerm-dro requested changes Jan 17, 2026

View reviewed changes

ptelang force-pushed the add-embedding-engine branch 2 times, most recently from d549ddb to e5af0d2 Compare January 20, 2026 14:09

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 20, 2026

ptelang force-pushed the add-embedding-engine branch from e5af0d2 to 149c710 Compare January 20, 2026 14:14

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 20, 2026

ptelang force-pushed the add-embedding-engine branch 2 times, most recently from bda5c39 to 03231c5 Compare January 20, 2026 15:06

github-actions bot removed the size/XL Extra large PR: 1000+ lines changed label Jan 20, 2026

Modify embeddingserver e2e tests to support slow model file downloads

931ad7c

ptelang force-pushed the add-embedding-engine branch from d19f9c5 to 931ad7c Compare January 20, 2026 21:41

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 20, 2026

jerm-dro added 2 commits January 20, 2026 15:04

add envtest for EmbeddingServer

d32eb3f

add tests that demonstrate gaps

62a039b

Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 20, 2026

Fix bugs in the tests

05e1f4f

ptelang force-pushed the add-embedding-engine branch from 5b15f05 to 05e1f4f Compare January 21, 2026 14:13