-
Notifications
You must be signed in to change notification settings - Fork 170
Add EmbeddingServer CRD for deploying HuggingFace embedding models #3323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
97bc58c to
8d13462
Compare
8d13462 to
6203f70
Compare
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
jerm-dro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks good, but I have a few blocking changes:
- Let's use a Secret to plumb huggingface tokens into the EmbeddingServer
- Let's treat the server as a StatefulSet rather than attaching PVCs to a deployment
...insaw/operator/single-tenancy/test-scenarios/embeddingserver/with-cache/embeddingserver.yaml
Show resolved
Hide resolved
...hainsaw/operator/single-tenancy/test-scenarios/embeddingserver/with-cache/chainsaw-test.yaml
Outdated
Show resolved
Hide resolved
|
@ptelang Thanks for this. Is there any way we can split this into multiple PRs? 4k diffs is quite a large change and I feel folks will miss stuff in the cracks due to trying to review the entire thing. I've also noticed that the controller has 1k lines of code but only 300 lines of unit tests. Is there a way we can increase the test coverage to ensure the code does what we want? |
|
@claude Can you please review this PR using the necessary subagents in this repository. Leverage your knowledge of software engineering best practices (such as SOLID / DRY etc) and Kubernetes controller in order to support your review. |
|
Claude finished @ChrisJBurns's task in 5m 24s —— View job PR Review CompleteI've conducted a comprehensive review of the EmbeddingServer CRD implementation using the kubernetes-expert and code-reviewer specialized agents. Here are the findings: Tasks
Critical Issues (Blocking)1. Security: EnvVar Type Lacks Secret Support ⛔Location: The custom type EnvVar struct {
Name string `json:"name"`
Value string `json:"value"` // ⚠️ No support for secrets!
}Problem: HuggingFace models often require authentication tokens (especially for private models or to avoid rate limits). Storing these as plain text environment variables is a security vulnerability - tokens will be exposed in Git commits, logs, and This aligns with @jerm-dro's blocking review comment about using Secrets for HF tokens. Recommendation: Extend the type EnvVar struct {
Name string `json:"name"`
// Value is the plain text value (mutually exclusive with ValueFrom)
// +optional
Value string `json:"value,omitempty"`
// ValueFrom is a reference to a secret (mutually exclusive with Value)
// +optional
ValueFrom *EnvVarSource `json:"valueFrom,omitempty"`
}
type EnvVarSource struct {
// SecretKeyRef selects a key from a secret
// +optional
SecretKeyRef *SecretKeyRef `json:"secretKeyRef,omitempty"`
}Then update the controller to validate secret references exist and add RBAC markers: //+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch2. Kubernetes: Deployment + PVC Scaling Issue ⛔Location: The controller creates a Deployment with a single PVC that all replicas try to mount: podTemplate.Spec.Volumes = []corev1.Volume{
{
Name: "model-cache",
VolumeSource: corev1.VolumeSource{
PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
ClaimName: pvcName, // ⚠️ Single PVC for all replicas
},
},
},
}This validates @jerm-dro's blocking review comment about converting to StatefulSet. Problem: When
Evidence: The CRD defaults to Why StatefulSet Solves This:
Recommendation: Use StatefulSet when Trade-off: Each replica will download and cache the model separately (N × storage cost), but this is the correct architecture for truly independent replicas. If shared caching is critical, you'd need a more complex sidecar pattern. 3. Missing Secret Reference Validation ⛔Location: After fixing issue #1 above, the controller will need to validate that referenced secrets actually exist before creating the Deployment. Otherwise, pods will fail with See Major Issues4. Code Quality: Silent Error DropsLocation: Multiple places silently drop errors: if err := ctrl.SetControllerReference(embedding, pvc, r.Scheme); err != nil {
return nil // ❌ Error silently dropped
}Impact: If Recommendation: At minimum, log the error. Better yet, return it to the caller. 5. Architecture: Multiple Status Updates Per ReconciliationLocation: Throughout The controller calls
Recommendation: Adopt the StatusCollector pattern used in MCPRegistry controller (see statusCollector := embeddingserverstatus.NewCollector(embedding)
// Make all status changes through collector
statusCollector.SetPhase(...)
statusCollector.SetCondition(...)
// Single batch update at end
statusCollector.Apply(ctx, r.Client)6. Test Coverage: Missing Critical Path TestsMetrics:
This addresses @ChrisJBurns's comment about 1k lines of controller code with only 300 lines of tests. Critical paths lacking unit tests:
E2E tests exist (basic, with-cache, lifecycle, multi-tenancy), which is good, but unit tests are needed for:
Recommendation: Add unit tests using mocked Kubernetes client. Example: func TestEmbeddingServerReconciler_ValidateImage_Invalid(t *testing.T) {
// Test invalid image rejection and status update
}
func TestEmbeddingServerReconciler_DeploymentNeedsUpdate_ImageChange(t *testing.T) {
// Test deployment drift detection
}Code Quality Issues7. DRY ViolationsPVC Name Calculation (lines 452, 512, 741): pvcName := fmt.Sprintf("%s-model-cache", embedding.Name)Repeated 3 times. Extract to method: Environment Variable Logic (lines 650-663 vs 935-942): Status Update Pattern (lines 369-372, 402-404, 417-419, etc.): 8. CLAUDE.md Violation: Method OrganizationIssue: Public methods should be at the top of the file, private methods at the bottom (per Current order:
Recommendation: Move 9. Inconsistent ObservedGenerationSome conditions set Minor Issues10. ModelCache Enabled Field DesignLocation: type ModelCacheConfig struct {
Enabled bool `json:"enabled,omitempty"` // ⚠️ omitempty with bool
// +kubebuilder:default=trueUsing Recommendation: Either remove 11. Unused Context ParametersLines 562, 851 use Positive Observations ✅Despite the issues, there are many good practices:
Summary and RecommendationsAddressing Existing Review Comments@jerm-dro's blocking comments:
@ChrisJBurns's questions:
Priority RecommendationsP0 (Must Fix Before Merge):
P1 (Should Fix Soon): P2 (Nice to Have): Overall AssessmentFunctionality: ✅ The controller works for single-replica scenarios and includes excellent E2E tests. Production Readiness:
The implementation shows good understanding of Kubernetes operator patterns and follows most ToolHive conventions. With the recommended fixes, this will be a solid addition to the operator. |
I will split the PRs tomorrow.
Good point. I added more unit tests. |
d549ddb to
e5af0d2
Compare
e5af0d2 to
149c710
Compare
bda5c39 to
03231c5
Compare
d19f9c5 to
931ad7c
Compare
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
5b15f05 to
05e1f4f
Compare
Summary
This PR introduces the
EmbeddingServerCustom Resource Definition (CRD) for the ToolHive Kubernetes Operator, enabling deployment and management of HuggingFace Text Embeddings Inference (TEI) servers.Key Features
EmbeddingServerfor deploying HuggingFace Text Embeddings Inference servers in KubernetesModelReady: Indicates whether the embedding model is downloaded and readyVolumeReady: Indicates whether the PVC for model caching is readyImageValidated: Validates the container imagePodTemplateValid: Validates the pod template configurationComponents Added
cmd/thv-operator/api/v1alpha1/embeddingserver_types.gocmd/thv-operator/controllers/embeddingserver_controller.gobasic-embedding.yaml: Minimal configuration with default settingsembedding-with-cache.yaml: Configuration with persistent model cache (10Gi PVC)embedding-advanced.yaml: Advanced setup with custom resources and environment variablesTest plan
Large PR Justification
Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com