Component(s)
collector
What happened?
Description
We use the otel operator, deployed via the helm chart. After upgrading from 0.129.1 (chart v0.92.0) to 0.136.0 (chart v0.97.1) we ran into an issue where autoscaling was no longer working, I think due to #4400. After reverting back to 0.129.1, autoscaling still does not work, and the operator does not seem to be setting the replica count at all. I can change spec.replicas on the collector deployment, and the operator never resets it (unlike in #4400, where it keeps getting set to minReplicas). I have confirmed that the HPA is still working, and the spec.replicas field is still being updated correctly on the OpenTelemetryCollector (v1beta1). However, the operator is not copying this value over to the deployment.
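A quick way to see the mismatch (resource names inferred from the selector below: CR otel-main in namespace otel-collector, deployment otel-main-collector; adjust for your setup) is to compare spec.replicas on the two objects:

# Replicas requested on the OpenTelemetryCollector (set by the HPA)
kubectl -n otel-collector get opentelemetrycollector otel-main -o jsonpath='{.spec.replicas}'
# Replicas on the generated Deployment (not being updated by the operator)
kubectl -n otel-collector get deployment otel-main-collector -o jsonpath='{.spec.replicas}'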
I think the problem might be that the status fields on the OpenTelemetryCollector have gotten into a bad state. In particular, status.scale.replicas and status.scale.statusReplicas are no longer correct, and aren't changing to match the deployment status.
status:
  image: otel/opentelemetry-collector-contrib:0.136.0
  scale:
    replicas: 2
    selector: app.kubernetes.io/component=opentelemetry-collector,app.kubernetes.io/instance=otel-collector.otel-main,app.kubernetes.io/managed-by=opentelemetry-operator,app.kubernetes.io/name=otel-main-collector,app.kubernetes.io/part-of=opentelemetry,app.kubernetes.io/version=latest,managed_by=terraform,repo=prefab,service_id=otel-collector,team=platform
    statusReplicas: 0/2
  version: 0.136.0
I have verified that the other fields are correct, including the selector. However, replicas and statusReplicas are wrong: the deployment currently has 3 replicas, all of which are healthy.
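To illustrate, the stale scale status can be compared against what the deployment actually reports (same assumed names as above):

# Scale subresource status reported on the CR
kubectl -n otel-collector get opentelemetrycollector otel-main -o jsonpath='{.status.scale}'
# Actual replica counts on the Deployment
kubectl -n otel-collector get deployment otel-main-collector -o jsonpath='{.status.readyReplicas}/{.status.replicas}'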
Steps to Reproduce
- Deploy the operator
- Deploy an OpenTelemetryCollector
- Edit the status.scale.replicas and status.scale.statusReplicas fields so they no longer match the deployment (see the sketch below)
- Observe that the operator stops scaling the deployment
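A minimal way to force the mismatch directly, assuming the same names as above and a kubectl recent enough to support --subresource, is to patch the status subresource by hand:

# Force status.scale out of sync with the deployment (illustrative values)
kubectl -n otel-collector patch opentelemetrycollector otel-main \
  --subresource=status --type=merge \
  -p '{"status":{"scale":{"replicas":1,"statusReplicas":"0/1"}}}'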
Expected Result
The operator should overwrite any incorrect status values.
Actual Result
Incorrect status values persist, and seem to prevent autoscaling.
Kubernetes Version
1.33.4
Operator version
0.129.1
Collector version
0.136.0
Environment information
Environment
- GKE 1.33.4-gke.1350000
Log output
No relevant log output
Additional context
The OpenTelemetryCollector's managedFields metadata confirms that the status is not getting updated (the operator's last status update was 2025-10-09T20:41:00Z):
- apiVersion: opentelemetry.io/v1beta1
  fieldsType: FieldsV1
  fieldsV1:
    f:status:
      .: {}
      f:image: {}
      f:scale:
        .: {}
        f:replicas: {}
        f:selector: {}
        f:statusReplicas: {}
      f:version: {}
  manager: manager
  operation: Update
  subresource: status
  time: "2025-10-09T20:41:00Z"
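For reference, this entry can be retrieved with the --show-managed-fields flag (again assuming the names above):

# managedFields are hidden by default; this flag exposes them
kubectl -n otel-collector get opentelemetrycollector otel-main --show-managed-fields -o yaml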