Description
Which component are you using?:
/area cluster-autoscaler
What version of the component are you using?:
Component version: v1.32
What environment is this in?:
Our own custom cloud-provider, but this error is part of the core CA logic
What did you expect to happen?:
When CA calculates resource utilisation to determine underutilised nodes, it should reliably compute a correct utilisation value.
What happened instead?:
We see that, at seemingly random times, CA calculates "0% utilisation" for nodes that are not actually empty. See the logs below:
2025-11-04 17:12:00 | {"log":"Node <node_name_1> is underutilized: memory requested (0% of allocatable) is below the scale-down utilization threshold","pid":"1","severity":"INFO","source":"eligibility.go:168"}
2025-11-04 17:12:00 | {"log":"Node <node_name_2> is underutilized: memory requested (0% of allocatable) is below the scale-down utilization threshold","pid":"1","severity":"INFO","source":"eligibility.go:168"}
2025-11-04 17:11:04 | {"log":"Node <node_name_3> is underutilized: memory requested (0% of allocatable) is below the scale-down utilization threshold","pid":"1","severity":"INFO","source":"eligibility.go:168"}
I've added diagnostic logs to the CalculateUtilizationOfResource() function here, and it's clear that when this happens, nodeInfo.Pods() returns an empty list, so podsRequest ends up being 0 and a 0% utilisation is returned.
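To make the failure mode concrete, here is a minimal sketch of the calculation shape (simplified stand-in types, not the real NodeInfo/resource API): the utilisation is just the sum of pod requests divided by allocatable, so an empty pod list yields 0% regardless of what is actually running on the node.

```go
package main

import "fmt"

// Pod and Node are simplified stand-ins for the autoscaler's
// NodeInfo types; field names here are illustrative only.
type Pod struct {
	Name       string
	MemRequest int64 // requested memory in bytes
}

type Node struct {
	MemAllocatable int64
	Pods           []Pod
}

// utilization mirrors the shape of CalculateUtilizationOfResource():
// sum the pod requests for a resource and divide by the node's
// allocatable amount. If the pod list comes back empty, the sum is 0
// and the function reports 0% even though pods may be running.
func utilization(n Node) float64 {
	var podsRequest int64
	for _, p := range n.Pods {
		podsRequest += p.MemRequest
	}
	return float64(podsRequest) / float64(n.MemAllocatable)
}

func main() {
	n := Node{MemAllocatable: 8 << 30} // 8Gi allocatable
	fmt.Printf("empty pod list: %.0f%%\n", utilization(n)*100) // prints 0%

	n.Pods = append(n.Pods, Pod{Name: "app", MemRequest: 4 << 30})
	fmt.Printf("one 4Gi pod:    %.0f%%\n", utilization(n)*100) // prints 50%
}
```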
I've verified that the nodes in question do in fact contain running pods (which are neither daemonset/mirror pods nor terminating).
I've noticed that this "0% utilisation" happens randomly throughout the day and does not seem to follow any pattern. Moreover, I did not notice any error logs around the times it occurs.
How to reproduce it (as minimally and precisely as possible):
I'm not sure this is applicable: I see this happen randomly and have not found a way to reproduce it.
Anything else we need to know?:
Any pointers on what else I can look at would be helpful.