Description
Hello,
We experienced an incident with the certificate exporter deployed as a DaemonSet on one node. After an earlier issue on that node (details still unclear), the exporter began exhibiting extremely high disk activity: throughput reached 1.75 GB/s and IOPS hit 4.5k io/s. We have not been able to reproduce the issue so far, so I'm sharing the relevant piece of configuration and asking for advice on mitigation.
Relevant configuration:
cache:
  # -- Enable caching of Kubernetes objects to prevent scraping timeouts
  enabled: true
  # -- Maximum time an object can stay in cache unrefreshed (seconds) - it will be at least half of that
  maxDuration: 300

kubeApiRateLimits:
  # -- Should requests to the Kubernetes API server be rate-limited
  enabled: false
  # -- Maximum rate of queries sent to the API server (per second)
  queriesPerSecond: 5
  # -- Burst bucket size for queries sent to the API server
  burstQueries: 10
Question:
Would enabling kubeApiRateLimits help prevent this kind of behavior when the node is under pressure? Or is the uncontrolled read activity unrelated to API rate limiting?
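To make the question concrete, this is my (possibly incorrect) understanding of what a QPS/burst limit of this kind controls, sketched with plain client-go rather than the exporter's actual wiring: it throttles requests sent to the API server, not reads from the local filesystem. If that reading is right, enabling kubeApiRateLimits alone would not bound the disk reads described above.

// Sketch only: how a QPS/burst limit is typically applied with client-go.
// This is my assumption of what kubeApiRateLimits maps to, not the exporter's actual code.
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster configuration, as a DaemonSet pod would obtain it.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	// QPS and Burst throttle requests sent to the Kubernetes API server.
	// They have no effect on local filesystem reads.
	cfg.QPS = 5
	cfg.Burst = 10

	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("client ready:", clientset != nil)
}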
Additionally, I suspect that, under node pressure, the following part of the code may trigger unthrottled read operations:
// internal/certificate.go
func readFile(file string) ([]byte, error) {
	contents, err := os.ReadFile(file)
	if err == nil || !os.IsNotExist(err) {
		return contents, err
	}

	fsys := os.DirFS(".")
	if filepath.IsAbs(file) {
		fsys = os.DirFS("/")
	}

	realPath, err := resolveSymlink(fsys, file)
	if err != nil {
		return nil, err
	}

	return os.ReadFile(realPath)
}

Any insight or recommendations to prevent this kind of incident in the future are appreciated.
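For what it's worth, the kind of guard I had in mind when asking about throttling is a simple token-bucket limit around the file reads. This is only a rough sketch using golang.org/x/time/rate, with arbitrary numbers and a made-up function name, not a proposal for the exporter's actual readFile:

// Hypothetical illustration of throttling file reads; limits and names are
// invented for the example.
package main

import (
	"context"
	"fmt"
	"os"

	"golang.org/x/time/rate"
)

// fileReadLimiter allows at most 10 reads per second with a burst of 20.
var fileReadLimiter = rate.NewLimiter(rate.Limit(10), 20)

// readFileThrottled blocks until the limiter grants a token before reading,
// so a tight retry loop cannot turn into unbounded disk I/O.
func readFileThrottled(ctx context.Context, file string) ([]byte, error) {
	if err := fileReadLimiter.Wait(ctx); err != nil {
		return nil, err
	}
	return os.ReadFile(file)
}

func main() {
	data, err := readFileThrottled(context.Background(), "/etc/hostname")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes\n", len(data))
}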