📜 Description
We recently moved some services over to Kubernetes. These are customer-facing services, so we wanted to turn on application metrics, which attached a small Envoy container alongside our main one. However, the Envoy container was soon OOM killed, which took the entire pod down and caused an outage: our service became unavailable to customers.
👟 Reproduction steps
I'm not entirely sure how to reproduce this; I'm guessing it depends on traffic volume. For what it's worth, adding the Envoy sidecar did not provide any metric information: our dashboards remained empty even as traffic came in, so there may be a bug there that caused this.
👍 Expected behavior
The Envoy container should not affect the main container this way. Failures in metric collection should not impact the service's resiliency.
👎 Actual Behavior
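Described above: the Envoy sidecar was OOM killed, the entire pod went down, and the service became unavailable to customers.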
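In case it helps with triage, here is a minimal sketch of how the OOM kill could be confirmed from pod status. It assumes the input is the JSON printed by `kubectl get pod <pod-name> -o json`, and it relies on the `status.containerStatuses[].state` and `lastState` fields of the Kubernetes Pod API. The function name `oom_killed_containers` and the container names in the sample are hypothetical and not taken from our deployment.

```python
import json
from typing import Any


def oom_killed_containers(pod: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (container name, restart count) for every container whose
    current or previous termination reason is "OOMKilled"."""
    results = []
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for status in statuses:
        # "state" covers a container that is still down,
        # "lastState" covers one that has already been restarted.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                results.append((status["name"], status.get("restartCount", 0)))
                break  # report each container at most once
    return results


if __name__ == "__main__":
    # Hypothetical, trimmed-down pod status; the names are made up.
    sample = json.loads("""
    {
      "status": {
        "containerStatuses": [
          {"name": "app", "restartCount": 0,
           "state": {"running": {}}, "lastState": {}},
          {"name": "envoy-sidecar", "restartCount": 3,
           "state": {"running": {}},
           "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
        ]
      }
    }
    """)
    print(oom_killed_containers(sample))  # [('envoy-sidecar', 3)]
```

Running this against the affected pod should show whether only the sidecar was OOM killed or the main container was as well, which may help narrow down the cause.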
☸ Kubernetes version
1.33
Cloud provider
🌍 Browser
Chrome
🧱 Your Environment
We're using a single ELB that handles TLS termination and then passes requests to Traefik, which works as a reverse proxy and auto-discovers services on its own.
✅ Proposed Solution
No response
👀 Have you spent some time to check if this issue has been raised before?
- I checked and didn't find any similar issue
🏢 Have you read the Code of Conduct?
- I have read the Code of Conduct