Bug: turning on "collect application metrics" can cause stability issues

### 📜 Description

We recently migrated some services to Kubernetes. These are customer-facing services, so we turned on "collect application metrics", which attaches a small Envoy sidecar container to the main application container. Soon afterwards, the Envoy container was OOM-killed. This took down the entire pod and caused an outage, making our service unavailable to customers.

### 👟 Reproduction steps

I'm not entirely sure how to reproduce this; my guess is that it depends on traffic. For what it's worth, the Envoy sidecar never produced any metrics: our dashboards stayed empty even while traffic was coming in. There may be a related bug that contributed to this.
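To confirm which container in the pod is being OOM-killed, one option is to inspect the pod status from `kubectl get pod <pod> -o json`. Below is a minimal sketch, assuming the standard Pod status fields (`containerStatuses`, `state`, `lastState`, `terminated.reason`). The function name and the container names in the example are hypothetical, not taken from Devtron.

```python
from typing import Any, Dict, List, Tuple


def find_oom_killed_containers(pod: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (container name, restart count) for each container whose current
    or most recent termination reason is OOMKilled.

    `pod` is the parsed JSON of `kubectl get pod <pod> -o json`.
    """
    results: List[Tuple[str, int]] = []
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for cs in statuses:
        # Check the current state first, then the previous termination.
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                results.append((cs.get("name", "<unknown>"), cs.get("restartCount", 0)))
                break  # report each container at most once
    return results
```

Loading the `kubectl` output with `json.load` and passing it to this function would show whether the sidecar, rather than the main container, is the one hitting its memory limit.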

### 👍 Expected behavior

The Envoy container should not affect the main container this way. A failure in metrics collection should not reduce the service's resiliency.

### 👎 Actual Behavior

See the description above.

### ☸ Kubernetes version

1.33

### Cloud provider

<details>
AWS
</details>


### 🌍 Browser

Chrome

### 🧱 Your Environment

We use a single ELB that terminates TLS and forwards requests to Traefik, which acts as a reverse proxy and auto-discovers services.

### ✅ Proposed Solution

_No response_

### 👀 Have you spent some time to check if this issue has been raised before?

- [x] I checked and didn't find any similar issue

### 🏢 Have you read the Code of Conduct?

- [x] I have read the [Code of Conduct](https://github.com/devtron-labs/devtron/blob/main/CODE_OF_CONDUCT.md)