Memory Leak using Workload Identity Authentication #50

@bassg0navy

Summary

The OCI Secrets Store CSI Driver Provider leaks memory because HTTP connections are neither closed nor reused when authenticating via workload identity. This routinely leads to OOMKilled provider pods in high-throughput environments where secrets are mounted frequently.

Environment

  • Provider version: v0.4.2
  • Kubernetes: OKE (Oracle Kubernetes Engine)
  • Authentication method: Workload Identity (x509FederationClientForOkeWorkloadIdentity)
  • Secret rotation enabled: Yes (2m interval)

Symptoms

  • Steady, linear memory growth over time (~5-6 Mi/hour under moderate load)
  • Pods approaching 200Mi memory limit within ~24-36 hours
  • Memory usage correlates directly with the volume of secret mount requests on the node

Root Cause Analysis

Using Go's pprof tooling, we identified the following:

Heap Profile Findings

Top memory consumers point to HTTP connection handling:

Function                                        Memory    Percentage
bytes.growSlice                                 3084 kB   14.82%
bufio.NewWriterSize                             3084 kB   14.82%
bufio.NewReaderSize                             3084 kB   14.82%
compress/flate.NewWriter                        1805 kB    8.68%
crypto/internal/fips140/aes/gcm.NewGCMForTLS12  1537 kB    7.39%

Critically, this function appears in the profile:

512.20kB  2.46%  85.23%  3596.15kB  17.28%  github.com/oracle/oci-go-sdk/v65/common/auth.(*x509FederationClientForOkeWorkloadIdentity).getSecurityToken
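
We have not confirmed this in the SDK source, but both profiles are consistent with a new HTTP client (and therefore a new transport) being constructed per token request. A minimal sketch of the suspected anti-pattern, with all names and the signature hypothetical:

```go
package sketch

import (
	"crypto/tls"
	"io"
	"net/http"
)

// getSecurityTokenSketch is a hypothetical reconstruction of the suspected
// anti-pattern, not the SDK's actual code. Building a new http.Client with
// its own http.Transport on every call gives each request a private
// idle-connection pool: connections are never reused across calls, and each
// idle connection pins a readLoop and a writeLoop goroutine, matching the
// 478/478 pairs in the goroutine profile below.
func getSecurityTokenSketch(endpoint string, tlsCfg *tls.Config) (string, error) {
	client := &http.Client{
		Transport: &http.Transport{TLSClientConfig: tlsCfg}, // fresh pool per call
	}
	resp, err := client.Get(endpoint)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}
```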

Goroutine Profile Findings

Leaking pod (node with high secret mount volume):

goroutine profile: total 968

478 @ ... net/http.(*persistConn).readLoop
478 @ ... net/http.(*persistConn).writeLoop

Healthy pod (node with zero secret mounts):

goroutine profile: total 9

The leaking pod holds 478 idle persistent HTTP connections, each pinning a readLoop and a writeLoop goroutine (956 of the 968 goroutines total), that are never closed or reused.

Reproduction

  1. Deploy the OCI provider with workload identity authentication
  2. Schedule workloads that mount secrets from OCI Vault on a node
  3. Over time, observe memory growth in the provider pod on that node
  4. Collect a goroutine profile: curl http://localhost:6060/debug/pprof/goroutine?debug=1 (see the pprof sketch after this list)
  5. Compare goroutine count between high-traffic and idle nodes
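
For step 4, the profiles were pulled from Go's standard net/http/pprof handler. In case it isn't already wired up in your build, a minimal sketch of exposing it:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Bind to localhost only; profiles can then be pulled with the
	// curl command from step 4.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```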

Expected Behavior

HTTP connections to OCI endpoints should be:

  • Reused via connection pooling, OR
  • Properly closed after use

Suggested Fix

The issue likely resides in how x509FederationClientForOkeWorkloadIdentity.getSecurityToken creates HTTP clients. Potential fixes (a sketch combining 1 and 3 follows the list):

  1. Reuse a single http.Client instance instead of creating new ones per request
  2. Ensure resp.Body.Close() is called after reading responses
  3. Configure connection pool limits via http.Transport.MaxIdleConns and MaxIdleConnsPerHost
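
A minimal sketch of fixes 1 and 3 together; names and values are illustrative, and the real change would belong wherever the federation client builds its HTTP client today:

```go
package sketch

import (
	"crypto/tls"
	"net/http"
	"time"
)

// newSharedHTTPClient builds one long-lived client intended to be
// constructed once and reused across getSecurityToken calls. A single
// http.Transport means a single connection pool, and the idle limits
// below bound how many persistConn goroutine pairs can accumulate.
func newSharedHTTPClient(tlsCfg *tls.Config) *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig:     tlsCfg,
			MaxIdleConns:        10,
			MaxIdleConnsPerHost: 10,
			IdleConnTimeout:     90 * time.Second,
		},
	}
}
```

Note that fix 2 still matters even with a shared client: a response body that is not fully read and closed prevents the connection from being returned to the pool, forcing new dials.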

Impact

In environments with frequent secret mounts (e.g., CI/CD pipelines running short-lived jobs), the provider becomes unusable as pods OOM within hours to days depending on workload volume.

Workarounds

Currently, no effective workarounds exist other than:

  • Increasing memory limits (delays but doesn't prevent the issue)
  • Using an alternative authentication method, where feasible

Attachments

Happy to provide full heap and goroutine profile dumps if helpful.
