feat: add database resource limit fault #336
base: main
Changes from all commits
9e8e0c7
ad38f7e
8cfde82
e8f041d
40d8334
53105fc
@@ -0,0 +1,66 @@
---
- name: Retrieve workload information for target services
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: "{{ workload.kind }}"
    kubeconfig: "{{ faults_cluster.kubeconfig }}"
    name: "{{ workload.name }}"
    namespace: "{{ spec.namespace.name }}"
  register: faults_workloads_info
  loop: "{{ spec.workloads }}"
  loop_control:
    label: "{{ workload.kind | lower }}/{{ workload.name }}"
    loop_var: workload
  when:
    - workload.name in ['valkey', 'postgresql'] # Only target specific services

- name: Patch workloads with restrictive resource limits
  kubernetes.core.k8s:
    kubeconfig: "{{ faults_cluster.kubeconfig }}"
    state: patched
    api_version: "{{ result.resources[0].apiVersion }}"
    kind: "{{ result.resources[0].kind }}"
    name: "{{ result.resources[0].metadata.name }}"
    namespace: "{{ result.resources[0].metadata.namespace }}"
    definition:
      spec:
        template:
          spec:
            containers:
              - name: "{{ result.resources[0].spec.template.spec.containers[0].name }}"
Collaborator
Maybe do something like this instead. Should the container being targeted not be the first in the list, this will overwrite it (and the entire containers array itself), which is not desired.
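The suggested snippet is not captured on this page. Below is a rough sketch of one possible approach — an assumption, not the reviewer's actual suggestion. It assumes each registered loop result exposes the original loop item under its loop_var as result.workload, and that each target workload runs a container named after the service (e.g. the valkey Deployment has a valkey container). With the module's default strategic-merge patching, the containers list is keyed by name, so only the named entry is updated and the rest of the array is left alone.

# Sketch only — illustrative, not part of this PR.
- name: Patch restrictive resource limits onto the matched container only
  kubernetes.core.k8s:
    kubeconfig: "{{ faults_cluster.kubeconfig }}"
    state: patched
    api_version: "{{ result.resources[0].apiVersion }}"
    kind: "{{ result.resources[0].kind }}"
    name: "{{ result.resources[0].metadata.name }}"
    namespace: "{{ result.resources[0].metadata.namespace }}"
    definition:
      spec:
        template:
          spec:
            containers:
              # Strategic merge keys this list by container name, so only the named
              # container is patched in place (name assumed to match the service name).
              - name: "{{ result.workload.name }}"
                resources:
                  limits:
                    memory: "256Mi"
                    cpu: "200m"
                  requests:
                    memory: "128Mi"
                    cpu: "100m"
  loop: "{{ faults_workloads_info.results }}"
  loop_control:
    label: "{{ result.resources[0].kind | lower }}/{{ result.resources[0].metadata.name }}"
    loop_var: result
  when:
    - faults_workloads_info is defined
    - result.resources | length == 1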
                resources:
                  limits:
                    memory: "256Mi"
                    cpu: "200m"
                  requests:
                    memory: "128Mi"
                    cpu: "100m"
  loop: "{{ faults_workloads_info.results }}"
  loop_control:
    label: "{{ result.resources[0].kind | lower }}/{{ result.resources[0].metadata.name }}"
    loop_var: result
  when:
    - faults_workloads_info is defined
    - result.resources | length == 1
- name: Restart workloads to apply new resource limits
Collaborator
Off the top of my head, I'm pretty sure this is unnecessary. Once the deployment has been patched, Kubernetes should already redeploy the pods.
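For what it's worth, a rough sketch of the wait-based alternative — an assumption, not code from this PR: since changing spec.template already triggers a new rollout, the explicit restart could be dropped and replaced with a task that only waits for the rollout to settle, using the wait options that kubernetes.core.k8s_info provides. The timeout value and the Available condition are illustrative and would need tuning.

# Sketch only — illustrative, not part of this PR.
- name: Wait for the patched workloads to finish rolling out
  kubernetes.core.k8s_info:
    kubeconfig: "{{ faults_cluster.kubeconfig }}"
    api_version: "{{ result.resources[0].apiVersion }}"
    kind: "{{ result.resources[0].kind }}"
    name: "{{ result.resources[0].metadata.name }}"
    namespace: "{{ result.resources[0].metadata.namespace }}"
    wait: true
    wait_timeout: 300
    wait_condition:
      type: Available   # Deployment condition reported once the rollout is available
      status: "True"
  loop: "{{ faults_workloads_info.results }}"
  loop_control:
    label: "{{ result.resources[0].kind | lower }}/{{ result.resources[0].metadata.name }}"
    loop_var: result
  when:
    - faults_workloads_info is defined
    - result.resources | length == 1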
  kubernetes.core.k8s:
    kubeconfig: "{{ faults_cluster.kubeconfig }}"
    state: patched
    api_version: "{{ result.resources[0].apiVersion }}"
    kind: "{{ result.resources[0].kind }}"
    name: "{{ result.resources[0].metadata.name }}"
    namespace: "{{ result.resources[0].metadata.namespace }}"
    definition:
      spec:
        template:
          metadata:
            annotations:
              kubectl.kubernetes.io/restartedAt: "{{ ansible_date_time.iso8601 }}"
  loop: "{{ faults_workloads_info.results }}"
  loop_control:
    label: "{{ result.resources[0].kind | lower }}/{{ result.resources[0].metadata.name }}"
    loop_var: result
  when:
    - faults_workloads_info is defined
    - result.resources | length == 1
@@ -0,0 +1,26 @@
---
fault: []
alerts:
  - id: CPUSpend
    group_id: otel-demo-namespace-1
    metadata:
      description: CPU spend increased by 20 percent
  - id: MemorySpend
    group_id: otel-demo-namespace-1
    metadata:
      description: Memory spend has increased by 20 percent
groups:
  - id: otel-demo-namespace-1
    kind: Namespace
    name: otel-demo
    namespace: otel-demo
    root_cause: true
aliases:
  - - otel-demo-namespace-1
propagations: []
recommended_actions:
  - solution:
      id: "no_action"
      actions:
        - no changes are needed in the application
        - update the OpenCost alert to prevent false alerts
@@ -0,0 +1,23 @@
---
metadata:
  complexity: Low
  id: 32
  name: Low Resource Limits
  platform: kubernetes
spec:
  environment:
    applications:
      otel_demo:
        enabled: true
    tools:
      category: sre
      selected:
        - kubernetes-topology-monitor
  faults:
    - custom:
        name: misconfigured-resource-quota
        misconfigured_network_policy:
          workload:
            kind: Deployment
            name: frontend
            namespace: "{{ applications_helm_releases.otel_demo.namespace }}"
Vincent (@VincentCCandela), thank you for taking a pass at this.
The intention here would be two-fold:
Please let me know if that is not the case.