
Conversation

@VincentCCandela

Limits on memory and CPU access which would simulate database strain.

@VincentCCandela (Author)

I'll fix the linting issues in a bit.

@Red-GV changed the title from "Memory and CPU limits (Reduce Database Resources #302)" to "feat: add database resource limit fault" on Oct 21, 2025
@Red-GV linked an issue on Oct 21, 2025 that may be closed by this pull request
@rohanarora (Collaborator)

Also, @VincentCCandela, can you please consider adding a guide to the docs directory, based on our Slack discussion, on what needs to be done to add a fault mechanism?

@Red-GV (Collaborator) commented Oct 21, 2025

Please update the section in the Contributing.md instead. There's already a section there.

Collaborator

Not sure why these files were deleted? @VincentCCandela

Author

It may have been deleted when I ran "ansible-lint roles/faults roles/incidents --fix" to address the linting issues. I did not intentionally delete it. Same with sre/ansible.cfg.

sre/ansible.cfg Outdated
Collaborator

Not sure why these files were deleted? @VincentCCandela

Collaborator

Vincent (@VincentCCandela), thank you for taking a pass at this.

The intention here would be two-fold:

  1. Bump the load such that the underlying valkey/PostgreSQL service starts to suffer
  2. Introduce limits on the workload itself (again, valkey/PostgreSQL) [and not a resource quota on the namespace]. What you have here is almost the same fault mechanism as https://github.com/itbench-hub/ITBench-Scenarios/blob/main/sre/roles/faults/tasks/inject_custom_misconfigured_resource_quota.yaml

Please let me know if that is not the case.
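For illustration, point 2 would put the limit on the database workload's own container spec rather than on the namespace. A minimal, abbreviated sketch; the workload name and values here are hypothetical, not from this PR:

```yaml
# Hypothetical, abbreviated Deployment spec: the limit sits on the
# database container itself, not in a namespace-scoped ResourceQuota.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres            # hypothetical workload name
spec:
  template:
    spec:
      containers:
        - name: postgres
          resources:
            limits:
              cpu: 100m     # deliberately low to induce CPU throttling
              memory: 128Mi # deliberately low to induce memory pressure
```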

@rohanarora (Collaborator) left a comment

Dropped a couple of comments for you to take a look at, @VincentCCandela!

Collaborator

Please remove this from the commit.

Collaborator

Please remove this file.

@Red-GV (Collaborator) left a comment

A few additional comments on this side.

sre/a.out Outdated
Collaborator

Please remove this file.

vars:
  spec: "{{ fault.custom.misconfigured_service_port }}"
when:
  - fault.custom.name == 'low-mem-cpu-constraints'
Collaborator

Maybe let's change the name to this one; it should be more in line with the suggestions proposed by @rohanarora.

Suggested change:
- - fault.custom.name == 'low-mem-cpu-constraints'
+ - fault.custom.name == 'low_resource_limits'

@VincentCCandela (Author)

Made some updates. I did not update CONTRIBUTING.md because it looks like it already has the necessary information.

Added pod-level resource limits for requirement (2).
Still need to work on requirement (1): bump the load such that the underlying valkey/PostgreSQL service starts to suffer.
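As a hedged sketch of what requirement (1) might look like, one option is a short-lived Job that benchmarks the database service; every name, image, and flag below is an assumption for illustration, not part of this PR:

```yaml
# Hypothetical sketch: generate load against PostgreSQL with pgbench.
# Assumes a reachable "postgres" Service and pre-initialized pgbench
# tables (pgbench -i); credentials handling is omitted for brevity.
- name: Inject load against the PostgreSQL service
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: postgres-load                         # hypothetical name
        namespace: "{{ faults_target_namespace }}"  # hypothetical variable
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: pgbench
                image: postgres:16
                command: ["pgbench", "-h", "postgres", "-U", "postgres",
                          "-c", "50", "-T", "300"]
```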

@Red-GV (Collaborator) left a comment

Just a few comments from me. Also, although it will probably just be a copy and paste of the same warning, please go ahead and add a remove step for this fault as well.
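A remove step for this fault might, assuming the pre-fault limits were captured at injection time, look roughly like the following; all variable names here are hypothetical:

```yaml
# Hypothetical sketch: restore the resource limits recorded before injection.
- name: Remove database resource limit fault
  kubernetes.core.k8s:
    state: patched
    kind: Deployment
    name: "{{ item.name }}"
    namespace: "{{ item.namespace }}"
    definition:
      spec:
        template:
          spec:
            containers:
              - name: "{{ item.container_name }}"
                resources: "{{ item.original_resources }}"
  loop: "{{ faults_saved_resource_limits }}"  # hypothetical fact set during injection
```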

template:
  spec:
    containers:
      - name: "{{ result.resources[0].spec.template.spec.containers[0].name }}"
Collaborator

Maybe do something like this instead. Should the container being targeted not be the first in the index, this will overwrite it (and the entire array itself), which is not desired.

- faults_workloads_info is defined
- result.resources | length == 1
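One way to sidestep the index-0 assumption (a sketch, with hypothetical variable names) is to resolve the target container by name before building the patch:

```yaml
# Hypothetical sketch: pick the container by name rather than by position,
# so the patch addresses the right entry wherever it sits in the array.
- name: Resolve the target container by name
  ansible.builtin.set_fact:
    faults_target_container: >-
      {{ result.resources[0].spec.template.spec.containers
         | selectattr('name', 'equalto', faults_target_container_name)
         | first }}
```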

- name: Restart workloads to apply new resource limits
Collaborator

Off the top of my head, I'm pretty sure this is unnecessary. Once the deployment has been patched, Kubernetes should already redeploy the pod.
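If some verification is still wanted in place of an explicit restart, a hedged alternative is to wait on the rollout that the template patch already triggers (wait options as supported by kubernetes.core.k8s_info; variable names hypothetical):

```yaml
# Hypothetical sketch: instead of restarting, wait for the rollout that
# patching spec.template already kicked off.
- name: Wait for the patched deployment to finish rolling out
  kubernetes.core.k8s_info:
    kind: Deployment
    name: "{{ faults_target_deployment }}"      # hypothetical variable
    namespace: "{{ faults_target_namespace }}"  # hypothetical variable
    wait: true
    wait_condition:
      type: Progressing
      status: "True"
      reason: NewReplicaSetAvailable
    wait_timeout: 300
```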
