e2e: PP: cover ExecCPUAffinity support in tests #1432
shajmakh wants to merge 2 commits into openshift:main from
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: shajmakh. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Depends on #1426
6ef4f1a to beeea3d
565820a to 0af067c
regarding /test e2e-gcp-pao-workloadhints
GCP cluster profile uses the ipi-gcp flow, which by default uses 6 vCPUs for compute machines (see `step-registry/ipi/conf/gcp/ipi-conf-ref.yaml`). The performance profile suite configures a profile with `reserved: "0"` and `isolated: "1-3"` (see openshift/cluster-node-tuning-operator#909), unless environment variables are specified. In general it is good practice to include all of the node's CPUs in the PP cpu section, but the reason we need this now is that some new tests require most of the CPUs to be distributed via the PP (see openshift/cluster-node-tuning-operator#1432 (comment)). In this commit we start by updating only the affected job on which the test would run; later we will need to add this setting to all other jobs that consume the ipi-gcp cluster configuration. Note: this is subject to change should the CPU specifications on GCP be modified. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
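For reference, the cpu section described above would look roughly like this in a PerformanceProfile (a sketch, assuming all five non-reserved CPUs of the 6-vCPU GCP compute node are handed to the profile, as the commit recommends):

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    reserved: "0"    # kept for housekeeping
    isolated: "1-5"  # all remaining CPUs of the 6-vCPU compute node
```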
/retest
when I temporarily removed the failing test due to the node topology misaligning with the PP cpu section,
/hold
0af067c to 41afeca
/test e2e-aws-ovn
acdb51a to a8b7158
/retest

/unhold
SargunNarula
left a comment
Thanks for the tests. IMO some tests are redundant and can be removed.
}

var err error
testPod := pods.MakePodWithResources(ctx, workerRTNode, qos, containersResources)
That is an option, indeed, but since I want to use the same function in different suites (not only in mixed cpus), I added one that provides this functionality with QoS and multi-containers for the pod.
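For illustration, a minimal sketch of the shape of such a helper, using hypothetical simplified types (the real helper builds a corev1.Pod via the suite's pods package; names and the signature here are invented):

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for corev1.ResourceList.
type Resources struct{ CPU, Memory string }

// Container and Pod are simplified stand-ins for the corev1 types.
type Container struct {
	Name     string
	Requests Resources
	Limits   Resources
}

type Pod struct {
	QoS        string
	Containers []Container
}

// makePodWithResources builds one container per resources entry; for
// guaranteed QoS it mirrors requests into limits, which is what makes
// kubelet classify the pod as Guaranteed.
func makePodWithResources(qos string, resources []Resources) *Pod {
	pod := &Pod{QoS: qos}
	for i, r := range resources {
		c := Container{Name: fmt.Sprintf("cnt-%d", i), Requests: r}
		if qos == "Guaranteed" {
			c.Limits = r
		}
		pod.Containers = append(pod.Containers, c)
	}
	return pod
}

func main() {
	// Two containers: one with explicit resources, one left empty,
	// mirroring the {} "cnt1 resources" entries in the test table.
	pod := makePodWithResources("Guaranteed", []Resources{
		{CPU: "2", Memory: "100Mi"},
		{},
	})
	fmt.Println(len(pod.Containers), pod.QoS)
}
```

The point of routing everything through one builder is that the same function can serve the mixed-cpus, cpu-management, and exec-affinity suites with different QoS classes and container counts.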
test/e2e/performanceprofile/functests/11_mixedcpus/mixedcpus.go — three outdated review threads, resolved
		sharedCpusResource: resource.MustParse("1"),
	},
}),
Entry("best-effort pod with shared CPU request",
I think it would be better to keep the best-effort scenario under cpu_management only, since it does not depend on shared CPUs.
I think we need to keep isolation and make sure that the test passes when mixed cpus is enabled too.
	// cnt1 resources
	{},
}),
Entry("burstable pod with shared CPU request",
Same case with burstable
I see your point, but when shared cpus is enabled the flow becomes different when execCPUAffinity is enabled as well. So I believe we should ensure that the same set of tests (BE and BU) passes also on a cluster with that config. WDYT?
test/e2e/performanceprofile/functests/1_performance/cpu_management.go — three outdated review threads, resolved
571923e to b8bd51e
@SargunNarula Thanks for your valuable review. I've addressed your comments. Let me know if the new version addresses your concerns. Thanks!
@shajmakh I think all the concerns are covered now and the tests look good to me. /lgtm
test/e2e/performanceprofile/functests/11_mixedcpus/mixedcpus.go — two outdated review threads, resolved
updatedIsolated = *mustParse(string(*profile.Spec.CPU.Isolated))
currentShared := mustParse(string(*profile.Spec.CPU.Shared))
if len(currentShared.List()) < 2 {
	testlog.Info("shared cpuset has less than 2 cpus; this test requires at least 2 shared cpus; update the profile")
shouldn't we abort/skip here?
The intention here was to allow this test to run even if it requires a PP update, as far as the cluster topology allows. I do not assume that the PP config at the time the test runs has the minimum required shared CPUs (2); in case it doesn't, the test will reconfigure the PP so this test can run. There is no point in running it when there is only 1 shared CPU, which is how the suite setup configures this setting in the profile:
https://github.com/shajmakh/cluster-node-tuning-operator/blob/b8bd51eab141d263aa3d95aa37b36f1e6cfece5c/test/e2e/performanceprofile/functests/11_mixedcpus/mixedcpus.go#L770
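The check under discussion reduces to parsing the profile's shared cpuset string and counting CPUs. A minimal stdlib sketch of that logic (the real code uses the Kubernetes cpuset package's Parse and List):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSet expands a kernel-style cpuset string such as "0-3,5"
// into a slice of CPU ids. An empty string is a valid, empty set.
func parseCPUSet(s string) ([]int, error) {
	var cpus []int
	if strings.TrimSpace(s) == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, err
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(strings.TrimSpace(hi)); err != nil {
				return nil, err
			}
		}
		for c := start; c <= end; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

func main() {
	// The suite setup gives the profile a single shared CPU ("2"),
	// so the test must first grow the shared set before it can run.
	shared, _ := parseCPUSet("2")
	if len(shared) < 2 {
		fmt.Println("shared cpuset has", len(shared), "cpu(s); reconfiguring profile for 2 shared cpus")
	}
	grown, _ := parseCPUSet("2-3")
	fmt.Println("new shared set:", grown)
}
```

With this in hand, `len(shared) < 2` is the condition that triggers the in-test profile update rather than a skip.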
test/e2e/performanceprofile/functests/11_mixedcpus/mixedcpus.go — two outdated review threads, resolved
test/e2e/performanceprofile/functests/2_performance_update/updating_profile.go — two outdated review threads, resolved
By("Run exec command on the pod and verify the process is pinned not only to the first exclusive CPU")

for i := 0; i < retries; i++ {
	cmd := []string{"/bin/bash", "-c", "sleep 10 & SLPID=$!; ps -o psr -p $SLPID;"}
We really need a more robust approach here.
val, ok := profile.Annotations[performancev2.PerformanceProfileExecCPUAffinityAnnotation]
if ok && val == performancev2.PerformanceProfileExecCPUAffinityDisable {
	// fail loudly because the default should be enabled
	Fail("exec-cpu-affinity is disabled in the profile")
Why not use Expect, as we always do?
cpusIncludingShared, err := cpuset.Parse(cpusetCfg.Cpus)
Expect(err).ToNot(HaveOccurred(), "Failed to parse cpuset config for test pod cpus=%q", cpusetCfg.Cpus)
testlog.Infof("all CPUs dedicated for the container (including shared if requested): %s", cpusIncludingShared.String())
firstCPU := cpusIncludingShared.List()[0]
We should check that it is not empty first; cpuset.Parse(cpusetCfg.Cpus) can succeed (no error) and return an empty CPUSet.
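A sketch of the guard being asked for, with a plain int slice standing in for the result of cpuset.CPUSet's List():

```go
package main

import "fmt"

// firstCPU returns the first CPU id, guarding against the empty set
// that cpuset.Parse can legally return with a nil error.
func firstCPU(cpus []int) (int, bool) {
	if len(cpus) == 0 {
		return -1, false
	}
	return cpus[0], true
}

func main() {
	if _, ok := firstCPU(nil); !ok {
		fmt.Println("empty cpuset: fail the assertion instead of panicking on index 0")
	}
	if cpu, ok := firstCPU([]int{2, 3}); ok {
		fmt.Println("first exclusive CPU:", cpu)
	}
}
```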
isSharedCPUsRequested := container.Resources.Limits.Name(sharedCpusResource, resource.DecimalSI).Value() > 0
if isSharedCPUsRequested {
	cntShared := cpusIncludingShared.Difference(updatedIsolated)
	firstCPU = cntShared.List()[0]
AfterEach(func() {
	deleteTestPod(context.TODO(), testpod)
	Expect(pods.Delete(context.TODO(), testpod)).To(BeTrue(), "Failed to delete pod")
If you're already adding an assertion, I would add the pod name as well.
val, ok := profile.Annotations[performancev2.PerformanceProfileExecCPUAffinityAnnotation]
if ok && val == performancev2.PerformanceProfileExecCPUAffinityDisable {
	// fail loudly because the default should be enabled
	Fail("exec-cpu-affinity is disabled in the profile")
You can use Expect here, as we always do.
Expect(getter).ToNot(BeNil())

By("Checking if exec-cpu-affinity is disabled, if not disable it")
initialProfile, _ = profiles.GetByNodeLabels(testutils.NodeSelectorLabels)
retries := 20
By("Run exec command on the pod and verify the process is pinned not only to the first exclusive CPU")

for i := 0; i < retries; i++ {
Why not use Eventually? You get better control over the timing and you'd be using a built-in method.
}

func Delete(ctx context.Context, pod *corev1.Pod) bool {
	err := testclient.DataPlaneClient.Get(ctx, client.ObjectKeyFromObject(pod), pod)
I would pass the client as an argument, just for the case that we might need to use a different one in the future.
Add main e2e tests that check the behavior of performance-profile with `ExecCPUAffinity: first` and without it (legacy). Signed-off-by: Shereen Haj <shajmakh@redhat.com>
Add unit tests for functions in resources helper package for tests. Assisted-by: Cursor v1.2.2 AI-Attribution: AIA Entirely AI, Human-initiated, Reviewed, Cursor v1.2.2 v1.0 Signed-off-by: Shereen Haj <shajmakh@redhat.com>
b8bd51e to 305d0c1
@shajmakh: The following tests failed, say

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Add basic e2e tests that check the default behavior of performance-profile with the default-enabled `ExecCPUAffinity: first`.