
NO-JIRA: e2e: improve failure diagnostics #484

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from theobarberbany:tb/fix-cluster-state-dump
Mar 3, 2026

Conversation

@theobarberbany
Contributor

@theobarberbany theobarberbany commented Mar 2, 2026

Waging more war on flakes in our migration e2es.

Wire trackResource into create helpers so tracked resources (and their sync controller mirrors) are dumped on failure. Add namespace-wide event dumps for both CAPI and MAPI namespaces, and list all AWSMachineTemplates regardless of name. All dump functions are best-effort with panic recovery.

Remove --junit-report from hack/test.sh for e2e runs since the custom ReportAfterSuite handles JUnit generation with diagnostics inlined into the failure element for Spyglass.

Currently we've got a dump that dumps everything (see below), including things that are not related to our tests and that are already covered by must-gathers. This PR tries to scope that down and make the dump more useful (YAML rather than lists, scoped to the failure).

Machine machine-auth-capi-58g4d should reach Running state
Expected success, but got an error:
    <*errors.errorString | 0xc0014051d0>: 
    CAPI Machine machine-auth-capi-58g4d: phase "Pending", want Running (conditions: [Available=False, Ready=False, BootstrapConfigReady=True, InfrastructureReady=False, NodeHealthy=Unknown, NodeReady=Unknown, Paused=False, Deleting=False])
    {
        s: "CAPI Machine machine-auth-capi-58g4d: phase \"Pending\", want Running (conditions: [Available=False, Ready=False, BootstrapConfigReady=True, InfrastructureReady=False, NodeHealthy=Unknown, NodeReady=Unknown, Paused=False, Deleting=False])",
    }
In [BeforeAll] at: /go/src/github.com/openshift/cluster-capi-operator/e2e/machine_migration_capi_authoritative.go:53 @ 03/02/26 09:32:00.957
< Exit [BeforeAll] with spec.authoritativeAPI: ClusterAPI and already existing CAPI Machine with same name - /go/src/github.com/openshift/cluster-capi-operator/e2e/machine_migration_capi_authoritative.go:51 @ 03/02/26 09:32:00.958 (15m0.272s)
> Enter [ReportAfterEach] TOP-LEVEL - /go/src/github.com/openshift/cluster-capi-operator/e2e/e2e_test.go:33 @ 03/02/26 09:32:00.958

=== Cluster State Dump (test failure) ===

[openshift-machine-api] MAPI Machines (6):
  ci-op-w9i1vbs0-c3c99-ndphw-master-0                phase=Running      authAPI=MachineAPI   conditions=[Drainable=False, InstanceExists=True, Paused=False, Synchronized=False, Terminable=True] created=2026-03-02T07:39:54Z
  ci-op-w9i1vbs0-c3c99-ndphw-master-1                phase=Running      authAPI=MachineAPI   conditions=[Drainable=False, InstanceExists=True, Paused=False, Synchronized=False, Terminable=True] created=2026-03-02T07:39:55Z
  ci-op-w9i1vbs0-c3c99-ndphw-master-2                phase=Running      authAPI=MachineAPI   conditions=[Drainable=False, InstanceExists=True, Paused=False, Synchronized=False, Terminable=True] created=2026-03-02T07:39:55Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-28tmd phase=Running      authAPI=MachineAPI   conditions=[Drainable=True, InstanceExists=True, Paused=False, Synchronized=True, Terminable=True] created=2026-03-02T07:48:49Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-ntmvl phase=Running      authAPI=MachineAPI   conditions=[Drainable=True, InstanceExists=True, Paused=False, Synchronized=True, Terminable=True] created=2026-03-02T07:48:49Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-xc2cs phase=Running      authAPI=MachineAPI   conditions=[Drainable=True, InstanceExists=True, Paused=False, Synchronized=True, Terminable=True] created=2026-03-02T07:48:49Z

[openshift-machine-api] MAPI MachineSets (1):
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b       replicas=3/3 authAPI=MachineAPI   conditions=[Paused=False, Synchronized=True]

[openshift-cluster-api] CAPI Machines (7):
  ci-op-w9i1vbs0-c3c99-ndphw-master-0                phase=Running      conditions=[Available=True, Ready=True, BootstrapConfigReady=True, InfrastructureReady=True, NodeReady=True, Paused=True, Deleting=False] created=2026-03-02T08:04:23Z
  ci-op-w9i1vbs0-c3c99-ndphw-master-1                phase=Provisioned  conditions=[Available=False, Ready=False, BootstrapConfigReady=False, InfrastructureReady=True, NodeReady=False, Paused=True, Deleting=False] created=2026-03-02T08:04:23Z
  ci-op-w9i1vbs0-c3c99-ndphw-master-2                phase=Provisioned  conditions=[Available=False, Ready=False, BootstrapConfigReady=False, InfrastructureReady=True, NodeReady=False, Paused=True, Deleting=False] created=2026-03-02T08:04:23Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-28tmd phase=Running      conditions=[Available=True, Ready=True, BootstrapConfigReady=True, InfrastructureReady=True, NodeReady=True, Paused=True, Deleting=False] created=2026-03-02T08:04:23Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-ntmvl phase=Running      conditions=[Available=True, Ready=True, BootstrapConfigReady=True, InfrastructureReady=True, NodeReady=True, Paused=True, Deleting=False] created=2026-03-02T08:04:23Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-xc2cs phase=Running      conditions=[Available=True, Ready=True, BootstrapConfigReady=True, InfrastructureReady=True, NodeReady=True, Paused=True, Deleting=False] created=2026-03-02T08:04:24Z
  machine-auth-capi-58g4d                            phase=Pending      conditions=[Available=False, Ready=False, BootstrapConfigReady=True, InfrastructureReady=False, NodeHealthy=Unknown, NodeReady=Unknown, Paused=False, Deleting=False] created=2026-03-02T09:17:00Z

[openshift-cluster-api] CAPI MachineSets (1):
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b       replicas=3/3 conditions=[Paused=True]

[openshift-cluster-api] Events (last 10min, 1):
  2026-03-02T09:31:33Z AWSMachine/machine-auth-capi-58g4d Warning  FailedGetBootstrapData failed to retrieve bootstrap data secret for AWSMachine openshift-cluster-api/machine-auth-capi-58g4d: Secret "master...

[openshift-cluster-api] AWSMachines (7):
  ci-op-w9i1vbs0-c3c99-ndphw-master-0                instanceType=m6a.xlarge   instanceID=i-0a4ae174bf5ffb56d    providerID=aws:///us-west-2b/i-0a4ae174bf5ffb56d created=2026-03-02T08:04:24Z
  ci-op-w9i1vbs0-c3c99-ndphw-master-1                instanceType=m6a.xlarge   instanceID=i-072a247547d6d5fb9    providerID=aws:///us-west-2b/i-072a247547d6d5fb9 created=2026-03-02T08:04:25Z
  ci-op-w9i1vbs0-c3c99-ndphw-master-2                instanceType=m6a.xlarge   instanceID=i-0a7864fcf10f6ac81    providerID=aws:///us-west-2b/i-0a7864fcf10f6ac81 created=2026-03-02T08:04:25Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-28tmd instanceType=m6a.xlarge   instanceID=i-022e5f35e41bc83f8    providerID=aws:///us-west-2b/i-022e5f35e41bc83f8 created=2026-03-02T08:04:35Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-ntmvl instanceType=m6a.xlarge   instanceID=i-041ee8ba909566c95    providerID=aws:///us-west-2b/i-041ee8ba909566c95 created=2026-03-02T08:04:35Z
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-xc2cs instanceType=m6a.xlarge   instanceID=i-01c729cf9ab4c983b    providerID=aws:///us-west-2b/i-01c729cf9ab4c983b created=2026-03-02T08:04:36Z
  machine-auth-capi-58g4d                            instanceType=m6a.xlarge   instanceID=                       providerID= created=2026-03-02T09:17:00Z

[openshift-cluster-api] AWSMachineTemplates (1):
  ci-op-w9i1vbs0-c3c99-ndphw-worker-us-west-2b-2e687e26 instanceType=m6a.xlarge   created=2026-03-02T08:04:03Z
=== End Cluster State Dump ===

Summary by CodeRabbit

Release Notes

  • New Features

    • Added resource tracking for e2e tests to monitor resources created during test execution
    • Introduced JUnit XML report generation for test suite results with captured diagnostic information
  • Tests

    • Enhanced diagnostic output with targeted, YAML-based formatting and improved error reporting
    • Improved error handling in diagnostics with panic recovery to prevent test failures during reporting
    • Diagnostics now focus on tracked resources rather than full cluster state

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2026
@coderabbitai

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

The changes introduce a resource-tracking system for e2e tests that monitors created resources and replaces cluster-wide state dumps with targeted per-resource diagnostics in YAML format. JUnit report generation is moved from the test script to the test suite and triggers only on test failures.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Resource Tracking Infrastructure: e2e/e2e_common.go, e2e/e2e_test.go | Introduces resourcesUnderTest collection and trackResource helper for registering resources. Replaces cluster-wide dumpClusterState with per-resource diagnostics via dumpTrackedResources and dumpSingleResource. Adds dumpAllAWSMachineTemplates and dumpNamespaceEvents with panic recovery. Generates JUnit XML reports in ReportAfterSuite hook when tests fail, embedding diagnostic output as failure descriptions. |
| Resource Tracking Integration: e2e/machine_migration_helpers.go, e2e/machineset_migration_helpers.go | Integrates trackResource calls throughout resource creation workflows to register CAPI/MAPI machines, machine sets, and AWS machine templates for diagnostic visibility and tracking. |
| CI Test Script: hack/test.sh | Removes automatic junit-report generation from base GINKGO_ARGS and makes it conditional: injected only for non-e2e test directories, allowing the e2e test suite to control its own JUnit report generation. |

Sequence Diagram

sequenceDiagram
    participant Test as E2E Test
    participant Tracker as Resource Tracker
    participant K8s as Kubernetes API
    participant Diag as Diagnostics Engine
    participant JUnit as JUnit Reporter
    
    Test->>Tracker: trackResource(obj)
    activate Tracker
    Tracker->>Tracker: register in resourcesUnderTest
    deactivate Tracker
    
    Test->>Test: run assertions
    alt Test Fails
        Test->>Diag: dumpTrackedResources()
        activate Diag
        loop for each tracked resource
            Diag->>K8s: fetch live object
            K8s-->>Diag: object + events
            Diag->>Diag: marshal to YAML
            Diag->>Diag: render with cleanups
        end
        Diag->>Diag: capture output
        deactivate Diag
        
        Test->>JUnit: ReportAfterSuite hook
        activate JUnit
        JUnit->>JUnit: read ARTIFACT_DIR
        JUnit->>JUnit: generate junit_cluster_capi_operator.xml
        JUnit->>JUnit: append diagnostic output to failure message
        JUnit->>JUnit: write XML report
        deactivate JUnit
    else Test Passes
        Test->>Tracker: reset resourcesUnderTest
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A resourceful hop through test diagnostics fair,
We track each machine with diagnostic care,
YAML flows gentle, events laid bare,
Failures now spoken with clarity rare,
JUnit reports hop from tests to the air! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main objective of the PR: improving failure diagnostics in e2e tests by implementing focused, resource-tracked diagnostic dumps instead of cluster-wide state dumps. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@openshift-ci
Contributor

openshift-ci bot commented Mar 2, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@theobarberbany theobarberbany force-pushed the tb/fix-cluster-state-dump branch 3 times, most recently from d3d2b71 to cc19ef4 Compare March 2, 2026 17:47
@theobarberbany theobarberbany changed the title e2e: improve failure diagnostics NO-JIRA: e2e: improve failure diagnostics Mar 2, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 2, 2026
@openshift-ci-robot

@theobarberbany: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

… events

Wire trackResource into create helpers so tracked resources (and their
sync controller mirrors) are dumped on failure. Add namespace-wide event
dumps for both CAPI and MAPI namespaces, and list all AWSMachineTemplates
regardless of name. All dump functions are best-effort with panic recovery.

Remove --junit-report from hack/test.sh for e2e runs since the custom
ReportAfterSuite handles JUnit generation with diagnostics inlined into
the failure element for Spyglass.
@theobarberbany theobarberbany force-pushed the tb/fix-cluster-state-dump branch from cc19ef4 to 4041131 Compare March 2, 2026 18:00
@theobarberbany theobarberbany marked this pull request as ready for review March 2, 2026 18:00
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2026
@openshift-ci openshift-ci bot requested review from chrischdi and mdbooth March 2, 2026 18:01

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
hack/test.sh (1)

37-39: Use token-based e2e detection for JUnit gating.

*"e2e"* can match unintended directory names and skip JUnit for non-e2e runs. Prefer checking TEST_DIRS entries as path tokens.

Proposed patch
-  if [[ "${TEST_DIRS}" != *"e2e"* ]]; then
+  has_e2e=false
+  for d in ${TEST_DIRS}; do
+    if [[ "${d}" == "e2e" || "${d}" == "./e2e" || "${d}" == e2e/* || "${d}" == ./e2e/* ]]; then
+      has_e2e=true
+      break
+    fi
+  done
+  if [[ "${has_e2e}" == "false" ]]; then
     GINKGO_ARGS="${GINKGO_ARGS} --junit-report=junit_cluster_capi_operator.xml"
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/test.sh` around lines 37 - 39, The current glob check [[ "${TEST_DIRS}"
!= *"e2e"* ]] can false-match substrings; change the condition to test tokens
instead and only skip adding the JUnit flag when no path token equals "e2e". For
example, replace that condition with a tokenized check that searches TEST_DIRS
for an exact token (e.g., split on whitespace/commas or use: if ! echo
"${TEST_DIRS}" | tr ' ,;' '\n' | grep -xq "e2e"; then ... fi) and keep the
GINKGO_ARGS="${GINKGO_ARGS} --junit-report=..." assignment inside the updated if
block so GINKGO_ARGS is only modified when there is no exact "e2e" token.
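
To make the difference between the substring check and the token check concrete, here is the same logic expressed as a small Go sketch. hack/test.sh is a bash script, so this is only an illustration of the matching rules in the proposed patch; the function names and the example path `./pkg/e2eutils` are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// containsE2ESubstring mirrors the glob check *"e2e"*: any occurrence of
// "e2e" anywhere in the TEST_DIRS string counts as a match.
func containsE2ESubstring(testDirs string) bool {
	return strings.Contains(testDirs, "e2e")
}

// hasE2EToken mirrors the proposed patch: split TEST_DIRS on whitespace and
// accept only e2e, ./e2e, e2e/* and ./e2e/* entries.
func hasE2EToken(testDirs string) bool {
	for _, d := range strings.Fields(testDirs) {
		d = strings.TrimPrefix(d, "./")
		if d == "e2e" || strings.HasPrefix(d, "e2e/") {
			return true
		}
	}
	return false
}

func main() {
	for _, dirs := range []string{"./e2e/...", "./pkg/e2eutils", "./pkg/controllers ./e2e"} {
		fmt.Printf("%-28q substring=%-5v token=%v\n", dirs, containsE2ESubstring(dirs), hasE2EToken(dirs))
	}
}
```

Note that the `grep -xq "e2e"` variant suggested in the prompt text is stricter than the patch: it matches only an entry that is exactly `e2e`, not `./e2e` or `e2e/...`.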
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e/e2e_common.go`:
- Around line 172-173: The event collection currently matches events only by
name (call sites invoking describeObjectEvents(buf, key)), which mixes events
from different kinds (e.g., Machine vs AWSMachine); update the logic to scope
events to the involved object kind/UID/namespace. Change describeObjectEvents
(or add a new helper like describeObjectEventsForObject) to accept the object's
Kind and UID (or full involvedObject reference) and filter Kubernetes events by
involvedObject.kind and involvedObject.uid (and namespace/name) rather than name
alone; update all call sites (the occurrences around the shown call and the
other noted ranges 193-207, 257-258) to pass the object's kind/UID so only
events for that exact object are listed.
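
A minimal sketch of the scoping suggested here, assuming simplified local types rather than the real Kubernetes event API: `objectRef`, `event`, and `eventsFor` are hypothetical names, and the Machine event in the example is invented purely to show the collision between two kinds that share a name.

```go
package main

import "fmt"

// objectRef is a simplified stand-in for an event's involved-object reference.
type objectRef struct {
	Kind, Namespace, Name, UID string
}

// event is a simplified stand-in for a Kubernetes event.
type event struct {
	InvolvedObject objectRef
	Type, Reason   string
}

// eventsFor keeps only events whose involved object matches obj by kind,
// namespace and name, and by UID when the caller knows it.
func eventsFor(obj objectRef, events []event) []event {
	var out []event
	for _, e := range events {
		ref := e.InvolvedObject
		if ref.Kind != obj.Kind || ref.Namespace != obj.Namespace || ref.Name != obj.Name {
			continue
		}
		if obj.UID != "" && ref.UID != obj.UID {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	const ns, name = "openshift-cluster-api", "machine-auth-capi-58g4d"
	events := []event{
		{InvolvedObject: objectRef{Kind: "AWSMachine", Namespace: ns, Name: name}, Type: "Warning", Reason: "FailedGetBootstrapData"},
		// Invented event for a different kind with the same name.
		{InvolvedObject: objectRef{Kind: "Machine", Namespace: ns, Name: name}, Type: "Normal", Reason: "ExampleReason"},
	}
	for _, e := range eventsFor(objectRef{Kind: "AWSMachine", Namespace: ns, Name: name}, events) {
		fmt.Println(e.InvolvedObject.Kind, e.Type, e.Reason)
	}
}
```

Matching on name alone would return both events; adding the kind (and the UID when available) keeps the AWSMachine dump from picking up the Machine's events.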


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between f21137a and 4041131.

📒 Files selected for processing (5)
  • e2e/e2e_common.go
  • e2e/e2e_test.go
  • e2e/machine_migration_helpers.go
  • e2e/machineset_migration_helpers.go
  • hack/test.sh

@theobarberbany
Contributor Author

/test e2e-aws-capi-techpreview

Contributor

@JoelSpeed JoelSpeed left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2026
@openshift-ci-robot

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci
Contributor

openshift-ci bot commented Mar 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2026
Contributor

@nrb nrb left a comment

Overall it makes sense to me, but one question.

@theobarberbany
Contributor Author

/pipeline required

@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@sunzhaohua2
Contributor

/test e2e-aws-ovn-techpreview

@theobarberbany
Contributor Author

/retest

@theobarberbany
Contributor Author

CI is looking pretty borked :( If we're still failing, I'm inclined to override, given we have good signal on the ci/prow/e2e-aws-capi-techpreview job, where this is most used.

@theobarberbany
Contributor Author

/retest

@theobarberbany
Contributor Author

/override ci/prow/e2e-aws-ovn-techpreview
/override ci/prow/e2e-openstack-ovn-techpreview
/override ci/prow/e2e-openstack-capi-techpreview
/override ci/prow/e2e-gcp-ovn-techpreview

@theobarberbany
Contributor Author

/verified by ci/prow/e2e-aws-capi-techpreview

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 3, 2026
@openshift-ci-robot

@theobarberbany: This PR has been marked as verified by [ci/prow/e2e-aws-capi-techpreview](https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/484/pull-ci-openshift-cluster-capi-operator-main-e2e-aws-capi-techpreview/2028596049530064896).

Details

In response to this:

/verified by ci/prow/e2e-aws-capi-techpreview


@openshift-ci
Contributor

openshift-ci bot commented Mar 3, 2026

@theobarberbany: Overrode contexts on behalf of theobarberbany: ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-gcp-ovn-techpreview, ci/prow/e2e-openstack-capi-techpreview, ci/prow/e2e-openstack-ovn-techpreview

Details

In response to this:

/override ci/prow/e2e-aws-ovn-techpreview
/override ci/prow/e2e-openstack-ovn-techpreview
/override ci/prow/e2e-openstack-capi-techpreview
/override ci/prow/e2e-gcp-ovn-techpreview

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@theobarberbany
Contributor Author

/retest

@theobarberbany
Contributor Author

/test images

@theobarberbany
Contributor Author

/override ci/prow/images

@openshift-ci
Contributor

openshift-ci bot commented Mar 3, 2026

@theobarberbany: Overrode contexts on behalf of theobarberbany: ci/prow/images

Details

In response to this:

/override ci/prow/images


@openshift-ci
Contributor

openshift-ci bot commented Mar 3, 2026

@theobarberbany: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 05e0ebe into openshift:main Mar 3, 2026
25 checks passed
@theobarberbany theobarberbany deleted the tb/fix-cluster-state-dump branch March 3, 2026 17:09

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

5 participants