NO-JIRA: e2e: improve failure diagnostics#484
NO-JIRA: e2e: improve failure diagnostics#484openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
📝 WalkthroughWalkthroughThe changes introduce a resource-tracking system for e2e tests that monitors created resources and replaces cluster-wide state dumps with targeted per-resource diagnostics in YAML format. JUnit report generation is moved from the test script to the test suite and triggers only on test failures. Changes
Sequence DiagramsequenceDiagram
participant Test as E2E Test
participant Tracker as Resource Tracker
participant K8s as Kubernetes API
participant Diag as Diagnostics Engine
participant JUnit as JUnit Reporter
Test->>Tracker: trackResource(obj)
activate Tracker
Tracker->>Tracker: register in resourcesUnderTest
deactivate Tracker
Test->>Test: run assertions
alt Test Fails
Test->>Diag: dumpTrackedResources()
activate Diag
loop for each tracked resource
Diag->>K8s: fetch live object
K8s-->>Diag: object + events
Diag->>Diag: marshal to YAML
Diag->>Diag: render with cleanups
end
Diag->>Diag: capture output
deactivate Diag
Test->>JUnit: ReportAfterSuite hook
activate JUnit
JUnit->>JUnit: read ARTIFACT_DIR
JUnit->>JUnit: generate junit_cluster_capi_operator.xml
JUnit->>JUnit: append diagnostic output to failure message
JUnit->>JUnit: write XML report
deactivate JUnit
else Test Passes
Test->>Tracker: reset resourcesUnderTest
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
d3d2b71 to
cc19ef4
Compare
|
@theobarberbany: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
… events Wire trackResource into create helpers so tracked resources (and their sync controller mirrors) are dumped on failure. Add namespace-wide event dumps for both CAPI and MAPI namespaces, and list all AWSMachineTemplates regardless of name. All dump functions are best-effort with panic recovery. Remove --junit-report from hack/test.sh for e2e runs since the custom ReportAfterSuite handles JUnit generation with diagnostics inlined into the failure element for Spyglass.
cc19ef4 to
4041131
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
hack/test.sh (1)
37-39: Use token-based e2e detection for JUnit gating.
*"e2e"*can match unintended directory names and skip JUnit for non-e2e runs. Prefer checking TEST_DIRS entries as path tokens.Proposed patch
- if [[ "${TEST_DIRS}" != *"e2e"* ]]; then + has_e2e=false + for d in ${TEST_DIRS}; do + if [[ "${d}" == "e2e" || "${d}" == "./e2e" || "${d}" == e2e/* || "${d}" == ./e2e/* ]]; then + has_e2e=true + break + fi + done + if [[ "${has_e2e}" == "false" ]]; then GINKGO_ARGS="${GINKGO_ARGS} --junit-report=junit_cluster_capi_operator.xml" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/test.sh` around lines 37 - 39, The current glob check [[ "${TEST_DIRS}" != *"e2e"* ]] can false-match substrings; change the condition to test tokens instead and only skip adding the JUnit flag when no path token equals "e2e". For example, replace that condition with a tokenized check that searches TEST_DIRS for an exact token (e.g., split on whitespace/commas or use: if ! echo "${TEST_DIRS}" | tr ' ,;' '\n' | grep -xq "e2e"; then ... fi) and keep the GINKGO_ARGS="${GINKGO_ARGS} --junit-report=..." assignment inside the updated if block so GINKGO_ARGS is only modified when there is no exact "e2e" token.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/e2e_common.go`:
- Around line 172-173: The event collection currently matches events only by
name (call sites invoking describeObjectEvents(buf, key)), which mixes events
from different kinds (e.g., Machine vs AWSMachine); update the logic to scope
events to the involved object kind/UID/namespace. Change describeObjectEvents
(or add a new helper like describeObjectEventsForObject) to accept the object's
Kind and UID (or full involvedObject reference) and filter Kubernetes events by
involvedObject.kind and involvedObject.uid (and namespace/name) rather than name
alone; update all call sites (the occurrences around the shown call and the
other noted ranges 193-207, 257-258) to pass the object's kind/UID so only
events for that exact object are listed.
---
Nitpick comments:
In `@hack/test.sh`:
- Around line 37-39: The current glob check [[ "${TEST_DIRS}" != *"e2e"* ]] can
false-match substrings; change the condition to test tokens instead and only
skip adding the JUnit flag when no path token equals "e2e". For example, replace
that condition with a tokenized check that searches TEST_DIRS for an exact token
(e.g., split on whitespace/commas or use: if ! echo "${TEST_DIRS}" | tr ' ,;'
'\n' | grep -xq "e2e"; then ... fi) and keep the GINKGO_ARGS="${GINKGO_ARGS}
--junit-report=..." assignment inside the updated if block so GINKGO_ARGS is
only modified when there is no exact "e2e" token.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (5)
e2e/e2e_common.goe2e/e2e_test.goe2e/machine_migration_helpers.goe2e/machineset_migration_helpers.gohack/test.sh
|
/test e2e-aws-capi-techpreview |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
nrb
left a comment
There was a problem hiding this comment.
Overall it makes sense to me, but one question.
|
/pipeline required |
|
Scheduling tests matching the |
|
/test e2e-aws-ovn-techpreview |
|
/retest |
|
ci looking pretty borked :( If we're still failing im inclined to override given we have good signal on the ci/prow/e2e-aws-capi-techpreview job where this is most used. |
|
/retest |
|
/override ci/prow/e2e-aws-ovn-techpreview |
|
/verified by ci/prow/e2e-aws-capi-techpreview |
|
@theobarberbany: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@theobarberbany: Overrode contexts on behalf of theobarberbany: ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-gcp-ovn-techpreview, ci/prow/e2e-openstack-capi-techpreview, ci/prow/e2e-openstack-ovn-techpreview DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
/test images |
|
/override ci/prow/images |
|
@theobarberbany: Overrode contexts on behalf of theobarberbany: ci/prow/images DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@theobarberbany: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Waging more war on flakes in our migration e2es.
Wire trackResource into create helpers so tracked resources (and their sync controller mirrors) are dumped on failure. Add namespace-wide event dumps for both CAPI and MAPI namespaces, and list all AWSMachineTemplates regardless of name. All dump functions are best-effort with panic recovery.
Remove --junit-report from hack/test.sh for e2e runs since the custom ReportAfterSuite handles JUnit generation with diagnostics inlined into the failure element for Spyglass.
Currently we've got a dump that dumps everything (see below), including things not related to our tests (and that are covered by must-gathers) this tries to scope that down, and make the dump more useful (yaml rather than lists, scoped to the failure)
Summary by CodeRabbit
Release Notes
New Features
Tests