Collaborator

@Rei1010 Rei1010 commented May 21, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:
Improves failure handling in the e2e tests: when a test fails, pod status, events, and logs are logged.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Error logs:

  [FAILED] in [It] - /Users/rui/Documents/Repos/Rei1010/HAMi/test/e2e/pod/test_pod.go:156 @ 05/21/25 11:09:07.958
  STEP: Check pod detailed after each test @ 05/21/25 11:09:07.958
I0521 11:09:07.977833   30711 pod.go:179] Pod default/gpu-pod5729 is in Pending status
I0521 11:09:07.977850   30711 pod.go:181] Show events for default/gpu-pod5729:
I0521 11:09:07.986884   30711 pod.go:190] Reason: FailedScheduling, Message : 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.. 
I0521 11:09:07.993636   30711 pod.go:200] Show logs for default/gpu-pod5729:
I0521 11:09:07.993648   30711 pod.go:201] 

Does this PR introduce a user-facing change?:

@Rei1010 Rei1010 force-pushed the enrichFailureHandling branch from 38c2ac4 to 8aa69f2 Compare May 21, 2025 07:28

codecov bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Flag Coverage Δ
unittests 61.07% <ø> (ø)

Flags with carried forward coverage won't be shown.

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances test failure diagnostics by adding utilities to fetch and log pod details across namespaces, adjusts the pod-running wait interval for GPU workloads, and integrates detailed pod checks after any test failure.

  • Increased the polling interval in WaitForPodRunning from 5s to 30s.
  • Introduced GetNamespaceList, GetPodLogs, and CheckPodDetails in test/utils/pod.go.
  • Updated AfterEach in test/e2e/pod/test_pod.go to call CheckPodDetails on failures and removed a debug fmt.Printf.
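
To make the polling-interval change concrete, here is a minimal sketch of an interval-based wait loop. It uses only the standard library; `pollUntil` and `demoPoll` are hypothetical names for illustration and are not the helpers used in test/utils/pod.go, which this page does not show in full. The demo runs with a 10ms interval so it finishes quickly; the PR uses 30s.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollUntil is a hypothetical, stdlib-only stand-in for a wait helper.
// It checks cond immediately, then once per interval, until cond reports
// done, cond returns an error, or ctx ends.
func pollUntil(ctx context.Context, interval time.Duration, cond func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			// Interval elapsed; loop around and check again.
		}
	}
}

// demoPoll simulates a pod that reports Running on the third status check.
func demoPoll() (checks int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err = pollUntil(ctx, 10*time.Millisecond, func() (bool, error) {
		checks++
		return checks >= 3, nil
	})
	return checks, err
}

func main() {
	checks, err := demoPoll()
	fmt.Println(checks, err) // prints "3 <nil>"
}
```

A longer interval means fewer status checks for slow-starting workloads, at the cost of noticing a Running pod up to one interval late.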

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/utils/pod.go | Added new failure-handling helpers, updated polling interval, and imported I/O packages. |
| test/e2e/pod/test_pod.go | Call CheckPodDetails on test failures and remove leftover fmt.Printf debug statement. |

Comments suppressed due to low confidence (1)

test/utils/pod.go:96

  • Use the passed-in context ctx instead of context.TODO() to allow cancellation and deadlines to propagate correctly.
pod, err := clientSet.CoreV1().Pods(namespace).Get(context.TODO(), podName, metav1.GetOptions{})
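
The effect of this suggestion can be sketched without a cluster. In the example below, `fetchPod` is a hypothetical stand-in for a client call that honors its context; `getPodIgnoringCtx` mirrors the flagged pattern and `getPodWithCtx` the suggested one. None of these names come from the PR.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchPod is a hypothetical stand-in for a client call: it takes a little
// time and gives up as soon as ctx is done.
func fetchPod(ctx context.Context, name string) (string, error) {
	select {
	case <-time.After(20 * time.Millisecond):
		return name, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// getPodIgnoringCtx mirrors the flagged pattern: the caller's ctx is dropped
// and context.TODO() is used instead, so cancellation never reaches the call.
func getPodIgnoringCtx(_ context.Context, name string) (string, error) {
	return fetchPod(context.TODO(), name)
}

// getPodWithCtx forwards the caller's ctx, so cancellation and deadlines apply.
func getPodWithCtx(ctx context.Context, name string) (string, error) {
	return fetchPod(ctx, name)
}

// demoCancelled calls both variants with an already-cancelled context.
func demoCancelled() (ignoredErr, forwardedErr error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, ignoredErr = getPodIgnoringCtx(ctx, "gpu-pod")
	_, forwardedErr = getPodWithCtx(ctx, "gpu-pod")
	return ignoredErr, forwardedErr
}

func main() {
	ignoredErr, forwardedErr := demoCancelled()
	fmt.Println("context.TODO():", ignoredErr)  // prints "context.TODO(): <nil>"
	fmt.Println("forwarded ctx:", forwardedErr) // prints "forwarded ctx: context canceled"
}
```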

events, err := GetPodEvents(clientSet, ns, pod.Name)
if err != nil {
	klog.Errorf("Failed to get events for %s/%s: %v", ns, pod.Name, err)
	return

Copilot AI May 21, 2025

Returning here stops logging details for other pods. Consider using continue to proceed to the next pod and log all failures.

Suggested change
- return
+ continue
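
A minimal sketch of why `continue` matters in a per-pod loop. It assumes a hypothetical `collectLogs` helper whose `fetch` argument stands in for GetPodLogs; the real CheckPodDetails is not shown in full on this page and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// podRef identifies a pod; a hypothetical type for this sketch.
type podRef struct{ Namespace, Name string }

// collectLogs gathers logs for every pod. fetch stands in for GetPodLogs.
// A failure for one pod is recorded and the loop moves on to the next pod.
func collectLogs(pods []podRef, fetch func(podRef) (string, error)) (logs map[string]string, failed []string) {
	logs = make(map[string]string)
	for _, p := range pods {
		key := p.Namespace + "/" + p.Name
		out, err := fetch(p)
		if err != nil {
			failed = append(failed, key)
			continue // a return here would skip every remaining pod
		}
		logs[key] = out
	}
	return logs, failed
}

// demoCollect runs collectLogs over three pods where the middle one fails.
func demoCollect() (collected, failed int) {
	pods := []podRef{
		{"default", "gpu-pod-a"},
		{"default", "gpu-pod-b"},
		{"default", "gpu-pod-c"},
	}
	fetch := func(p podRef) (string, error) {
		if p.Name == "gpu-pod-b" {
			return "", errors.New("container not started")
		}
		return "logs of " + p.Name, nil
	}
	logs, failedKeys := collectLogs(pods, fetch)
	return len(logs), len(failedKeys)
}

func main() {
	collected, failed := demoCollect()
	fmt.Printf("collected=%d failed=%d\n", collected, failed) // prints "collected=2 failed=1"
}
```

With `return` in place of `continue`, the third pod would never be inspected.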

Copilot uses AI. Check for mistakes.

logs, err := GetPodLogs(clientSet, ns, pod.Name)
if err != nil {
	klog.Errorf("Failed to get logs for %s/%s: %v", ns, pod.Name, err)
	return

Copilot AI May 21, 2025

As with events, use continue instead of return so that other pods are still checked and logged.

Suggested change
- return
+ continue

}

klog.Infof("Show logs for %s/%s:", ns, pod.Name)
klog.Infof(logs)

Copilot AI May 21, 2025

[nitpick] Passing raw logs as the format string to Infof means any % sequences in the logs are interpreted as formatting verbs. Use klog.Info(logs) or klog.Infof("%s", logs) instead.

Suggested change
- klog.Infof(logs)
+ klog.Infof("%s", logs)
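
The difference can be reproduced with the standard library alone. In this sketch, `infof` is a hypothetical stand-in for klog.Infof: it has the same (format, args...) shape but returns the rendered line instead of logging it. The sample log line is invented for illustration.

```go
package main

import "fmt"

// infof stands in for klog.Infof in this sketch: same (format, args...) shape,
// but it returns the rendered line instead of writing it to the log.
var infof = fmt.Sprintf

// renderAsFormat passes the logs as the format string (the flagged pattern).
func renderAsFormat(logs string) string { return infof(logs) }

// renderAsArg passes the logs as an argument to a constant format string.
func renderAsArg(logs string) string { return infof("%s", logs) }

func main() {
	// A container log line that happens to contain a formatting verb.
	logs := "retrying with %d workers"

	fmt.Printf("as format: %q\n", renderAsFormat(logs)) // the %d is expanded with no argument
	fmt.Printf("as arg:    %q\n", renderAsArg(logs))    // the text is preserved verbatim
}
```

Only the second form is guaranteed to reproduce the container output unchanged.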

return false, nil
})
}


Copilot AI May 21, 2025

[nitpick] Add a doc comment to describe the purpose and behavior of this public function for better maintainability.

Suggested change
+ // GetNamespaceList retrieves a list of all namespaces in the Kubernetes cluster.
+ // It takes a Kubernetes clientset as input and returns a slice of namespace names
+ // or an error if the operation fails.
