-
Notifications
You must be signed in to change notification settings - Fork 384
Enrich failure handling #1065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Enrich failure handling #1065
Conversation
Signed-off-by: wen.rui <[email protected]>
38c2ac4
to
8aa69f2
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances test failure diagnostics by adding utilities to fetch and log pod details across namespaces, adjusts the pod‐running wait interval for GPU workloads, and integrates detailed pod checks after any test failure.
- Increased the polling interval in
WaitForPodRunning
from 5s to 30s. - Introduced
GetNamespaceList
,GetPodLogs
, andCheckPodDetails
intest/utils/pod.go
. - Updated
AfterEach
intest/e2e/pod/test_pod.go
to callCheckPodDetails
on failures and removed a debugfmt.Printf
.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
test/utils/pod.go | Added new failure-handling helpers, updated polling interval, and imported I/O packages. |
test/e2e/pod/test_pod.go | Call CheckPodDetails on test failures and remove leftover fmt.Printf debug statement. |
Comments suppressed due to low confidence (1)
test/utils/pod.go:96
- Use the passed-in context
ctx
instead ofcontext.TODO()
to allow cancellation and deadlines to propagate correctly.
pod, err := clientSet.CoreV1().Pods(namespace).Get(context.TODO(), podName, metav1.GetOptions{})
events, err := GetPodEvents(clientSet, ns, pod.Name) | ||
if err != nil { | ||
klog.Errorf("Failed to get events for %s/%s: %v", ns, pod.Name, err) | ||
return |
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning here stops logging details for other pods. Consider using continue
to proceed to the next pod and log all failures.
return | |
continue |
Copilot uses AI. Check for mistakes.
logs, err := GetPodLogs(clientSet, ns, pod.Name) | ||
if err != nil { | ||
klog.Errorf("Failed to get logs for %s/%s: %v", ns, pod.Name, err) | ||
return |
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with events, use continue
instead of return
so that other pods are still checked and logged.
return | |
continue |
Copilot uses AI. Check for mistakes.
} | ||
|
||
klog.Infof("Show logs for %s/%s:", ns, pod.Name) | ||
klog.Infof(logs) |
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Passing raw logs to Infof
can misinterpret formatting verbs—use klog.Info(logs)
or klog.Infof("%s", logs)
instead.
klog.Infof(logs) | |
klog.Infof("%s", logs) |
Copilot uses AI. Check for mistakes.
return false, nil | ||
}) | ||
} | ||
|
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Add a doc comment to describe the purpose and behavior of this public function for better maintainability.
// GetNamespaceList retrieves a list of all namespaces in the Kubernetes cluster. | |
// It takes a Kubernetes clientset as input and returns a slice of namespace names | |
// or an error if the operation fails. |
Copilot uses AI. Check for mistakes.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Improving failure handling for test.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Error logs:
Does this PR introduce a user-facing change?: