[WIP] Reduce ipi-install-powervs-install step grace period to 10m and add resource dump on termination #71105
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Neha-dot-Yadav
[REHEARSALNOTIFIER]
A total of 33 jobs have been affected by this change. The bot's listing is non-exhaustive and limited to 25 jobs.
@Neha-dot-Yadav: all tests passed!
This PR reduces the grace_period for the ipi-install-powervs-install step from 2 hours to 10 minutes and adds a trap 'dump_resources' TERM handler so that resources are still collected when the step is terminated.
Background
Currently, the step is marked as failed once its timeout is reached, but the underlying installation script keeps running for up to 2 more hours, which is the configured grace period.
During this extended grace period, the installation eventually completes successfully, but the step is still marked as Failed because it exceeded the timeout.
This wastes resources and causes confusion: clusters are created successfully, yet the runs are reported as failed.
Previously, the dump_resources function ran only after openshift-install wait-for install-complete returned, which can take up to 2 hours. With the grace period reduced to 10m, the script would be terminated before reaching that point, leaving the debug artifacts missing. Adding a TERM trap ensures dump_resources runs on termination, so resource data is collected even when the step times out or fails early.
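The trap pattern described above can be sketched as follows. This is a minimal illustration, not the actual step script: the body of dump_resources, the resource-dump.txt and ready file names, the temporary ARTIFACT_DIR, and the `sleep 30` stand-in for `openshift-install wait-for install-complete` are all assumptions made for the example. It runs the long wait in the background and blocks on `wait`, because bash defers a trap until the current foreground command finishes; whether the real script is structured this way is not stated in this PR.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the step's artifact directory.
ARTIFACT_DIR="$(mktemp -d)"

# Stand-in for the real dump_resources; here it only records that it ran.
dump_resources() {
  echo "resources dumped on termination" > "${ARTIFACT_DIR}/resource-dump.txt"
}

# Simulated step body.
run_step() {
  # On SIGTERM: collect resources, stop the long-running child, exit 128+15.
  trap 'dump_resources; kill "${child:-}" 2>/dev/null; exit 143' TERM

  # Stand-in for `openshift-install wait-for install-complete`.
  sleep 30 &
  child=$!

  # Signal the demo driver that the trap is installed (illustrative only).
  touch "${ARTIFACT_DIR}/ready"

  # `wait` is interrupted by SIGTERM, so the trap fires promptly.
  wait "$child"
}

# Demo driver: start the step, then terminate it as a timeout would.
run_step &
step_pid=$!

until [ -e "${ARTIFACT_DIR}/ready" ]; do sleep 0.1; done
kill -TERM "$step_pid"

status=0
wait "$step_pid" || status=$?

echo "step exit status: ${status}"
cat "${ARTIFACT_DIR}/resource-dump.txt"
```

With the handler in place, the simulated step exits shortly after receiving TERM instead of running until it is killed, and the dump file exists even though the install wait never finished.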