Skip to content

Conversation

msugakov
Copy link
Contributor

@msugakov msugakov commented Sep 25, 2025

Description

prefetch-dependencies

From the Slack thread https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1758784462802249, my observations and trials in this PR it's clear that OOMs happen in /root/.cache/hermeto/go/go1.21.0/bin/go mod download -json command that's executed by Hermeto to prefetch Go dependencies.
The thread is inconclusive about the root cause of the error. On one hand, I'm sure little has changed on our side, especially in release-4.6 branch. On the other, what exactly happens in go mod download and what to do to prevent it requesting lots of memory quickly is left without the answers.

I used #17053 to trigger multiple runs and validate the change is stable.

Everything else

build-source-image tends to fail too. It's not news but the one started doing that more frequently.

Since I touched this area, I did a couple other cosmetic changes. You can see more info in the commit messages.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

No change.

How I validated my change

@msugakov msugakov added do-not-merge/work-in-progress konflux-build Run Konflux in PR. Push commit to trigger it. labels Sep 25, 2025
Copy link

openshift-ci bot commented Sep 25, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@msugakov
Copy link
Contributor Author

@msugakov
Copy link
Contributor Author

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Sep 25, 2025

Images are ready for the commit at 60a4392.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.9.x-929-g60a43929a7.

Copy link

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.78%. Comparing base (01d75a3) to head (60a4392).
⚠️ Report is 27 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17032      +/-   ##
==========================================
- Coverage   48.80%   48.78%   -0.02%     
==========================================
  Files        2707     2712       +5     
  Lines      202201   202407     +206     
==========================================
+ Hits        98682    98743      +61     
- Misses      95748    95878     +130     
- Partials     7771     7786      +15     
Flag Coverage Δ
go-unit-tests 48.78% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@msugakov
Copy link
Contributor Author

/cancel

@msugakov
Copy link
Contributor Author

/retest

@stackrox stackrox deleted a comment from red-hat-konflux bot Sep 26, 2025
@msugakov
Copy link
Contributor Author

/test main-on-push

@msugakov
Copy link
Contributor Author

/test roxctl-on-push

@msugakov
Copy link
Contributor Author

/test operator-on-push

@msugakov
Copy link
Contributor Author

/test scanner-v4-on-push

@stackrox stackrox deleted a comment from openshift-ci bot Sep 26, 2025
@stackrox stackrox deleted a comment from openshift-ci bot Sep 26, 2025
@stackrox stackrox deleted a comment from openshift-ci bot Sep 26, 2025
@stackrox stackrox deleted a comment from openshift-ci bot Sep 26, 2025
@msugakov
Copy link
Contributor Author

/test scanner-v4-on-push

@msugakov
Copy link
Contributor Author

/test main-on-push

@msugakov
Copy link
Contributor Author

/test operator-on-push

@msugakov
Copy link
Contributor Author

/test roxctl-on-push

@msugakov
Copy link
Contributor Author

/cancel

@msugakov
Copy link
Contributor Author

/test scanner-v4-on-push

@msugakov
Copy link
Contributor Author

/test roxctl-on-push

@msugakov
Copy link
Contributor Author

/test operator-on-push

@msugakov
Copy link
Contributor Author

/test main-on-push

1 similar comment
@msugakov
Copy link
Contributor Author

/test main-on-push

@msugakov
Copy link
Contributor Author

/cancel

@msugakov
Copy link
Contributor Author

/test main-on-push

@stackrox stackrox deleted a comment from openshift-ci bot Oct 1, 2025
@stackrox stackrox deleted a comment from openshift-ci bot Oct 1, 2025
@msugakov msugakov requested a review from tommartensen October 1, 2025 11:46
@msugakov msugakov force-pushed the misha/repro-prefetch-oom branch from c6ed81a to 5f633df Compare October 1, 2025 17:16
@msugakov msugakov changed the title ROX-31079: Bump tasks memory to reduce OOM kills ROX-31079: Reduce OOM kills in Konflux pipelines Oct 1, 2025
@msugakov
Copy link
Contributor Author

msugakov commented Oct 2, 2025

/test gke-nongroovy-e2e-tests gke-upgrade-tests

Copy link
Contributor

@tommartensen tommartensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, let's try 🤷
Not the traditional approach in prefetch-dependencies, but if it works, it works.

@msugakov
Copy link
Contributor Author

msugakov commented Oct 2, 2025

/retest-required

@msugakov msugakov force-pushed the misha/repro-prefetch-oom branch 3 times, most recently from 93ed86b to 2a440fe Compare October 2, 2025 13:10
@msugakov
Copy link
Contributor Author

msugakov commented Oct 2, 2025

/retest-required

The one also tends to be OOM killed. See
https://konflux-ui.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/ns/rh-acs-tenant/applications/acs/taskruns/scanner-v4-on-push-j9qxs-build-source-image/logs

From `describe pod`:
```
  step-build:
    State:          Terminated
      Reason:       OOMKilled
      Message:      [{"key":"BUILD_RESULT","value":"{\"status\": \"failure\", \"message\": \"[Errno 2] No such file or directory: '/var/workdir/source-build/bsi_output/index.json'\"}","type":1},{"key":"StartedAt","value":"2025-09-29T09:32:26.186Z","type":3}]
      Exit Code:    1
      Started:      Mon, 29 Sep 2025 11:31:53 +0200
      Finished:     Mon, 29 Sep 2025 11:44:00 +0200
    Limits:
      memory:  2Gi
    Requests:
      cpu:     250m
      memory:  512Mi
```
and update corresponding comments around that.
If by default they prefer to go without limits, why shouldn't we too.
This could make builds faster when the cluster is less loaded and
the build process can utilize more cpu cores.
I don't think any more it's a good idea to remove them.
At the very least, we'll get different behavior when nodes are
fully loaded and when not.
At worst, we may see OOM kills due to this variability. See
https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1759216841405689?thread_ts=1758784462.802249&cid=C04PZ7H0VA8
@msugakov msugakov force-pushed the misha/repro-prefetch-oom branch from 2a440fe to 60a4392 Compare October 2, 2025 16:02
Copy link

openshift-ci bot commented Oct 2, 2025

@msugakov: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-qa-e2e-tests 60a4392 link false /test gke-qa-e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@msugakov msugakov merged commit d1abe24 into master Oct 2, 2025
117 of 127 checks passed
@msugakov msugakov deleted the misha/repro-prefetch-oom branch October 2, 2025 20:50
rhacs-bot pushed a commit that referenced this pull request Oct 2, 2025
rhacs-bot pushed a commit that referenced this pull request Oct 2, 2025
rhacs-bot pushed a commit that referenced this pull request Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release-4.6 Create a PR to backport this PR to release-4.6 backport release-4.7 backport release-4.8 konflux-build Run Konflux in PR. Push commit to trigger it.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants