Skip to content

Conversation

@Deepam02
Copy link

@Deepam02 Deepam02 commented Oct 2, 2025

Fixes #2873

What this PR does / why we need it:

Quick fix for the data race when calling RuntimeInfo concurrently.

The issue was a global syncPodSets variable getting hammered by multiple goroutines at once. Moved it to an instance field in the Info struct so each call gets its own copy - no more shared state, no more race conditions.

Pretty straightforward change but I did an intensive code review to make sure it actually solves the problem. Traced through all the call sites and execution flows to verify each RuntimeInfo() creates an isolated instance with no shared mutable state between concurrent calls.

Which issue(s) this PR fixes:
Fixes #2873

Checklist:

  • Docs included if any changes are user facing

@Deepam02 Deepam02 changed the title fix(runtime): make RuntimeInfo thread-safe by using instance field fix(operator): make RuntimeInfo thread-safe by using instance field Oct 2, 2025
@Deepam02 Deepam02 force-pushed the fix/runtime-info-thread-safety branch from 5239894 to 4afa87b Compare October 2, 2025 13:25
@Deepam02
Copy link
Author

Deepam02 commented Oct 2, 2025

Test Failures

The unit tests are failing because cmp.Diff cannot compare the new unexported syncPodSets field.

I'm not really sure how to proceed - should I change the tests or export it as SyncPodSets so tests work without changes?

@andreyvelich
Copy link
Member

@Deepam02 You might need to explore why unit tests are failing

@tenzen-y
Copy link
Member

Test Failures

The unit tests are failing because cmp.Diff cannot compare the new unexported syncPodSets field.

I'm not really sure how to proceed - should I change the tests or export it as SyncPodSets so tests work without changes?

You can leverage the cmp ignore options to ignore syncPods functions in tests.

@Deepam02 Deepam02 force-pushed the fix/runtime-info-thread-safety branch from 4afa87b to 5f9a7f5 Compare October 17, 2025 16:33
@google-oss-prow google-oss-prow bot added size/L and removed size/S labels Oct 17, 2025
Signed-off-by: Deepam02 <[email protected]>
@Deepam02 Deepam02 force-pushed the fix/runtime-info-thread-safety branch from 5f9a7f5 to ab4ef92 Compare October 17, 2025 16:36
@Deepam02
Copy link
Author

Deepam02 commented Oct 23, 2025

@tenzen-y @andreyvelich please put a review

@andreyvelich
Copy link
Member

/milestone v2.1

@google-oss-prow google-oss-prow bot added this to the v2.1 milestone Oct 29, 2025
@coveralls
Copy link

coveralls commented Oct 29, 2025

Pull Request Test Coverage Report for Build 18921020248

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 5 of 9 (55.56%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.08%) to 51.754%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/runtime/runtime.go 5 9 55.56%
Totals Coverage Status
Change from base Build 18685746799: 0.08%
Covered Lines: 1269
Relevant Lines: 2452

💛 - Coveralls

@andreyvelich
Copy link
Member

/assign @tenzen-y @astefanutti As we discussed we might want to merge this PR before v2.1

cc @kaisoz @mimowo

@mimowo
Copy link

mimowo commented Oct 29, 2025

As we discussed we might want to merge this PR before v2.1

Nice, but TBH it is not a blocker for Kueue, Also, I think it does not really impact API so could also be released in 2.1.1 for example.

@andreyvelich
Copy link
Member

Make sense! As we talked today, this is not blocking, but "nice to have" feature.

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this PR. This mostly LGTM
Could you fix the formatting errors?

You can fix with the following make command:

trainer/Makefile

Lines 157 to 160 in 6109d36

# Instructions for code formatting.
.PHONY: fmt
fmt: ## Run go fmt against the code.
go fmt ./...

Deepam02 and others added 2 commits October 30, 2025 01:15
Co-authored-by: Yuki Iwai <[email protected]>
Signed-off-by: Deepam Goyal <[email protected]>
Signed-off-by: Deepam02 <[email protected]>
@google-oss-prow google-oss-prow bot added size/M and removed size/L labels Oct 29, 2025
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming CI passed.
Thank you!
/lgtm
/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich
Copy link
Member

@Deepam02 Please can you rebase your PR to fix CI ?

@andreyvelich
Copy link
Member

Thank you for this contribution @Deepam02!
This PR is superseded by: #2877
/close

@google-oss-prow google-oss-prow bot closed this Oct 31, 2025
@google-oss-prow
Copy link

@andreyvelich: Closed this PR.

In response to this:

Thank you for this contribution @Deepam02!
This PR is superseded by: #2877
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make RuntimeInfo thread safe

5 participants