cl/tests: ClusterLinkingProgressVerifier fix #28436
Pull Request Overview
This PR fixes a race condition in the ClusterLinkingProgressVerifier test that caused timeout errors when the consumer tried to read before the producer's offset map file was available. The fix ensures the producer's offset map is ready before creating the consumer and runs the consumer on the same node as the producer to avoid file access issues.
- Adds a wait for the producer's offset map file before creating the consumer
- Configures the consumer to run on the same nodes as the producer
- Reduces cluster node count from 8-10 to 7 across multiple tests (now that consumer runs on producer's node)
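The first change, waiting for the offset map file before the consumer is created, can be illustrated with a small polling helper. This is only a sketch under stated assumptions and not the code in this PR: `wait_for_file`, `demo`, the file name `valid_offsets.json`, and its contents are hypothetical, and it uses only the Python standard library rather than the test framework's own wait utilities.

```python
import os
import tempfile
import threading
import time


def wait_for_file(path, timeout_sec=30.0, backoff_sec=0.05):
    """Poll until `path` exists and is non-empty, else raise TimeoutError.

    Hypothetical helper for illustration; not part of rptest or ducktape.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            if os.path.getsize(path) > 0:
                return
        except OSError:
            pass  # the file does not exist yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {path}")
        time.sleep(backoff_sec)


def demo():
    """Simulate a producer that publishes its offset map after a delay."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name and payload, chosen for the example only.
        offsets_path = os.path.join(workdir, "valid_offsets.json")

        def producer():
            time.sleep(0.2)
            tmp_path = offsets_path + ".tmp"
            with open(tmp_path, "w") as f:
                f.write('{"source-topic": [10]}')
            # Rename into place so a reader never sees a partial file.
            os.replace(tmp_path, offsets_path)

        writer = threading.Thread(target=producer)
        writer.start()

        # Block until the offset map is available, then "start the consumer".
        wait_for_file(offsets_path, timeout_sec=5.0)
        with open(offsets_path) as f:
            content = f.read()

        writer.join()
        return content
```

The wait only helps if the reader and writer share a filesystem, which is why the PR also places the consumer on the producer's node.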
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/rptest/tests/cluster_linking_test_base.py | Adds wait for offset map and configures consumer to use producer's nodes |
| tests/rptest/tests/cluster_linking_e2e_test.py | Reduces num_nodes from 8-10 to 7 for multiple test methods |
| tests/rptest/scale_tests/cluster_linking_many_partitions_test.py | Reduces num_nodes from 8 to 7 for scale test |
There is a class of failures that result in errors such as:

```
ducktape.errors.TimeoutError: Timed out waiting for status endpoint KgoVerifierConsumerGroupConsumer-0-140503148073056 to be available
```

This is due to:

```
time="2025-11-07T15:15:05Z" level=info msg="Reading with consumer group source-cg-source-topic"
time="2025-11-07T15:15:05Z" level=error msg="More partitions in valid_offsets file than in topic!"
```

The cause is that the consumer is running on a different node than the producer.

The fix is in two parts:

* Wait for the producer's offset map file before creating the consumer
* Run the consumer on the same node as the producer

Signed-off-by: Ben Pope <[email protected]>
Fixes https://redpandadata.atlassian.net/browse/CORE-14327
Fixes https://redpandadata.atlassian.net/browse/CORE-14518