@andrwng andrwng commented Nov 5, 2025

Sync errors can happen for a variety of reasons, but they are generally transient. The test counted one as a failed reset, which caused it to fail:

[WARNING - 2025-10-08 10:01:40,588 - admin - _request - lineno:805]: Response 500: {"message": "Could not sync with log", "code": 500}
[INFO  - 2025-10-08 10:01:40,588 - e2e_shadow_indexing_test - test_reset_from_cloud - lineno:524]: Reset from cloud failed: 500 Server Error: Internal Server Error for url: http://docker-rp-6:9644/v1/cloud_storage/unsafe_reset_metadata_from_cloud/kafka/panda-topic/0
[DEBUG - 2025-10-08 10:01:43,407 - kgo_verifier_services - _ingest_status - lineno:433]: KgoVerifierProducer-0-139894623949760 status: [{'topic': 'panda-topic', 'sent': 89864, 'acked': 88839, 'bad_offsets': 0, 'max_offsets_produced': {'0': 88838}, 'restarts': 0, 'fails': 0, 'tombstones_produced': 0, 'failed_transactions': 0, 'aborted_transaction_msgs': 0, 'latency': {'p50': 13798.5, 'p90': 21826, 'p99': 370127}, 'active': True}]
...
[ERROR - 2025-10-08 10:02:05,643 - cluster - _do_post_test_checks - lineno:136]: Test failed, doing failure checks on RedpandaService-0-139894623572640...
Traceback (most recent call last):
  File "/root/tests/rptest/services/cluster.py", line 246, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/e2e_shadow_indexing_test.py", line 537, in test_reset_from_cloud
    assert resets_failed == 0, f"{resets_failed} resets failed during the test"
AssertionError: 1 resets failed during the test

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

  • None

Copilot AI review requested due to automatic review settings November 5, 2025 20:28

Copilot AI left a comment

Pull Request Overview

This PR addresses test failures caused by transient "Could not sync with log" errors during cloud metadata reset operations in the e2e_shadow_indexing_test. The fix adds logic to ignore these benign sync errors, preventing them from being counted as unexpected failures.

Key changes:

  • Added specific handling for "Could not sync with log" errors in the reset operation
  • Distinguished transient sync errors from actual failures to improve test stability
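The diff itself is not shown in this conversation, so the following is only a minimal sketch of how such filtering could look, not the merged change. The helper names `is_transient_sync_error` and `count_reset_failures` are hypothetical; the error text and the 500 status code are taken from the log in the description.

```python
import json

# Error text taken from the admin API response in the PR description.
TRANSIENT_SYNC_ERROR = "Could not sync with log"


def is_transient_sync_error(status_code, body):
    """Return True if an admin API error looks like a transient sync failure.

    Hypothetical helper: the real test may inspect the exception differently.
    """
    if status_code != 500:
        return False
    try:
        message = json.loads(body).get("message", "")
    except (ValueError, TypeError, AttributeError):
        # Body was not JSON, or was not a JSON object.
        return False
    return isinstance(message, str) and TRANSIENT_SYNC_ERROR in message


def count_reset_failures(responses):
    """Count failed resets, skipping transient sync errors.

    `responses` is an iterable of (status_code, body) pairs.
    """
    resets_failed = 0
    for status_code, body in responses:
        if status_code < 400:
            continue  # the reset succeeded
        if is_transient_sync_error(status_code, body):
            continue  # benign, do not count it against the test
        resets_failed += 1
    return resets_failed
```

With this shape, a response like the one in the log above no longer increments `resets_failed`, while any other error response still does.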

@andrwng andrwng changed the title from "rptest: ignore sync errors when resetting manifest" to "[CORE-8556] rptest: ignore sync errors when resetting manifest" Nov 5, 2025
@andrwng andrwng requested a review from Lazin November 5, 2025 23:38
@vbotbuildovich

CI test results

test results on build#75690
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| ShadowLinkingReplicationTests | test_replication_basic | {"shuffle_leadership": true, "source_cluster_spec": {"cluster_type": "redpanda"}} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55c6-ebef-4a5d-bbab-34edef408494 | FLAKY | 20/21 | upstream reliability is '98.30287206266318'. current run reliability is '95.23809523809523'. drift is 3.06478 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_basic |
| ShadowLinkingReplicationTests | test_replication_timestamps_match | {"source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}, "timestamp_type": "CreateTime"} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55c6-ebf0-44b0-8dd3-c0f70c882f85 | FLAKY | 20/21 | upstream reliability is '95.99483204134367'. current run reliability is '95.23809523809523'. drift is 0.75674 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_timestamps_match |
| ShadowLinkingReplicationTests | test_replication_with_failures | null | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55cd-d32f-44b4-8fdc-e87bb273aa5b | FLAKY | 18/21 | upstream reliability is '100.0'. current run reliability is '85.71428571428571'. drift is 14.28571 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_with_failures |
| NodesDecommissioningTest | test_decommissioning_finishes_after_manual_cancellation | {"cloud_topic": true, "delete_topic": false} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55c6-ebf8-4385-8b73-82e49c2ca436 | FLAKY | 20/21 | upstream reliability is '99.76580796252928'. current run reliability is '95.23809523809523'. drift is 4.52771 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommissioning_finishes_after_manual_cancellation |
| NodesDecommissioningTest | test_recommissioning_node_finishes | {"cloud_topic": false} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55c6-ebf6-469e-bc51-d3214b1ccb97 | FLAKY | 19/21 | upstream reliability is '98.61660079051383'. current run reliability is '90.47619047619048'. drift is 8.14041 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_recommissioning_node_finishes |
| PrefixTruncateRecoveryTest | test_prefix_truncate_recovery | {"acks": -1, "start_empty": true} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55c6-ebf2-4714-be01-4a558354236f | FLAKY | 20/21 | upstream reliability is '99.34924078091106'. current run reliability is '95.23809523809523'. drift is 4.11115 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PrefixTruncateRecoveryTest&test_method=test_prefix_truncate_recovery |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": true} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55cd-d331-411d-8962-238d40503047 | FLAKY | 10/21 | upstream reliability is '90.99526066350711'. current run reliability is '47.61904761904761'. drift is 43.37621 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/75690#019a55c6-ebf8-4385-8b73-82e49c2ca436 | FLAKY | 17/21 | upstream reliability is '89.17480035492457'. current run reliability is '80.95238095238095'. drift is 8.22242 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |
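
The reliability and drift figures in the CI results above appear to follow simple arithmetic: the current run reliability matches the passed/total ratio as a percentage, and the drift matches the upstream reliability minus the current run reliability. This is inferred from the reported numbers only, not from the bot's source. The function names and the pass condition (drift not exceeding the allowed drift) are assumptions.

```python
def reliability(passed, total):
    """Percentage of runs that passed, e.g. 20/21 -> about 95.24."""
    return 100.0 * passed / total


def drift(upstream_reliability, current_reliability):
    """How far the current run fell below the upstream reliability."""
    return upstream_reliability - current_reliability


def should_pass(upstream_reliability, passed, total, allowed_drift=50.0):
    """Assumed verdict rule: pass while the drift stays within the allowance."""
    current = reliability(passed, total)
    return drift(upstream_reliability, current) <= allowed_drift


# test_replication_basic: upstream 98.30287206266318, 20/21 passed.
print(round(drift(98.30287206266318, reliability(20, 21)), 5))
```

Under these assumptions, even the 10/21 result for test_node_ops_smoke_test stays inside the allowed drift of 50, which is consistent with its "should PASS" verdict.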

@andrwng andrwng enabled auto-merge November 6, 2025 02:02
@andrwng andrwng requested a review from dotnwat November 7, 2025 19:38

andrwng commented Nov 11, 2025

/ci-repeat 4
dt-repeat=25
skip-units
skip-redpanda-build
tests/rptest/tests/e2e_shadow_indexing_test.py::EndToEndShadowIndexingTest.test_reset_from_cloud

@andrwng andrwng merged commit c5280f5 into redpanda-data:dev Nov 11, 2025
19 checks passed