Description
Helm Chart Version
2.0
What step the error happened?
During the Sync
Relevant information
Airbyte Platform Version: 2.0.1
Source Connector: MySQL CDC (3.51.5)
Destination Connector: BigQuery (3.0.16)
Sync Mode: Change Data Capture (CDC)
Target Write Schema: Append
🐞 Bug Description
When a MySQL CDC sync job (Run 1) fails after it has started emitting data, the subsequent job (Run 2) incorrectly restarts from the initial binlog offset of Run 1 instead of the last committed offset. Records already sent to the BigQuery destination are therefore re-processed and re-written, and because the write mode is Append they end up duplicated. The replication job logs for both runs are attached: one for the run that failed, and one for the subsequent run that completed but produced duplicates in the destination.
Steps to Reproduce
- Configure a MySQL CDC Source Connector (3.51.5) syncing to a BigQuery Destination (3.0.16) using the Append write mode on Airbyte Platform 2.0.1.
- Start Run 1 (sync), which successfully begins streaming from a specific position:
  2025-11-30 11:46:50 source ERROR : Requesting streaming from position filename: db05-slave.087542, position: 87056536
- Force Run 1 to fail shortly after it starts streaming, specifically triggering the Debezium/MySQL Error 1236 (replica ID conflict).
  - The failure log excerpt:
    2025-11-30 11:46:56 source ERROR blc-db05-slave.bravofly.intra:3306 i.d.p.ErrorHandler(setProducerThrowable):52 Producer failure io.debezium.DebeziumException: A replica with the same server_uuid/server_id as this replica has connected to the source; the first event 'db05-slave.087542' at 87056536... Error code: 1236; SQLSTATE: HY000.
- Verify that BigQuery received records before Run 1 terminated.
- Correct the failure cause (e.g., resolve the server ID conflict) and start Run 2.
- Observe the Run 2 log: the connector logs immediately confirm it is restarting from the exact same binlog position where Run 1 started (db05-slave.087542, position=87056536), demonstrating the incorrect offset retrieval:
  - Run 2 log excerpt:
    2025-11-30 12:11:19 source INFO DefaultDispatcher-worker-3#global-round-1-create-partitions i.a.c.r.c.CdcPartitionsCreator(run):144 Current position 'MySqlSourceCdcPosition(fileName=db05-slave.087542, position=87056536)' does not exceed target position 'MySqlSourceCdcPosition(fileName=db05-slave.087546, position=19566958)'.
- Check the BigQuery table: the initial batch of records (those processed between Run 1 start and failure) is duplicated.
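The comparison of start positions in the steps above can be automated against the attached logs. The following is a minimal sketch, assuming only the two log-line formats quoted in this issue; `BinlogPosition` and `first_position` are hypothetical helper names and not part of Airbyte or Debezium.

```python
import re
from typing import NamedTuple, Optional


class BinlogPosition(NamedTuple):
    """A binlog coordinate as it appears in the source connector logs."""
    file_name: str
    offset: int


# The two formats quoted in this issue (assumption: other log lines may differ).
_PATTERNS = [
    # "Requesting streaming from position filename: <file>, position: <n>"
    re.compile(r"filename: (?P<file>\S+?), position: (?P<pos>\d+)"),
    # "MySqlSourceCdcPosition(fileName=<file>, position=<n>)"
    re.compile(r"fileName=(?P<file>[^,]+), position=(?P<pos>\d+)"),
]


def first_position(log_line: str) -> Optional[BinlogPosition]:
    """Return the first binlog position found in a log line, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(log_line)
        if match:
            return BinlogPosition(match.group("file"), int(match.group("pos")))
    return None


# Log lines copied from the excerpts in this issue.
RUN1_LINE = (
    "2025-11-30 11:46:50 source ERROR : Requesting streaming from position "
    "filename: db05-slave.087542, position: 87056536"
)
RUN2_LINE = (
    "Current position 'MySqlSourceCdcPosition(fileName=db05-slave.087542, "
    "position=87056536)' does not exceed target position "
    "'MySqlSourceCdcPosition(fileName=db05-slave.087546, position=19566958)'."
)

run1_start = first_position(RUN1_LINE)
run2_start = first_position(RUN2_LINE)

# True here means Run 2 restarted from the position where Run 1 began.
restarted_from_run1_start = run1_start == run2_start
```

For the two excerpts above, `restarted_from_run1_start` is `True`, which matches the behavior reported in this issue.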
Expected Behavior
The subsequent job (Run 2) should resume from the last binlog offset that was successfully confirmed (committed state) by the BigQuery destination connector. This ensures that records already written to the target are not re-processed and duplicated.
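To make the expected behavior concrete, here is a small simulation of the two resume strategies. It is only an illustrative sketch and does not reflect how Airbyte actually stores or retrieves state: `ConnectionState`, `run_sync`, `simulate`, the event payloads, and every offset other than the Run 1 start position are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (binlog file name, byte offset). Tuple comparison orders positions correctly
# here because the file-name suffixes are zero-padded to the same width.
Position = Tuple[str, int]


@dataclass
class ConnectionState:
    """Hypothetical stand-in for the state persisted between sync runs."""
    initial: Position
    committed: Optional[Position] = None

    def commit(self, position: Position) -> None:
        self.committed = position

    def resume_position(self) -> Position:
        # Expected behavior: prefer the last offset confirmed by the destination.
        return self.committed if self.committed is not None else self.initial


# Run 1 start position from this issue; the events after it are made up.
START: Position = ("db05-slave.087542", 87056536)
EVENTS: List[Tuple[Position, str]] = [
    (("db05-slave.087542", 87056600), "a"),
    (("db05-slave.087542", 87056700), "b"),
    (("db05-slave.087542", 87056800), "c"),
    (("db05-slave.087543", 4), "d"),
]


def run_sync(state: ConnectionState, table: List[str],
             fail_after: Optional[int] = None,
             persist_commits: bool = True) -> None:
    """Append every event after the resume position to the table (Append mode)."""
    start = state.resume_position()
    written = 0
    for position, payload in EVENTS:
        if position <= start:
            continue  # already confirmed by a previous run
        if fail_after is not None and written == fail_after:
            raise RuntimeError("simulated mid-sync failure")
        table.append(payload)
        written += 1
        if persist_commits:
            state.commit(position)


def simulate(persist_commits: bool) -> List[str]:
    """Run 1 fails after two records; Run 2 then runs to completion."""
    state = ConnectionState(initial=START)
    table: List[str] = []
    try:
        run_sync(state, table, fail_after=2, persist_commits=persist_commits)
    except RuntimeError:
        pass
    run_sync(state, table, persist_commits=persist_commits)
    return table


expected = simulate(persist_commits=True)   # ['a', 'b', 'c', 'd']
observed = simulate(persist_commits=False)  # ['a', 'b', 'a', 'b', 'c', 'd']
```

With commits persisted, Run 2 resumes after record `b` and the table holds each record once. Without them, Run 2 falls back to the initial position and re-appends `a` and `b`, which is the duplication pattern described in this issue.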
db05_volagratis_soft_logs_2404_txt.txt
db05_volagratis_soft_logs_2395_txt.txt