draft: reduce likelihood of occurrence of pagination issue #66583
What
Mitigate #58666
This seems to be confirmed by various people here and here.
The easiest way to mitigate that would be to set a `step` under `incrementalSync`. Instead of paginating over all the records from `start_date` to `now`, the connector would iterate over multiple datetime chunks (for example, if the step is yearly, `2023`, `2024` and `2025` would be fetched independently), and each chunk would have its own 100k records limit. Another benefit is that the connector would be faster if we can increase the concurrency. Note that this does not completely solve the issue: if there are more than 100k records within a single chunk (one month with the step used in this PR), users will still face this issue.

There is a big drawback to this solution which might not make it viable: if a 1-month range has more than 100k records, the other 1-month chunks will still be synced, but the next sync will start again from the earlier month that failed. It will also probably fail on each subsequent run, although it will retry the subsequent months every time from then on. To fix that, we need to:
Possible next steps:
How
Add a 1 month step to incremental syncs for streams visits, visitors, visitor_activities, visitor_page_views, and list_membership
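
To make the chunking and its drawback concrete, here is a minimal Python sketch. It is not the connector or CDK code: `month_windows` and `next_sync_start` are hypothetical helpers, the windows break on calendar-month boundaries for simplicity (an assumption about how the step is applied), and the restart rule assumes the next sync resumes from the earliest chunk that went over the limit.

```python
from datetime import date

RECORD_LIMIT = 100_000  # per-chunk cap described in this PR


def month_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive windows that break on month boundaries.

    Simplification: real step handling may not align to calendar months.
    """
    windows = []
    cursor = start
    while cursor < end:
        # Compute the first day of the month following `cursor`.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        window_end = min(next_month, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def next_sync_start(windows, record_counts, limit=RECORD_LIMIT):
    """Model of the drawback: return the start of the earliest window whose
    record count exceeds the limit, or None if every window fits.
    """
    for (window_start, _), count in zip(windows, record_counts):
        if count > limit:
            return window_start
    return None
```

With this model, a sync from 2025-01-15 to 2025-04-10 is split into four windows. If only the February window exceeds the limit, the other three still sync, but the next run would start again from 2025-02-01.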
Review guide
User Impact
Hopefully, we should see fewer incomplete syncs that stop at 100,000 records.
Can this PR be safely reverted and rolled back?