Skip to content

Split BatchCopy and add parallel COPY for ingest#560

Open
aditya1702 wants to merge 10 commits intooptimize-live-ingestfrom
parallel-copy-inserts
Open

Split BatchCopy and add parallel COPY for ingest#560
aditya1702 wants to merge 10 commits intooptimize-live-ingestfrom
parallel-copy-inserts

Conversation

@aditya1702
Copy link
Copy Markdown
Contributor

No description provided.

Refactor bulk COPY logic: separate account participant inserts into BatchCopyAccounts for operations and transactions, keep main BatchCopy for table rows only, and record metrics per COPY. Update tests and benchmarks to call new BatchCopyAccounts. In ingest service, perform parallel COPYs for transactions, operations, state_changes and their participant tables using errgroup and copyWithPoolConn (per-connection transactions), treating unique_violation (23505) as idempotent. Also add error helper isUniqueViolation and golang.org/x/sync dependency.
Refactor ingestion to run parallel COPYs and balance upserts and to split persistence into clear phases. insertIntoDB was renamed to insertAndUpsertParallel and now launches 9 goroutines via errgroup (5 COPYs + 4 token upserts), each on its own pool connection; UniqueViolation is treated as success for idempotent retries. PersistLedgerData was reworked into three phases: persistFKPrerequisites (commit trustline assets and contract tokens so FKs are visible), parallel insert/upsert, and persistFinalize (unlock channel accounts and advance the cursor as the idempotency marker). The TokenIngestionService interface exposes individual processors (trustline, contract, native, SAC) and the concrete service and mocks were updated accordingly. Error messages and transaction boundaries were adjusted to improve crash-recovery semantics and clarity.
Make ingestion backfill mode skip balance upserts and remove channel unlock logic. insertAndUpsertParallel now conditionally omits the token/balance upsert goroutines when IngestionModeBackfill (balance tables represent current state and should not be updated during backfill). Updated function comment to reflect the behavior. Removed the unlockChannelAccounts call from flushBatchBufferWithRetry in backfill flow. Adjusted tests to remove mocks and expectations for ChannelAccountStore and TokenIngestionService that were only needed for live-mode balance upserts/unlocks.
Consolidate metric recording in BatchCopy methods across models by using deferred functions to observe QueryDuration, BatchSize (where applicable) and increment QueriesTotal. Remove duplicated metrics updates at multiple error/success return sites. Minor optimizations and renames: preallocate rows slices for account copy helpers, track rowCount for metrics, and use local Value() results (e.g. addrVal, accountVal, hashVal) with explicit []byte casts. Also tidy up duplicate/commented docstrings in operations/transactions.
Increase the number of buckets used by ExponentialBuckets from 12 to 15 for DB batch size and ingestion participants histograms. Updated internal/metrics/db.go and internal/metrics/ingestion.go to use ExponentialBuckets(1, 2, 15), and adjusted tests in internal/metrics/db_test.go and internal/metrics/ingestion_test.go to expect 15 buckets. This expands the histogram range/resolution for those metrics.
Add two additional histogram bucket boundaries (0.075 and 0.15) to wallet_ingestion_phase_duration_seconds to provide finer resolution between 0.05-0.1s and 0.1-0.2s. Update the test expectation to reflect the new total of 13 custom buckets.
Replace per-row pgx.Batch queuing with UNNEST-based bulk operations in native_balances, sac_balances, and trustline_balances. Each BatchUpsert now builds column slices and performs a single INSERT ... SELECT FROM UNNEST(... ) ... ON CONFLICT for upserts and a single DELETE using ANY or UNNEST for deletes. Preserves existing metrics/error handling and includes appropriate type casts (UUIDs, int32/int64) and ledger number conversions to reduce DB round-trips and simplify batch logic.
@aditya1702 aditya1702 force-pushed the optimize-live-ingest branch from 6d2eab4 to d3b6c89 Compare April 2, 2026 14:23
@aditya1702 aditya1702 force-pushed the parallel-copy-inserts branch from 1ebaffc to 340b684 Compare April 2, 2026 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant