Generate a random int64 ID for state changes within an operation instead of using sort order #530

Merged
aditya1702 merged 835 commits into main from fix-statechanges-order
Mar 18, 2026

@aditya1702 aditya1702 commented Mar 11, 2026

What

Replace the positional state_change_order with a random 63-bit state_change_id generated with crypto/rand.

Step 1: crypto/rand.Read. This is NOT math/rand. It reads from the OS's cryptographically secure random source (historically /dev/urandom on macOS/Linux and CryptGenRandom on Windows; the exact system API depends on the platform and Go version). It's the same kind of entropy source used for TLS keys and UUIDs. It's safe for concurrent use and never needs seeding.

Step 2: binary.BigEndian.Uint64. Interprets the 8 random bytes as a single uint64 in big-endian byte order. This gives a uniform random value across the full 0 to 2^64-1 range.

Step 3: The bitmask & 0x7FFFFFFFFFFFFFFF. In binary, 0x7FFF... is 0111 1111 ... 1111, so the mask zeroes out the highest bit (bit 63) and keeps the other 63 bits.
Why?
  - Go's int64 is signed — if bit 63 is set, the number is negative
  - PostgreSQL's BIGINT is also signed
  - The mask forces the result to always be positive (0 to 2^63-1)
  - You lose 1 bit of entropy (63 bits instead of 64), but 2^63 ≈ 9.2 quintillion is still an enormous keyspace
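
The three steps above can be sketched as a single helper. This is a minimal illustration, not the code from this PR: the function name randomStateChangeID is hypothetical, and the real generation happens inside BatchCopy.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// randomStateChangeID is a hypothetical helper name used for illustration.
// It returns a uniformly random value in the range [0, 2^63-1].
func randomStateChangeID() (int64, error) {
	var buf [8]byte
	// Step 1: fill 8 bytes from the OS cryptographic random source.
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, fmt.Errorf("reading random bytes: %w", err)
	}
	// Step 2: interpret the bytes as a big-endian uint64.
	raw := binary.BigEndian.Uint64(buf[:])
	// Step 3: clear bit 63 so the value fits a signed int64 / BIGINT as a non-negative number.
	return int64(raw & 0x7FFFFFFFFFFFFFFF), nil
}

func main() {
	id, err := randomStateChangeID()
	if err != nil {
		panic(err)
	}
	fmt.Println("state_change_id:", id)
}
```

Masking before the int64 conversion means the sign bit is always zero, so the value never shows up as negative in Go or in a PostgreSQL BIGINT column.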

Why not `math/rand`? math/rand is a deterministic PRNG (pseudorandom number generator): its output is fully determined by its seed, so generators seeded with the same or similar values can produce identical or correlated sequences. crypto/rand draws from OS-level entropy, so IDs can be treated as independent and uniformly distributed, which is the assumption behind the collision probability math (birthday problem).

Why

state_change_order was a positional counter (1, 2, 3…) assigned per-transaction during ingestion. A deterministic ID is not an option either: multiple Soroban sub-invocations can generate state changes with the same column values, so a deterministic ID would collide across sub-invocations.

A random 63-bit int64 from crypto/rand avoids this entirely — IDs are independent of processing order, category count, or schema changes. Collision probability is extremely low within an operation.
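
To put a rough number on "extremely low", the standard birthday approximation can be evaluated for a 2^63 keyspace. The sketch below is only an estimate under the assumption that IDs are independent and uniform; n stands for the number of state changes that share the same (to_id, operation_id), and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// keyspace is the number of possible 63-bit IDs (2^63).
var keyspace = math.Ldexp(1, 63)

// collisionProbability approximates the chance that at least two of n
// uniformly random 63-bit IDs are equal, using the birthday approximation
// p ≈ 1 - exp(-n(n-1) / (2 * 2^63)).
func collisionProbability(n int) float64 {
	if n < 2 {
		return 0
	}
	pairs := float64(n) * float64(n-1) / 2
	// Expm1 keeps precision when the exponent is very close to zero.
	return -math.Expm1(-pairs / keyspace)
}

func main() {
	for _, n := range []int{10, 1_000, 1_000_000} {
		fmt.Printf("n=%d p≈%.3g\n", n, collisionProbability(n))
	}
}
```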

Collision case: the probability of a collision within an operation is very low, but a collision can still occur. Because IDs for state changes are now generated in the BatchCopy function at insert time, any PK uniqueness violation results in a retry, and new IDs are generated when reinserting. This makes our ingestion path collision resistant.
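
The retry behaviour described above could look roughly like the following. This is a standard-library-only sketch, not the BatchCopy implementation: insertWithRetry, newIDs, and errUniqueViolation are hypothetical names, and real code running against PostgreSQL would detect the unique violation from the driver error (SQLSTATE 23505) rather than from a sentinel value.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
)

// errUniqueViolation stands in for a primary-key uniqueness error from the database.
var errUniqueViolation = errors.New("duplicate key value violates unique constraint")

// newIDs generates n fresh random 63-bit IDs.
func newIDs(n int) ([]int64, error) {
	ids := make([]int64, n)
	var buf [8]byte
	for i := range ids {
		if _, err := rand.Read(buf[:]); err != nil {
			return nil, fmt.Errorf("reading random bytes: %w", err)
		}
		ids[i] = int64(binary.BigEndian.Uint64(buf[:]) & 0x7FFFFFFFFFFFFFFF)
	}
	return ids, nil
}

// insertWithRetry regenerates all IDs and retries whenever insert reports a
// uniqueness violation. It returns the number of attempts that were made.
func insertWithRetry(n, maxAttempts int, insert func(ids []int64) error) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		ids, err := newIDs(n)
		if err != nil {
			return attempt, err
		}
		err = insert(ids)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errUniqueViolation) {
			return attempt, err // unrelated failure: do not retry
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, errUniqueViolation)
}

func main() {
	calls := 0
	attempts, err := insertWithRetry(3, 5, func(ids []int64) error {
		calls++
		if calls == 1 {
			return errUniqueViolation // simulate one collision
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}
```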

Known limitations

N/A

Issue that this PR addresses

Closes #541

This simplifies the XDR storage flow by storing raw bytes directly
instead of encoding to base64 and then decoding. The String() method
now handles base64 encoding for external representation.

Skip the intermediate base64 encoding step by using MarshalBinary()
instead of MarshalBase64(). The raw bytes are now stored directly
in XDRBytea.

Remove unnecessary Value() calls since XDRBytea is now []byte.
Access raw bytes directly via type conversion.

Decode expected base64 XDR string to raw bytes for comparison
since XDRBytea now uses []byte underlying type.

Use raw bytes directly instead of base64-encoded strings when
creating test data for XDRBytea fields.

Use raw bytes directly for test XDR data instead of base64-encoding.
The String() method will handle base64 encoding for assertions.

Use raw bytes directly instead of pre-encoded base64 string.

Use parameterized queries instead of raw SQL string literals
for BYTEA operation_xdr column. Fix .String() assertion to
compare base64 values via opXdr1.String().

Use parameterized queries instead of raw SQL string literals
for BYTEA operation_xdr column in BatchGetByOperationIDs and
BatchGetByStateChangeIDs tests.

Use parameterized queries instead of raw SQL string literals
for BYTEA operation_xdr column in BatchGetByOperationIDs test.

Copy the byte slice from the database driver instead of
referencing it directly. The pgx driver reuses its internal
buffer across rows, so without copying, all scanned XDRBytea
values end up pointing to the same (overwritten) buffer.
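
A minimal sketch of the two XDRBytea behaviours these commit messages describe: copying the driver's buffer on scan, and base64-encoding only in String(). The type name comes from the commits, but the method bodies here are assumptions for illustration and may differ from the repository's code.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// XDRBytea holds raw XDR bytes; base64 is used only for external representation.
type XDRBytea []byte

// Scan copies the incoming bytes so the value does not alias a buffer that
// the database driver may reuse for the next row.
func (x *XDRBytea) Scan(src any) error {
	switch v := src.(type) {
	case nil:
		*x = nil
	case []byte:
		*x = append(XDRBytea(nil), v...) // copy, do not keep a reference to v
	default:
		return fmt.Errorf("unsupported type for XDRBytea: %T", src)
	}
	return nil
}

// String returns the base64 encoding of the raw bytes.
func (x XDRBytea) String() string {
	return base64.StdEncoding.EncodeToString(x)
}

func main() {
	driverBuf := []byte{1, 2, 3} // simulates the driver's reusable buffer
	var first XDRBytea
	if err := first.Scan(driverBuf); err != nil {
		panic(err)
	}
	copy(driverBuf, []byte{7, 8, 9}) // the "next row" overwrites the buffer
	fmt.Println(first.String())      // still reflects the first row's bytes
}
```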
… tests)

Bring in BYTEA type definitions, indexer changes, GraphQL schema/resolver
updates, processor changes, and test files from opxdr-bytea-2 branch.
These files had no modifications on the timescale branch.
…hemas

Change hash, account_id, operation_xdr, and address columns from TEXT
to BYTEA type while preserving all TimescaleDB hypertable configuration,
composite primary keys, and chunk settings.
… statechanges)

Convert hash, account_id, operation_xdr, and address columns from TEXT
to BYTEA in BatchInsert, BatchCopy, and query methods. Uses HashBytea,
AddressBytea, and pgtypeBytesFromNullAddressBytea for type conversions.
Preserves TimescaleDB junction table patterns (ledger_created_at columns,
composite primary keys, parameter numbering).

Adopt BYTEA types (HashBytea, AddressBytea, XDRBytea, NullAddressBytea)
in test data while preserving TimescaleDB-specific patterns:
- Keep ledger_created_at in junction table INSERTs
- Use generic "duplicate key value violates unique constraint" assertions
  (TimescaleDB chunk-based constraint names differ from standard PG)
- Keep 5-value return from processLedgersInBatch (includes startTime/endTime)

Rename inner err to addrErr in address BYTEA conversion loops to avoid
shadowing the outer err variable from pgx.CopyFrom.
@aditya1702 aditya1702 marked this pull request as ready for review March 16, 2026 15:24
Copilot AI review requested due to automatic review settings March 16, 2026 15:24

Copilot AI left a comment

Pull request overview

This PR replaces the positional state_change_order identifier with a deterministic, content-derived state_change_id (FNV-64a) across the indexer, DB schema, data access layer, and GraphQL pagination/cursors to prevent re-ingestion PK collisions when state change categories evolve.

Changes:

  • Introduce StateChangeID and HashKey and remove StateChangeOrder/sorting-based ordering in ingestion.
  • Update DB schema and all query/pagination cursor logic to use state_change_id.
  • Update GraphQL resolvers, dataloaders, and tests to use the new ID/cursor format.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

File Description
internal/services/ingest_test.go Updates test state change construction to use StateChangeID.
internal/serve/graphql/resolvers/utils.go Updates state change cursor parsing error messages and cursor struct field to StateChangeID.
internal/serve/graphql/resolvers/transaction_resolvers_test.go Updates assertions to read StateChangeID instead of StateChangeOrder.
internal/serve/graphql/resolvers/transaction.resolvers.go Updates cursor encoding to include StateChangeID.
internal/serve/graphql/resolvers/test_utils.go Updates test DB inserts and fixtures to use state_change_id.
internal/serve/graphql/resolvers/statechange_resolvers_test.go Updates resolver tests to populate StateChangeID.
internal/serve/graphql/resolvers/statechange.resolvers.go Updates resolver lookups to pass StateChangeID.
internal/serve/graphql/resolvers/resolver.go Updates state change composite loader key construction to incorporate stateChangeID.
internal/serve/graphql/resolvers/queries_resolvers_test.go Updates query resolver tests to assert StateChangeID.
internal/serve/graphql/resolvers/queries.resolvers.go Updates cursor encoding to include StateChangeID.
internal/serve/graphql/resolvers/operation_resolvers_test.go Updates operation resolver tests to assert StateChangeID.
internal/serve/graphql/resolvers/operation.resolvers.go Updates cursor encoding to include StateChangeID.
internal/serve/graphql/resolvers/account_resolvers_test.go Updates account resolver tests and expected ordering commentary for state_change_id.
internal/serve/graphql/resolvers/account.resolvers.go Updates cursor encoding to include StateChangeID.
internal/serve/graphql/dataloaders/utils.go Updates composite state change ID parsing docs/errors to reference state_change_id.
internal/indexer/types/types.go Renames model field to StateChangeID, updates cursor struct, and replaces SortKey with HashKey.
internal/indexer/processors/state_change_builder.go Generates HashKey and derives StateChangeID via FNV-64a.
internal/indexer/indexer_test.go Updates indexer tests to use HashKey and removes ordering assumptions.
internal/indexer/indexer_buffer_test.go Updates buffer tests to use StateChangeID.
internal/indexer/indexer.go Removes state change sorting/order assignment logic and adjusts logging.
internal/db/migrations/2025-06-10.4-statechanges.sql Updates state_changes table schema to use state_change_id and updates PK/orderby/index.
internal/data/transactions_test.go Updates tests inserting/querying by state_change_id and composite ID comments.
internal/data/transactions.go Updates joins/tuple IN clause and composite ID concatenation to use state_change_id.
internal/data/statechanges_test.go Updates fixtures and assertions to use StateChangeID and state_change_id columns.
internal/data/statechanges.go Updates all state change queries (cursor columns, ordering, tuple conditions, copy columns) to use state_change_id.
internal/data/query_utils_test.go Updates cursor condition tests to use state_change_id.
internal/data/query_utils.go Updates comment referencing ID columns for state changes.
internal/data/operations_test.go Updates tests inserting/querying by state_change_id and composite ID comments.
internal/data/operations.go Updates joins/tuple IN clause and composite ID concatenation to use state_change_id.

aditya1702 and others added 8 commits March 16, 2026 12:07
Replace the ad-hoc HashKey approach with a deterministic, content-based FNV-64a hash computed in StateChangeBuilder.Build. The builder now computes StateChangeID by hashing all relevant fields with explicit null-tags and length-prefixing (including canonical JSON marshaling for KeyValue), and masks the high bit to ensure a positive ID. Removed the internal HashKey field from StateChange, added NullString() helper on NullAddressBytea, and updated tests to stop relying on HashKey. Added comprehensive state_change_builder tests covering determinism, uniqueness across single-field mutations, NULL vs zero distinctions, positive IDs, and fluent API behavior. Also removed an older DB-level re-ingestion test which is now superseded by the builder unit tests and adjusted other tests to reflect the API change.
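
For context, the content-based scheme in this commit (which the PR later replaced with random IDs) can be sketched as follows. This is an illustration only: contentID and writeField are hypothetical names, fields are simplified to optional strings, and the real builder hashed more field types, including canonical JSON for KeyValue.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// writeField hashes one optional field. A tag byte distinguishes NULL from
// an empty value, and the length prefix keeps adjacent fields from blending.
func writeField(h hash.Hash64, s *string) {
	if s == nil {
		h.Write([]byte{0}) // NULL tag
		return
	}
	var lenBuf [8]byte
	binary.BigEndian.PutUint64(lenBuf[:], uint64(len(*s)))
	h.Write([]byte{1}) // present tag
	h.Write(lenBuf[:])
	h.Write([]byte(*s))
}

// contentID returns a deterministic, non-negative FNV-64a based ID.
func contentID(fields ...*string) int64 {
	h := fnv.New64a()
	for _, f := range fields {
		writeField(h, f)
	}
	// Mask the high bit so the result is a non-negative int64.
	return int64(h.Sum64() & 0x7FFFFFFFFFFFFFFF)
}

func main() {
	a, b := "ab", "c"
	fmt.Println(contentID(&a, &b) == contentID(&a, &b)) // same content, same ID
	fmt.Println(contentID(&a, nil) == contentID(&a, &b))
}
```

The same determinism is what makes this scheme unsuitable here: two state changes with identical field values from different sub-invocations hash to the same ID.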
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@aditya1702 aditya1702 marked this pull request as draft March 16, 2026 19:05
@aditya1702 aditya1702 changed the title Use hash to generate state change ID instead of sort order Generate a random int64 ID for state changes within an operation instead of using sort order Mar 16, 2026
@aditya1702 aditya1702 marked this pull request as ready for review March 16, 2026 21:15

Copilot AI left a comment

Pull request overview

This PR replaces the per-operation positional state_change_order identifier with a random 63-bit state_change_id (generated via crypto/rand) to avoid determinism problems with Soroban sub-invocations, updating the DB schema usage, ingestion, data access, and GraphQL pagination/cursors accordingly.

Changes:

  • Replace state_change_order with state_change_id across schema usage, Go types, queries, and GraphQL cursor encoding/decoding.
  • Generate state_change_id at insert time in StateChangeModel.BatchCopy and remove deterministic sort/order assignment from the indexer pipeline.
  • Update/extend tests to reflect the new identifier and cursor fields.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

File Description
internal/services/ingest_test.go Updates state change test fixtures to use StateChangeID.
internal/serve/graphql/resolvers/utils.go Updates cursor parsing to expect/return state_change_id.
internal/serve/graphql/resolvers/transaction_resolvers_test.go Updates assertions to validate StateChangeID in cursors.
internal/serve/graphql/resolvers/transaction.resolvers.go Encodes cursors using StateChangeID instead of order.
internal/serve/graphql/resolvers/test_utils.go Updates DB test setup inserts and fixtures to use state_change_id.
internal/serve/graphql/resolvers/statechange_resolvers_test.go Updates resolver tests to use StateChangeID.
internal/serve/graphql/resolvers/statechange.resolvers.go Switches resolver calls to pass StateChangeID.
internal/serve/graphql/resolvers/resolver.go Updates composite state-change loader key to include state_change_id.
internal/serve/graphql/resolvers/queries_resolvers_test.go Updates query resolver tests for StateChangeID cursors.
internal/serve/graphql/resolvers/queries.resolvers.go Encodes query cursors using StateChangeID.
internal/serve/graphql/resolvers/operation_resolvers_test.go Updates operation resolver tests for StateChangeID.
internal/serve/graphql/resolvers/operation.resolvers.go Encodes operation cursors using StateChangeID.
internal/serve/graphql/resolvers/account_resolvers_test.go Updates account resolver tests and expected ordering comment for state_change_id.
internal/serve/graphql/resolvers/account.resolvers.go Encodes account cursors using StateChangeID.
internal/serve/graphql/dataloaders/utils.go Updates composite ID parsing to use state_change_id.
internal/indexer/types/types.go Renames fields/cursor structs to StateChangeID; removes SortKey; adds NullAddressBytea.NullString().
internal/indexer/processors/state_change_builder_test.go Adds tests for fluent builder behavior (independent of ID assignment).
internal/indexer/processors/state_change_builder.go Removes deterministic sort-key generation; documents ID assignment at insertion time.
internal/indexer/processors/contracts_test_utils.go Updates test helper docs re: normalization with random IDs.
internal/indexer/indexer_test.go Removes ordering assertions tied to the old deterministic ordering mechanism.
internal/indexer/indexer_buffer_test.go Updates buffer tests to populate StateChangeID instead of StateChangeOrder.
internal/indexer/indexer.go Removes sort/order assignment logic; updates error logging context.
internal/db/migrations/2025-06-10.4-statechanges.sql Updates table definition/PK/index ordering to use state_change_id.
internal/data/transactions_test.go Updates test inserts/comments to use state_change_id.
internal/data/transactions.go Updates join/tuple filtering and state-change composite ID generation to use state_change_id.
internal/data/statechanges_test.go Updates state change model tests to use StateChangeID and updated cursor fields.
internal/data/statechanges.go Updates queries to use state_change_id; generates random state_change_id during BatchCopy.
internal/data/query_utils_test.go Updates cursor-condition tests to use state_change_id.
internal/data/query_utils.go Updates prepareColumnsWithID docs to reflect composite key usage.
internal/data/operations_test.go Updates test inserts/comments to use state_change_id.
internal/data/operations.go Updates join/tuple filtering and state-change composite ID generation to use state_change_id.
Comments suppressed due to low confidence (1)

internal/data/transactions.go:214

  • In BatchGetByStateChangeIDs, the third slice parameter is still named scOrders even though it now holds state_change_id values. Renaming it (and related locals like tuples := make(..., len(scOrders))) would reduce confusion and prevent misuse.
// BatchGetByStateChangeIDs gets the transactions that are associated with the given state changes
func (m *TransactionModel) BatchGetByStateChangeIDs(ctx context.Context, scToIDs []int64, scOpIDs []int64, scOrders []int64, columns string) ([]*types.TransactionWithStateChangeID, error) {
	columns = prepareColumnsWithID(columns, types.Transaction{}, "transactions", "to_id")

	// Build tuples for the IN clause. Since (to_id, operation_id, state_change_id) is the primary key of state_changes,
	// it will be faster to search on this tuple.
	tuples := make([]string, len(scOrders))
	for i := range scOrders {
		tuples[i] = fmt.Sprintf("(%d, %d, %d)", scToIDs[i], scOpIDs[i], scOrders[i])

aditya1702 and others added 3 commits March 18, 2026 15:18
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@aditya1702 aditya1702 merged commit 79822d3 into main Mar 18, 2026
9 checks passed
@aditya1702 aditya1702 deleted the fix-statechanges-order branch March 18, 2026 20:18