Conversation

m-Peter
Collaborator

@m-Peter m-Peter commented Jan 21, 2025

Closes: #727 (PoC to allow indexing unsealed finalized execution results)

Description


For contributor use:

  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the standards mentioned here.
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

Summary by CodeRabbit

  • New Features
    • Added experimental soft finality mode to speed up event availability.
    • Added optional sealing verification to detect mismatches between unsealed and sealed data.
    • Event ingestion can follow finalized headers and backfill from earlier periods.
    • New CLI flags: --experimental-soft-finality-enabled (default: false) and --experimental-sealing-verification-enabled (default: true), both with safety warnings.
  • Bug Fixes
    • Clearer error messages when collecting or storing transaction traces.
  • Chores
    • Improved module tidy checks in the build process.

@m-Peter m-Peter self-assigned this Jan 21, 2025
Contributor

coderabbitai bot commented Jan 21, 2025

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly communicates the primary intent (adding a subscriber to index finalized block results), which matches the core contribution of the changeset. The diff implements that intent by adding a subscriber that consumes finalized block headers and polls GetEventsForBlockIDs (with supporting CrossSporkClient methods), plus related storage and verifier wiring, so the title maps to the main change. The title's exact type name (RPCBlockHeaderSubscriber) differs from the concrete type added (RPCBlockTrackingSubscriber), a minor naming mismatch that does not make the title misleading.
  • Linked Issues Check: ✅ Passed. The changes fulfill the PoC objectives from issue #727: the PR subscribes to finalized block headers, adds CrossSporkClient methods to fetch events for a given block, and introduces an RPCBlockTrackingSubscriber that emits per-block EVM events for ingestion. It also acknowledges PoC limitations and provides optional sealing verification plus EventsHash storage and processed-sealed-height handling to bridge soft finality and later sealed verification. The PR notes that integration tests cannot run for this endpoint and does not include explicit rollback verification in the diff, so operational testing and rollback validation remain necessary.
  • Out of Scope Changes Check: ✅ Passed. I found no significant out-of-scope changes: the new subscriber, CrossSporkClient methods, SealingVerifier, EventsHash storage, and bootstrap/config/CLI wiring all directly support the PoC. Minor unrelated housekeeping appears (a Makefile tidy command tweak and clearer engine error messages), but these are low-risk and do not alter the PoC functionality. No large unrelated features or risky cross-cutting changes were introduced.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efd12b0 and a404eab.

📒 Files selected for processing (2)
  • cmd/run/cmd.go (3 hunks)
  • config/config.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • config/config.go
  • cmd/run/cmd.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test


Contributor

@peterargue peterargue left a comment


added a couple small comments, but otherwise this looks good. let's get it running against one of the live networks.

@m-Peter
Collaborator Author

m-Peter commented Jan 23, 2025

@peterargue Thanks for the review 🙌 I've addressed all the comments.
Did you perhaps have a chance to take a look at the behavior described in #727 (comment)? This is puzzling to me 🤔

@m-Peter m-Peter marked this pull request as ready for review January 23, 2025 15:29

@m-Peter m-Peter force-pushed the mpeter/poc-index-finalized-block-results branch from 314e7cc to bfe6188 Compare January 27, 2025 09:43

…k-port-soft-finality

[Backport] Implement `BatchTxPool` to handle nonce mismatch issues
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
services/requester/tx_pool.go (2)

12-14: Consider making the regex more robust.

The current regex assumes the error message ends with a newline, which might not always be the case.

 const (
-	evmErrorRegex = `evm_error=(.*)\n`
+	evmErrorRegex = `evm_error=([^\n]+)`
 )

This pattern will capture everything after "evm_error=" up to but not including a newline, making it more flexible.
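
A quick standalone check of the two patterns (self-contained; the sample error string is invented for illustration):

package main

import (
	"fmt"
	"regexp"
)

func main() {
	msg := "execution failed: evm_error=nonce too low" // no trailing newline

	strict := regexp.MustCompile(`evm_error=(.*)\n`)
	relaxed := regexp.MustCompile(`evm_error=([^\n]+)`)

	fmt.Println(strict.FindStringSubmatch(msg))  // prints []: no match without a trailing newline
	fmt.Println(relaxed.FindStringSubmatch(msg)) // prints [evm_error=nonce too low nonce too low]
}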


26-34: Consider pre-compiling the regex for better performance.

The regex is compiled on every function call. For better performance, especially under high load, consider compiling it once.

+var evmErrorPattern = regexp.MustCompile(evmErrorRegex)
+
 func parseInvalidError(err error) (error, bool) {
-	r := regexp.MustCompile(evmErrorRegex)
-	matches := r.FindStringSubmatch(err.Error())
+	matches := evmErrorPattern.FindStringSubmatch(err.Error())
 	if len(matches) != 2 {
 		return nil, false
 	}
 
 	return errs.NewFailedTransactionError(matches[1]), true
 }
tests/tx_batching_test.go (2)

111-112: Consider reducing test flakiness from timing dependencies.

The conditional sleep based on index could lead to flaky tests if the batch interval timing changes. Consider using a more deterministic approach or waiting for specific events.

Instead of fixed sleep times, consider waiting for transaction pool events or using a mock time source for more deterministic testing.
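
For example, assuming testify is available (as is common in Go test suites), the sleep could be replaced by polling for the observable effect; getNonce, rpcTester, eoaAddress, and expectedNonce are hypothetical stand-ins for the test's state:

// Poll for the batch's observable effect instead of sleeping a fixed time;
// require.Eventually fails the test if the condition never holds in time.
require.Eventually(t, func() bool {
	nonce, err := rpcTester.getNonce(eoaAddress)
	return err == nil && nonce == expectedNonce
}, 10*time.Second, 100*time.Millisecond)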


322-327: Consider extracting test addresses to constants or helper function.

The hardcoded test addresses and private keys could be generated dynamically or moved to test constants for better maintainability.

Consider creating a helper function to generate test EOA addresses and keys:

func generateTestEOAs(count int) map[common.Address]string {
    // Generate deterministic test addresses and keys
}
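
One possible completion of that helper, generating fresh (rather than hardcoded) keys with go-ethereum's crypto package; error handling is deliberately minimal for a test-only helper:

import (
	"encoding/hex"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/crypto"
)

func generateTestEOAs(count int) map[common.Address]string {
	eoas := make(map[common.Address]string, count)
	for i := 0; i < count; i++ {
		key, err := crypto.GenerateKey()
		if err != nil {
			panic(err) // key generation should never fail in tests
		}
		// Map the derived address to its hex-encoded private key.
		eoas[crypto.PubkeyToAddress(key.PublicKey)] = hex.EncodeToString(crypto.FromECDSA(key))
	}
	return eoas
}
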
services/requester/batch_tx_pool.go (1)

109-111: Consider adding batch size limits per sender.

The current implementation can accumulate unlimited transactions per sender, which could lead to memory issues or oversized Flow transactions.

Consider adding a check and handling for maximum batch size:

const maxBatchSizePerSender = 100 // configurable

userTx := pooledEvmTx{txPayload: hexEncodedTx, nonce: tx.Nonce()}
if len(t.pooledTxs[from]) >= maxBatchSizePerSender {
    // Either reject the transaction or trigger immediate batch submission
    return fmt.Errorf("batch size limit reached for sender %s", from.Hex())
}
t.pooledTxs[from] = append(t.pooledTxs[from], userTx)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 092753c and 9364624.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (14)
  • bootstrap/bootstrap.go (6 hunks)
  • cmd/run/cmd.go (3 hunks)
  • config/config.go (1 hunks)
  • go.mod (4 hunks)
  • services/requester/batch_tx_pool.go (1 hunks)
  • services/requester/cadence/batch_run.cdc (1 hunks)
  • services/requester/pool.go (0 hunks)
  • services/requester/requester.go (5 hunks)
  • services/requester/single_tx_pool.go (1 hunks)
  • services/requester/tx_pool.go (1 hunks)
  • tests/go.mod (4 hunks)
  • tests/helpers.go (4 hunks)
  • tests/integration_test.go (1 hunks)
  • tests/tx_batching_test.go (1 hunks)
💤 Files with no reviewable changes (1)
  • services/requester/pool.go
✅ Files skipped from review due to trivial changes (1)
  • tests/go.mod
🚧 Files skipped from review as they are similar to previous changes (4)
  • config/config.go
  • bootstrap/bootstrap.go
  • go.mod
  • cmd/run/cmd.go
🔇 Additional comments (13)
tests/integration_test.go (1)

558-558: LGTM! Excellent resource management improvement.

The addition of explicit cleanup for both boot instances ensures proper resource management and prevents potential leaks. The deferred cleanup with nil-check is well-structured and follows Go best practices.

Also applies to: 565-569
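
The pattern in question looks roughly like this (type, helper, and method names here are illustrative, not quoted from the test code):

var boot *bootstrap.Bootstrap // hypothetical type from the bootstrap package
defer func() {
	// Only stop instances that were actually created, so a failed
	// startup path does not panic on a nil receiver.
	if boot != nil {
		boot.Stop()
	}
}()
boot, err := startBootInstance(t, cfg) // hypothetical helper
require.NoError(t, err)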

tests/helpers.go (3)

95-96: Configuration changes look good.

Enabling EVM setup while disabling VM bridge aligns with the PR's focus on EVM functionality.


246-258: Proper COA initialization with existence checks.

Good defensive programming - checking for existing resources before creating new ones prevents runtime errors and follows Cadence best practices.


468-481: Clean implementation of getBalance RPC method.

The method follows the existing pattern and properly handles the JSON response with appropriate type conversion.
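
A sketch of what such a method might look like; the rpcTest receiver and request transport helper are assumed from context, not quoted from the diff:

func (r *rpcTest) getBalance(addr common.Address) (*big.Int, error) {
	// Query the standard eth_getBalance endpoint for the latest block.
	raw, err := r.request("eth_getBalance", fmt.Sprintf(`["%s","latest"]`, addr.Hex()))
	if err != nil {
		return nil, err
	}
	var hexBalance string
	if err := json.Unmarshal(raw, &hexBalance); err != nil {
		return nil, err
	}
	// Convert the 0x-prefixed hex quantity into a *big.Int.
	return hexutil.DecodeBig(hexBalance)
}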

services/requester/requester.go (2)

38-39: Proper embedding of batch run script.

Good use of go:embed directive for the new batch transaction script.
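
For reference, the go:embed pattern in question looks like this (the variable name is illustrative):

import _ "embed"

//go:embed cadence/batch_run.cdc
var batchRunScript []byte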


237-239: Clean delegation to transaction pool.

The simplified implementation properly delegates transaction submission to the pool while maintaining all necessary validation.

tests/tx_batching_test.go (2)

128-164: Excellent test coverage for nonce validation errors.

The test properly validates both "nonce too low" and "nonce too high" scenarios, and correctly asserts that the batch fails with the appropriate error message.


347-382: Well-structured concurrent testing with proper synchronization.

Excellent use of errgroup for managing concurrent goroutines and proper error propagation. The random delays effectively simulate real-world concurrent transaction submissions.

services/requester/single_tx_pool.go (3)

154-198: Well-implemented transaction building with proper resource management.

The method correctly handles keystore locking, metric updates, and error scenarios. The pattern of taking a key and returning it on error is particularly good.


200-226: Efficient concurrent fetching implementation.

Good use of errgroup for parallel fetching of block and account information, which optimizes the transaction preparation time.
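
The pattern being praised, sketched with golang.org/x/sync/errgroup and the flow-go-sdk access client (client and address names are placeholders):

g, gCtx := errgroup.WithContext(ctx)

var latestBlock *flow.Block
var account *flow.Account

g.Go(func() error {
	var err error
	latestBlock, err = client.GetLatestBlock(gCtx, true) // sealed block
	return err
})
g.Go(func() error {
	var err error
	account, err = client.GetAccount(gCtx, coaAddress)
	return err
})
// Wait blocks until both fetches complete and surfaces the first error.
if err := g.Wait(); err != nil {
	return err
}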


117-121: Fix defer placement to avoid deleting non-existent entries.

The defer t.pool.Delete(tx.Hash()) statement will execute even if an error occurs before the transaction is stored in the pool. This could lead to unnecessary deletion attempts on non-existent entries.

Restructure the code so that the deferred cleanup is registered only after the transaction has been successfully stored:

 if t.config.TxStateValidation == config.TxSealValidation {
-    // add to pool and delete after transaction is sealed or errored out
-    t.pool.Store(tx.Hash(), tx)
-    defer t.pool.Delete(tx.Hash())
+    // add to pool
+    t.pool.Store(tx.Hash(), tx)
+    // ensure cleanup after transaction is sealed or errored out
+    defer func() {
+        t.pool.Delete(tx.Hash())
+    }()

Likely an incorrect or invalid review comment.

services/requester/batch_tx_pool.go (2)

131-138: Excellent concurrent design with atomic map swapping.

The pattern of atomically swapping the pooled transactions map allows new transactions to be added without blocking during batch submission. This is a well-designed approach for high-throughput scenarios.
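
In outline, the swap works like this (field names taken from the diff context in this review; the per-sender submit step is a placeholder):

t.txMux.Lock()
// Take ownership of the current batch and install a fresh map, so
// concurrent submissions never wait on network I/O.
batch := t.pooledTxs
t.pooledTxs = make(map[gethCommon.Address][]pooledEvmTx)
t.txMux.Unlock()

for from, txs := range batch {
	submitBatchForSender(from, txs) // placeholder for the per-sender submission
}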


165-169: Critical nonce sorting ensures transaction order preservation.

Excellent implementation of nonce-based sorting to maintain transaction order within a batch. This is essential for preventing nonce-related errors.
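
The sorting step amounts to an ascending sort on the pooled nonces, e.g. (field name per the pooledEvmTx type shown elsewhere in this review):

// Execute each sender's transactions in nonce order within the batch.
sort.Slice(txs, func(i, j int) bool {
	return txs[i].nonce < txs[j].nonce
})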

Comment on lines 28 to 33
for txResult in txResults {
assert(
txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful,
message: "evm_error=".concat(txResult.errorMessage).concat("\n")
)
}
Contributor


🛠️ Refactor suggestion

The loop is redundant - it will always fail on the first iteration.

Since this code is only reached when all transactions have invalid status (neither failed nor successful), the assertion txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful will always be false on the first iteration, causing the transaction to fail immediately. The loop structure is unnecessary.

Consider simplifying to directly assert on the first transaction result:

-        for txResult in txResults {
-            assert(
-                txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful,
-                message: "evm_error=".concat(txResult.errorMessage).concat("\n")
-            )
-        }
+        assert(
+            false,
+            message: "evm_error=".concat(txResults[0].errorMessage).concat("\n")
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for txResult in txResults {
assert(
txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful,
message: "evm_error=".concat(txResult.errorMessage).concat("\n")
)
}
assert(
false,
message: "evm_error=".concat(txResults[0].errorMessage).concat("\n")
)
🤖 Prompt for AI Agents
In services/requester/cadence/batch_run.cdc around lines 28 to 33, the for loop
asserting each txResult's status is redundant because the code path is only
reached when all transactions have invalid statuses, causing the assertion to
fail immediately on the first iteration. Replace the loop with a single
assertion on the first txResult's status to simplify the code and avoid
unnecessary iteration.

txMux: sync.Mutex{},
}

go batchPool.processPooledTransactions(ctx)
Contributor


🛠️ Refactor suggestion

Add panic recovery for background goroutine.

The background goroutine processing transactions could panic and terminate batching functionality. Add panic recovery to ensure resilience.

-go batchPool.processPooledTransactions(ctx)
+go func() {
+    defer func() {
+        if r := recover(); r != nil {
+            logger.Error().Interface("panic", r).Msg("panic in batch transaction processor")
+            // Optionally restart the goroutine or implement a circuit breaker
+        }
+    }()
+    batchPool.processPooledTransactions(ctx)
+}()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go batchPool.processPooledTransactions(ctx)
go func() {
defer func() {
if r := recover(); r != nil {
logger.Error().Interface("panic", r).Msg("panic in batch transaction processor")
// Optionally restart the goroutine or implement a circuit breaker
}
}()
batchPool.processPooledTransactions(ctx)
}()
🤖 Prompt for AI Agents
In services/requester/batch_tx_pool.go at line 76, the background goroutine
started with go batchPool.processPooledTransactions(ctx) lacks panic recovery.
Wrap the call to processPooledTransactions in an anonymous function that defers
a recover function to catch and handle any panics, preventing the goroutine from
crashing and ensuring the batching functionality remains resilient.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
services/ingestion/block_tracking_subscriber.go (2)

177-177: Fix typo in comment.

-					// typically we receive an error in the errChan before the channels are closes
+					// typically we receive an error in the errChan before the channels are closed

359-359: Use the actual event type constant instead of string comparison.

Using string contains for event type comparison is fragile. Consider using the actual constant or a more precise comparison.

-	if strings.Contains(eventType, string(events.EventTypeBlockExecuted)) && len(event.Events) != 1 {
+	if eventType == string(events.EventTypeBlockExecuted) && len(event.Events) != 1 {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34f6efd and 88e6d7e.

📒 Files selected for processing (4)
  • bootstrap/bootstrap.go (5 hunks)
  • cmd/run/cmd.go (3 hunks)
  • config/config.go (1 hunks)
  • services/ingestion/block_tracking_subscriber.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • config/config.go
  • bootstrap/bootstrap.go
  • cmd/run/cmd.go
🧰 Additional context used
🧠 Learnings (1)
services/ingestion/block_tracking_subscriber.go (9)

Learnt from: peterargue
PR: #617
File: api/stream.go:62-67
Timestamp: 2024-10-18T19:26:37.579Z
Learning: In the flow-evm-gateway project, within the Go file api/stream.go, the prepareBlockResponse method includes the Bloom filter as the field LogsBloom in the returned Block struct.

Learnt from: m-Peter
PR: #831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Learnt from: m-Peter
PR: #831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Learnt from: m-Peter
PR: #831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like map[gethCommon.Address][]pooledEvmTx), perform deep copies by iterating over the map and copying each slice individually using make() and copy() to avoid shared references that could lead to race conditions and data corruption. (A minimal sketch of this pattern follows after this list.)

Learnt from: peterargue
PR: #615
File: bootstrap/bootstrap.go:167-197
Timestamp: 2024-10-17T18:04:04.165Z
Learning: In the flow-evm-gateway Go project, the validation ensuring that startHeight is less than or equal to endHeight is performed before the StartTraceDownloader method in bootstrap/bootstrap.go, so additional checks in this method are unnecessary.

Learnt from: peterargue
PR: #772
File: services/requester/keystore/key_store.go:50-62
Timestamp: 2025-03-07T01:35:09.751Z
Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).

Learnt from: peterargue
PR: #682
File: services/ingestion/event_subscriber.go:85-90
Timestamp: 2025-01-24T20:15:10.908Z
Learning: When calculating blocks to catch up in Flow, initialize the counter to 0 and only set it when the latest chain height is greater than the current height (latestOnChainHeight > currentHeight). There's no need to handle the else case as 0 blocks to catch up is the correct default when we're at the same height or ahead.

Learnt from: peterargue
PR: #615
File: services/traces/engine.go:175-178
Timestamp: 2024-10-17T18:05:27.084Z
Learning: In the trace backfill process, returning upon encountering an error when retrieving a block or its Cadence ID is intentional because future blocks are unlikely to exist if these blocks don't.

Learnt from: peterargue
PR: #615
File: services/traces/engine.go:190-190
Timestamp: 2024-10-17T18:04:41.062Z
Learning: In services/traces/engine.go, the e.indexBlockTraces function limits the number of goroutines to a maximum of 5 and blocks until all are complete, so during backfill, concurrency is already managed to prevent resource exhaustion.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (1)
services/ingestion/block_tracking_subscriber.go (1)

114-114: No change needed for logger.Fatal() here

The use of logger.Fatal() to terminate on a verifier failure is consistent with the rest of the codebase, where critical errors trigger a process exit:

  • storage/pebble/db.go: log.Fatal().Err(err).Msg("failed to close batch")
  • services/ingestion/sealing_verifier.go: v.logger.Fatal().Err(err)...
  • services/ingestion/engine.go: e.log.Fatal().Err(err).Msg("failed to close batch")
  • bootstrap/bootstrap.go: l.Fatal().Err(err).Msg("engine failed to run")

Terminate-on-error for the sealing verifier aligns with the intended failure semantics elsewhere.

Comment on lines +331 to +347
for {
evts, err = r.client.GetEventsForBlockHeader(
ctx,
eventType,
blockHeader,
)
if err != nil {
// retry after a short pause
if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
time.Sleep(200 * time.Millisecond)
continue
}

return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
}
break
}
Contributor


🛠️ Refactor suggestion

Add maximum retry attempts to prevent infinite loops.

The retry loop could run indefinitely if NotFound or ResourceExhausted errors persist. Consider adding a maximum retry count with exponential backoff.

+	const maxRetries = 10
+	retryCount := 0
+	retryDelay := 200 * time.Millisecond
+
 	// retry until we get the block from an execution node that has the events
 	for {
 		evts, err = r.client.GetEventsForBlockHeader(
 			ctx,
 			eventType,
 			blockHeader,
 		)
 		if err != nil {
 			// retry after a short pause
 			if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
-				time.Sleep(200 * time.Millisecond)
+				retryCount++
+				if retryCount > maxRetries {
+					return flow.BlockEvents{}, fmt.Errorf("max retries exceeded after %d attempts: %w", maxRetries, err)
+				}
+				time.Sleep(retryDelay)
+				retryDelay = retryDelay * 2 // exponential backoff
+				if retryDelay > 5*time.Second {
+					retryDelay = 5*time.Second // cap at 5 seconds
+				}
 				continue
 			}
 
 			return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
 		}
 		break
 	}
🤖 Prompt for AI Agents
In services/ingestion/block_tracking_subscriber.go around lines 331 to 347, the
retry loop for fetching events can run indefinitely on NotFound or
ResourceExhausted errors. To fix this, introduce a maximum retry count to limit
the number of attempts and implement exponential backoff by increasing the sleep
duration after each retry. This will prevent infinite loops and reduce load on
the system during persistent errors.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
services/ingestion/block_tracking_subscriber.go (1)

343-359: Add maximum retry attempts to prevent infinite loops.

The retry loop could run indefinitely if NotFound or ResourceExhausted errors persist. This was mentioned in a previous review but wasn't addressed.

Consider implementing the suggested retry limit with exponential backoff:

+	const maxRetries = 10
+	retryCount := 0
+	retryDelay := 200 * time.Millisecond
+
 	// retry until we get the block from an execution node that has the events
 	for {
 		evts, err = r.client.GetEventsForBlockHeader(
 			ctx,
 			eventType,
 			blockHeader,
 		)
 		if err != nil {
 			// retry after a short pause
 			if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
-				time.Sleep(200 * time.Millisecond)
+				retryCount++
+				if retryCount > maxRetries {
+					return flow.BlockEvents{}, fmt.Errorf("max retries exceeded after %d attempts: %w", maxRetries, err)
+				}
+				time.Sleep(retryDelay)
+				retryDelay = retryDelay * 2 // exponential backoff
+				if retryDelay > 5*time.Second {
+					retryDelay = 5*time.Second // cap at 5 seconds
+				}
 				continue
 			}
 
 			return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
 		}
 		break
 	}
🧹 Nitpick comments (1)
services/ingestion/block_tracking_subscriber.go (1)

235-235: Consider making the block header queue size configurable.

The hardcoded value of 5 blocks for the notification delay might need adjustment based on network conditions or chain configuration. Consider making this a configurable parameter.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88e6d7e and 7575806.

📒 Files selected for processing (1)
  • services/ingestion/block_tracking_subscriber.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
services/ingestion/block_tracking_subscriber.go (9)

Learnt from: peterargue
PR: #617
File: api/stream.go:62-67
Timestamp: 2024-10-18T19:26:37.579Z
Learning: In the flow-evm-gateway project, within the Go file api/stream.go, the prepareBlockResponse method includes the Bloom filter as the field LogsBloom in the returned Block struct.

Learnt from: m-Peter
PR: #831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Learnt from: m-Peter
PR: #831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Learnt from: m-Peter
PR: #831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like map[gethCommon.Address][]pooledEvmTx), perform deep copies by iterating over the map and copying each slice individually using make() and copy() to avoid shared references that could lead to race conditions and data corruption.

Learnt from: peterargue
PR: #615
File: bootstrap/bootstrap.go:167-197
Timestamp: 2024-10-17T18:04:04.165Z
Learning: In the flow-evm-gateway Go project, the validation ensuring that startHeight is less than or equal to endHeight is performed before the StartTraceDownloader method in bootstrap/bootstrap.go, so additional checks in this method are unnecessary.

Learnt from: peterargue
PR: #772
File: services/requester/keystore/key_store.go:50-62
Timestamp: 2025-03-07T01:35:09.751Z
Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).

Learnt from: peterargue
PR: #682
File: services/ingestion/event_subscriber.go:85-90
Timestamp: 2025-01-24T20:15:10.908Z
Learning: When calculating blocks to catch up in Flow, initialize the counter to 0 and only set it when the latest chain height is greater than the current height (latestOnChainHeight > currentHeight). There's no need to handle the else case as 0 blocks to catch up is the correct default when we're at the same height or ahead.

Learnt from: peterargue
PR: #615
File: services/traces/engine.go:175-178
Timestamp: 2024-10-17T18:05:27.084Z
Learning: In the trace backfill process, returning upon encountering an error when retrieving a block or its Cadence ID is intentional because future blocks are unlikely to exist if these blocks don't.

Learnt from: peterargue
PR: #615
File: services/traces/engine.go:190-190
Timestamp: 2024-10-17T18:04:41.062Z
Learning: In services/traces/engine.go, the e.indexBlockTraces function limits the number of goroutines to a maximum of 5 and blocks until all are complete, so during backfill, concurrency is already managed to prevent resource exhaustion.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (3)
services/ingestion/block_tracking_subscriber.go (3)

28-65: Well-structured implementation with appropriate warnings.

The struct definition and constructor are clean and follow Go best practices. The extensive documentation warning about unsealed data limitations is excellent and helps users understand the risks involved.


67-129: Sound implementation of subscription with proper backfilling logic.

The method correctly handles backfilling from previous sporks before subscribing to live events. The verifier is appropriately started after backfilling since that data is already sealed. Error handling during backfilling is properly implemented.


286-332: Robust event retrieval with proper system transaction failure handling.

The method correctly handles the edge case where system transactions fail but EVM transactions may still exist. The logic for checking TransactionHashRoot and combining events is sound.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
services/ingestion/block_tracking_subscriber.go (1)

343-359: Add maximum retry attempts to prevent infinite loops.

The retry loop could run indefinitely if NotFound or ResourceExhausted errors persist. This was previously flagged but not addressed.

+const maxRetries = 10
+retryCount := 0
+retryDelay := 200 * time.Millisecond
+
 // retry until we get the block from an execution node that has the events
 for {
     evts, err = r.client.GetEventsForBlockHeader(
         ctx,
         eventType,
         blockHeader,
     )
     if err != nil {
         // retry after a short pause
         if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
-            time.Sleep(200 * time.Millisecond)
+            retryCount++
+            if retryCount > maxRetries {
+                return flow.BlockEvents{}, fmt.Errorf("max retries exceeded after %d attempts: %w", maxRetries, err)
+            }
+            time.Sleep(retryDelay)
+            retryDelay = retryDelay * 2 // exponential backoff
+            if retryDelay > 5*time.Second {
+                retryDelay = 5*time.Second // cap at 5 seconds
+            }
             continue
         }
 
         return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
     }
     break
 }
🧹 Nitpick comments (2)
services/ingestion/block_tracking_subscriber.go (2)

110-118: Consider using Error logging instead of Fatal for verifier failures.

Using Fatal logging will terminate the entire application if the verifier encounters an error. This might be too severe for a verification component that's optional and experimental.

 go func() {
     r.verifier.SetStartHeight(r.height)
     if err := r.verifier.Run(ctx); err != nil {
-        r.logger.Fatal().Err(err).Msg("failure running sealing verifier")
+        r.logger.Error().Err(err).Msg("failure running sealing verifier")
         return
     }
 }()

235-240: Consider making the block notification delay configurable.

The hardcoded value of 15 blocks for the notification delay might need adjustment based on network conditions or AN sync performance.

+const blockNotificationDelay = 15
+
 // ...
-if len(blockHeadersQueue) > 15 {
+if len(blockHeadersQueue) > blockNotificationDelay {
     earliestBlockHeader := blockHeadersQueue[0]
     r.keyLock.NotifyBlock(earliestBlockHeader)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7575806 and 6f5b3c9.

📒 Files selected for processing (1)
  • services/ingestion/block_tracking_subscriber.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test
  • GitHub Check: Lint
🔇 Additional comments (1)
services/ingestion/block_tracking_subscriber.go (1)

28-45: Well-documented experimental feature with appropriate warnings.

The extensive documentation clearly outlines the limitations and risks of using unsealed finalized blocks. The warning about manual tracking requirements and Proof of Concept usage is appropriately prominent.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
go.mod (1)

3-3: Invalid go directive: use major.minor (no patch).

The go version in go.mod must be like 1.25, not 1.25.0. This breaks tooling.

Apply:

-go 1.25.0
+go 1.25
tests/go.mod (1)

3-3: Invalid go directive in tests module.

Same as root: use 1.25, not 1.25.0.

Apply:

-go 1.25.0
+go 1.25
🧹 Nitpick comments (4)
tests/go.mod (4)

14-14: Version drift: zerolog differs between root (v1.33.0) and tests (v1.34.0).

Not harmful, but avoid dual versions unless needed. Consider aligning to one version across modules.

Apply (either direction):

-	github.com/rs/zerolog v1.34.0
+	github.com/rs/zerolog v1.33.0

181-184: Prometheus stack minor bumps only in tests: consider aligning with root.

To reduce duplicate versions and metric wiring surprises, align client_golang, common, and procfs with the root module.


233-233: Duplicate YAML modules in graph.

Both go.yaml.in/yaml/v2 (v2.4.2) and gopkg.in/yaml.v2 (v2.4.0 at Line 252) appear. Prefer a single path if possible; if transitive, consider bumping deps to converge, then go mod tidy.


249-250: gRPC/protobuf patch drift vs root.

Tests use grpc v1.75.1 and protobuf v1.36.8; root has v1.75.0 and v1.36.7. Harmless, but aligning reduces duplicate loads and avoids subtle ABI issues.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8f7437 and efd12b0.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • go.mod (2 hunks)
  • tests/go.mod (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Test
🔇 Additional comments (4)
go.mod (2)

14-14: flow-go bump to v0.43.0: confirm API compatibility.

v0.43.0 likely changes Access API/proto shapes. Please confirm RPCBlockHeaderSubscriber/CrossSporkClient compile and run against this version in CI, and that protobuf-gen outputs were refreshed.


148-156: Contract/template/protobuf version sync looks good; verify event types.

Upgrading core-contracts, templates, NFT, and flow protobuf to newer versions is consistent with finalized-header indexing. Please verify emitted EVM event type identifiers and payload fields used by the indexer didn’t change, and run go mod tidy -compat=1.25 to lock a coherent graph.

tests/go.mod (2)

10-10: Emulator pseudo‑version: does it expose finalized‑header subscription?

You noted E2E is blocked because the Emulator lacks this endpoint. This update suggests a very recent pseudo‑version; please confirm whether it now implements finalized‑header subscription. If not, gate/skip tests that depend on it to keep CI green.


12-12: Align flow-go version with root module (OK).

Tests now use v0.43.0, matching the root. Good for API parity.

Run go mod tidy -compat=1.25 in both modules to ensure the graph is minimal and consistent.
