Implement the RPCBlockHeaderSubscriber for indexing finalized results #728
base: main
Conversation
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
10a8bdd to 4fd4a5d (Compare)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a couple small comments, but otherwise this looks good. let's get it running against one of the live networks.
@peterargue Thanks for the review 🙌 I've addressed all the comments.
…ersFromStartHeight
…or function reuse
314e7cc to bfe6188 (Compare)
Co-authored-by: Peter Argue <[email protected]>
…k-port-soft-finality [Backport] Implement `BatchTxPool` to handle nonce mismatch issues
Actionable comments posted: 3
🧹 Nitpick comments (5)
services/requester/tx_pool.go (2)
Lines 12-14: Consider making the regex more robust.

The current regex assumes the error message ends with a newline, which might not always be the case.

```diff
 const (
-	evmErrorRegex = `evm_error=(.*)\n`
+	evmErrorRegex = `evm_error=([^\n]+)`
 )
```

This pattern will capture everything after "evm_error=" up to but not including a newline, making it more flexible.
Lines 26-34: Consider pre-compiling the regex for better performance.

The regex is compiled on every function call. For better performance, especially under high load, consider compiling it once.

```diff
+var evmErrorPattern = regexp.MustCompile(evmErrorRegex)
+
 func parseInvalidError(err error) (error, bool) {
-	r := regexp.MustCompile(evmErrorRegex)
-	matches := r.FindStringSubmatch(err.Error())
+	matches := evmErrorPattern.FindStringSubmatch(err.Error())
 	if len(matches) != 2 {
 		return nil, false
 	}
 	return errs.NewFailedTransactionError(matches[1]), true
 }
```

tests/tx_batching_test.go (2)
Lines 111-112: Consider reducing test flakiness from timing dependencies.

The conditional sleep based on index could lead to flaky tests if the batch interval timing changes. Consider using a more deterministic approach or waiting for specific events.

Instead of fixed sleep times, consider waiting for transaction pool events or using a mock time source for more deterministic testing.
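One way to make this deterministic, sketched here under the assumption that the suite can query executed-transaction state over RPC (the countExecuted callback is a hypothetical stand-in, and the 10s/100ms bounds are arbitrary), is to poll with testify's require.Eventually instead of sleeping:

```go
package tests

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

// waitForExecutedTxCount polls until the expected number of transactions has
// been executed, instead of sleeping for a fixed interval between batches.
// countExecuted is a placeholder for whatever lookup the suite exposes
// (e.g. receipt or nonce queries against the gateway).
func waitForExecutedTxCount(t *testing.T, countExecuted func() (int, error), want int) {
	require.Eventually(t, func() bool {
		n, err := countExecuted()
		return err == nil && n == want
	}, 10*time.Second, 100*time.Millisecond, "expected %d executed transactions", want)
}
```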
Lines 322-327: Consider extracting test addresses to constants or helper function.

The hardcoded test addresses and private keys could be generated dynamically or moved to test constants for better maintainability.

Consider creating a helper function to generate test EOA addresses and keys:

```go
func generateTestEOAs(count int) map[common.Address]string {
	// Generate deterministic test addresses and keys
}
```
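Expanding the stub above into a minimal sketch, assuming the map value is a hex-encoded private key and using the upstream go-ethereum crypto package (the tests may import the onflow go-ethereum fork instead; adjust the paths accordingly):

```go
package tests

import (
	"encoding/hex"
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/crypto"
)

// generateTestEOAs deterministically derives count EOA key pairs, keyed by
// address with the hex-encoded private key as the value, so fixtures stay
// reproducible without hardcoding addresses.
func generateTestEOAs(count int) map[common.Address]string {
	eoas := make(map[common.Address]string, count)
	for i := 0; i < count; i++ {
		// Derive a 32-byte private key from a fixed label for reproducibility.
		keyBytes := crypto.Keccak256([]byte(fmt.Sprintf("test-eoa-%d", i)))
		key, err := crypto.ToECDSA(keyBytes)
		if err != nil {
			panic(err) // fine in a test helper
		}
		eoas[crypto.PubkeyToAddress(key.PublicKey)] = hex.EncodeToString(crypto.FromECDSA(key))
	}
	return eoas
}
```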
services/requester/batch_tx_pool.go (1)

Lines 109-111: Consider adding batch size limits per sender.

The current implementation can accumulate unlimited transactions per sender, which could lead to memory issues or oversized Flow transactions.

Consider adding a check and handling for maximum batch size:

```go
const maxBatchSizePerSender = 100 // configurable

userTx := pooledEvmTx{txPayload: hexEncodedTx, nonce: tx.Nonce()}
if len(t.pooledTxs[from]) >= maxBatchSizePerSender {
	// Either reject the transaction or trigger immediate batch submission
	return fmt.Errorf("batch size limit reached for sender %s", from.Hex())
}
t.pooledTxs[from] = append(t.pooledTxs[from], userTx)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- go.sum is excluded by `!**/*.sum`
- tests/go.sum is excluded by `!**/*.sum`
📒 Files selected for processing (14)
- bootstrap/bootstrap.go (6 hunks)
- cmd/run/cmd.go (3 hunks)
- config/config.go (1 hunks)
- go.mod (4 hunks)
- services/requester/batch_tx_pool.go (1 hunks)
- services/requester/cadence/batch_run.cdc (1 hunks)
- services/requester/pool.go (0 hunks)
- services/requester/requester.go (5 hunks)
- services/requester/single_tx_pool.go (1 hunks)
- services/requester/tx_pool.go (1 hunks)
- tests/go.mod (4 hunks)
- tests/helpers.go (4 hunks)
- tests/integration_test.go (1 hunks)
- tests/tx_batching_test.go (1 hunks)
💤 Files with no reviewable changes (1)
- services/requester/pool.go
✅ Files skipped from review due to trivial changes (1)
- tests/go.mod
🚧 Files skipped from review as they are similar to previous changes (4)
- config/config.go
- bootstrap/bootstrap.go
- go.mod
- cmd/run/cmd.go
🔇 Additional comments (13)
tests/integration_test.go (1)
Line 558: LGTM! Excellent resource management improvement.

The addition of explicit cleanup for both boot instances ensures proper resource management and prevents potential leaks. The deferred cleanup with nil-check is well-structured and follows Go best practices.
Also applies to: 565-569
tests/helpers.go (3)
Lines 95-96: Configuration changes look good.

Enabling EVM setup while disabling VM bridge aligns with the PR's focus on EVM functionality.
Lines 246-258: Proper COA initialization with existence checks.

Good defensive programming - checking for existing resources before creating new ones prevents runtime errors and follows Cadence best practices.
Lines 468-481: Clean implementation of getBalance RPC method.

The method follows the existing pattern and properly handles the JSON response with appropriate type conversion.
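For context, eth_getBalance returns a hex quantity string; a generic decoding sketch (not the PR's actual helper, which may differ) looks roughly like this:

```go
package tests

import (
	"encoding/json"
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common/hexutil"
)

// decodeBalance converts a raw eth_getBalance JSON-RPC result (a hex quantity
// such as "0xde0b6b3a7640000") into a *big.Int.
func decodeBalance(raw json.RawMessage) (*big.Int, error) {
	var hexBalance string
	if err := json.Unmarshal(raw, &hexBalance); err != nil {
		return nil, fmt.Errorf("failed to decode balance result: %w", err)
	}
	return hexutil.DecodeBig(hexBalance)
}
```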
services/requester/requester.go (2)
Lines 38-39: Proper embedding of batch run script.

Good use of go:embed directive for the new batch transaction script.
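For reference, embedding a Cadence script with go:embed looks roughly like the sketch below; the package and variable names are illustrative assumptions, not necessarily what the PR uses:

```go
package requester

import (
	_ "embed" // enables the //go:embed directive
)

// batchRunScript holds the embedded Cadence source for the batch run transaction.
//
//go:embed cadence/batch_run.cdc
var batchRunScript []byte
```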
Lines 237-239: Clean delegation to transaction pool.

The simplified implementation properly delegates transaction submission to the pool while maintaining all necessary validation.
tests/tx_batching_test.go (2)
Lines 128-164: Excellent test coverage for nonce validation errors.

The test properly validates both "nonce too low" and "nonce too high" scenarios, and correctly asserts that the batch fails with the appropriate error message.
Lines 347-382: Well-structured concurrent testing with proper synchronization.

Excellent use of errgroup for managing concurrent goroutines and proper error propagation. The random delays effectively simulate real-world concurrent transaction submissions.
services/requester/single_tx_pool.go (3)
Lines 154-198: Well-implemented transaction building with proper resource management.

The method correctly handles keystore locking, metric updates, and error scenarios. The pattern of taking a key and returning it on error is particularly good.
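Sketched abstractly below; the interfaces only mirror the shapes implied by the keystore.Take() and key.Done() calls mentioned in the review context, and the PR's real signatures may differ:

```go
package requester

// accountKey and keyStore are cut-down stand-ins for the gateway's keystore
// types, just enough to show the take-then-return-on-error pattern.
type accountKey interface {
	Done() // release the key back to the keystore
}

type keyStore interface {
	Take() (accountKey, error)
}

// buildWithKey takes a signing key and releases it immediately only if the
// build step fails; on success the key stays locked until the transaction is
// sealed and released elsewhere.
func buildWithKey(ks keyStore, build func(accountKey) error) (err error) {
	key, err := ks.Take()
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			key.Done()
		}
	}()
	return build(key)
}
```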
Lines 200-226: Efficient concurrent fetching implementation.

Good use of errgroup for parallel fetching of block and account information, which optimizes the transaction preparation time.
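The general shape of that errgroup fan-out, with placeholder callbacks standing in for the actual Access API calls:

```go
package requester

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// fetchReferenceData retrieves the latest block height and the signer's
// sequence number concurrently; both callbacks are illustrative placeholders.
func fetchReferenceData(
	ctx context.Context,
	fetchBlockHeight func(context.Context) (uint64, error),
	fetchSequenceNumber func(context.Context) (uint64, error),
) (height uint64, seqNum uint64, err error) {
	g, gCtx := errgroup.WithContext(ctx)
	g.Go(func() error {
		var innerErr error
		height, innerErr = fetchBlockHeight(gCtx)
		return innerErr
	})
	g.Go(func() error {
		var innerErr error
		seqNum, innerErr = fetchSequenceNumber(gCtx)
		return innerErr
	})
	err = g.Wait()
	return height, seqNum, err
}
```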
Lines 117-121: Fix defer placement to avoid deleting non-existent entries.

The `defer t.pool.Delete(tx.Hash())` statement will execute even if an error occurs before the transaction is stored in the pool. This could lead to unnecessary deletion attempts on non-existent entries. Move the defer statement after the Store operation:

```diff
 if t.config.TxStateValidation == config.TxSealValidation {
 	// add to pool and delete after transaction is sealed or errored out
 	t.pool.Store(tx.Hash(), tx)
-	defer t.pool.Delete(tx.Hash())
+	defer t.pool.Delete(tx.Hash())
```

Actually, the better approach would be to restructure the code to ensure defer is only set after successful storage:

```diff
 if t.config.TxStateValidation == config.TxSealValidation {
-	// add to pool and delete after transaction is sealed or errored out
-	t.pool.Store(tx.Hash(), tx)
-	defer t.pool.Delete(tx.Hash())
+	// add to pool
+	t.pool.Store(tx.Hash(), tx)
+	// ensure cleanup after transaction is sealed or errored out
+	defer func() {
+		t.pool.Delete(tx.Hash())
+	}()
```

Likely an incorrect or invalid review comment.
services/requester/batch_tx_pool.go (2)
Lines 131-138: Excellent concurrent design with atomic map swapping.

The pattern of atomically swapping the pooled transactions map allows new transactions to be added without blocking during batch submission. This is a well-designed approach for high-throughput scenarios.
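A self-contained sketch of that swap; the struct below is a stand-in for BatchTxPool with only the fields needed to show the idea, and the upstream go-ethereum import is used purely for illustration (the gateway itself uses the onflow fork):

```go
package requester

import (
	"sync"

	gethCommon "github.com/ethereum/go-ethereum/common"
)

type pooledEvmTx struct {
	txPayload string
	nonce     uint64
}

// miniBatchPool is a cut-down stand-in for the PR's BatchTxPool.
type miniBatchPool struct {
	txMux     sync.Mutex
	pooledTxs map[gethCommon.Address][]pooledEvmTx
}

// drain swaps out the pooled map under the lock and returns the old one, so
// new submissions never block while the drained batch is built and submitted.
func (p *miniBatchPool) drain() map[gethCommon.Address][]pooledEvmTx {
	p.txMux.Lock()
	drained := p.pooledTxs
	p.pooledTxs = make(map[gethCommon.Address][]pooledEvmTx)
	p.txMux.Unlock()
	return drained
}
```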
Lines 165-169: Critical nonce sorting ensures transaction order preservation.

Excellent implementation of nonce-based sorting to maintain transaction order within a batch. This is essential for preventing nonce-related errors.
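The sort itself is essentially a sort.Slice over each sender's slice by nonce; this sketch reuses the pooledEvmTx stand-in from the previous snippet:

```go
package requester

import "sort"

// sortByNonce orders one sender's pooled transactions by ascending nonce so
// they are replayed in the order the sender intended.
func sortByNonce(txs []pooledEvmTx) {
	sort.Slice(txs, func(i, j int) bool {
		return txs[i].nonce < txs[j].nonce
	})
}
```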
```cadence
for txResult in txResults {
    assert(
        txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful,
        message: "evm_error=".concat(txResult.errorMessage).concat("\n")
    )
}
```
🛠️ Refactor suggestion
The loop is redundant - it will always fail on the first iteration.
Since this code is only reached when all transactions have invalid status (neither failed nor successful), the assertion `txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful` will always be false on the first iteration, causing the transaction to fail immediately. The loop structure is unnecessary.
Consider simplifying to directly assert on the first transaction result:
```diff
-for txResult in txResults {
-    assert(
-        txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful,
-        message: "evm_error=".concat(txResult.errorMessage).concat("\n")
-    )
-}
+assert(
+    false,
+    message: "evm_error=".concat(txResults[0].errorMessage).concat("\n")
+)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-for txResult in txResults {
-    assert(
-        txResult.status == EVM.Status.failed || txResult.status == EVM.Status.successful,
-        message: "evm_error=".concat(txResult.errorMessage).concat("\n")
-    )
-}
+assert(
+    false,
+    message: "evm_error=".concat(txResults[0].errorMessage).concat("\n")
+)
```
🤖 Prompt for AI Agents
In services/requester/cadence/batch_run.cdc around lines 28 to 33, the for loop
asserting each txResult's status is redundant because the code path is only
reached when all transactions have invalid statuses, causing the assertion to
fail immediately on the first iteration. Replace the loop with a single
assertion on the first txResult's status to simplify the code and avoid
unnecessary iteration.
```go
		txMux: sync.Mutex{},
	}

	go batchPool.processPooledTransactions(ctx)
```
🛠️ Refactor suggestion
Add panic recovery for background goroutine.
The background goroutine processing transactions could panic and terminate batching functionality. Add panic recovery to ensure resilience.
```diff
-go batchPool.processPooledTransactions(ctx)
+go func() {
+	defer func() {
+		if r := recover(); r != nil {
+			logger.Error().Interface("panic", r).Msg("panic in batch transaction processor")
+			// Optionally restart the goroutine or implement a circuit breaker
+		}
+	}()
+	batchPool.processPooledTransactions(ctx)
+}()
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-go batchPool.processPooledTransactions(ctx)
+go func() {
+	defer func() {
+		if r := recover(); r != nil {
+			logger.Error().Interface("panic", r).Msg("panic in batch transaction processor")
+			// Optionally restart the goroutine or implement a circuit breaker
+		}
+	}()
+	batchPool.processPooledTransactions(ctx)
+}()
```
🤖 Prompt for AI Agents
In services/requester/batch_tx_pool.go at line 76, the background goroutine
started with go batchPool.processPooledTransactions(ctx) lacks panic recovery.
Wrap the call to processPooledTransactions in an anonymous function that defers
a recover function to catch and handle any panics, preventing the goroutine from
crashing and ensuring the batching functionality remains resilient.
Sync soft-finality with changes from `main` branch
Sync soft-finality branch with changes from main
Actionable comments posted: 1
🧹 Nitpick comments (2)
services/ingestion/block_tracking_subscriber.go (2)
Line 177: Fix typo in comment.

```diff
- // typically we receive an error in the errChan before the channels are closes
+ // typically we receive an error in the errChan before the channels are closed
```
Line 359: Use the actual event type constant instead of string comparison.

Using string contains for event type comparison is fragile. Consider using the actual constant or a more precise comparison.

```diff
- if strings.Contains(eventType, string(events.EventTypeBlockExecuted)) && len(event.Events) != 1 {
+ if eventType == string(events.EventTypeBlockExecuted) && len(event.Events) != 1 {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- bootstrap/bootstrap.go (5 hunks)
- cmd/run/cmd.go (3 hunks)
- config/config.go (1 hunks)
- services/ingestion/block_tracking_subscriber.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- config/config.go
- bootstrap/bootstrap.go
- cmd/run/cmd.go
🧰 Additional context used
🧠 Learnings (1)
services/ingestion/block_tracking_subscriber.go (9)
- Learnt from: peterargue (PR: #617, File: api/stream.go:62-67, Timestamp: 2024-10-18T19:26:37.579Z). Learning: In the flow-evm-gateway project, within the Go file api/stream.go, the prepareBlockResponse method includes the Bloom filter as the field LogsBloom in the returned Block struct.
- Learnt from: m-Peter (PR: #831, File: services/requester/pool.go:202-214, Timestamp: 2025-06-17T10:29:35.941Z). Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.
- Learnt from: m-Peter (PR: #831, File: services/requester/pool.go:202-214, Timestamp: 2025-06-17T10:29:35.941Z). Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.
- Learnt from: m-Peter (PR: #831, File: services/requester/batch_tx_pool.go:0-0, Timestamp: 2025-06-19T11:36:25.478Z). Learning: In Go, when copying maps that contain slices (like map[gethCommon.Address][]pooledEvmTx), perform deep copies by iterating over the map and copying each slice individually using make() and copy() to avoid shared references that could lead to race conditions and data corruption.
- Learnt from: peterargue (PR: #615, File: bootstrap/bootstrap.go:167-197, Timestamp: 2024-10-17T18:04:04.165Z). Learning: In the flow-evm-gateway Go project, the validation ensuring that startHeight is less than or equal to endHeight is performed before the StartTraceDownloader method in bootstrap/bootstrap.go, so additional checks in this method are unnecessary.
- Learnt from: peterargue (PR: #772, File: services/requester/keystore/key_store.go:50-62, Timestamp: 2025-03-07T01:35:09.751Z). Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).
- Learnt from: peterargue (PR: #682, File: services/ingestion/event_subscriber.go:85-90, Timestamp: 2025-01-24T20:15:10.908Z). Learning: When calculating blocks to catch up in Flow, initialize the counter to 0 and only set it when the latest chain height is greater than the current height (latestOnChainHeight > currentHeight). There's no need to handle the else case as 0 blocks to catch up is the correct default when we're at the same height or ahead.
- Learnt from: peterargue (PR: #615, File: services/traces/engine.go:175-178, Timestamp: 2024-10-17T18:05:27.084Z). Learning: In the trace backfill process, returning upon encountering an error when retrieving a block or its Cadence ID is intentional because future blocks are unlikely to exist if these blocks don't.
- Learnt from: peterargue (PR: #615, File: services/traces/engine.go:190-190, Timestamp: 2024-10-17T18:04:41.062Z). Learning: In services/traces/engine.go, the e.indexBlockTraces function limits the number of goroutines to a maximum of 5 and blocks until all are complete, so during backfill, concurrency is already managed to prevent resource exhaustion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test
🔇 Additional comments (1)
services/ingestion/block_tracking_subscriber.go (1)
Line 114: No change needed for logger.Fatal() here.

The use of logger.Fatal() to terminate on a verifier failure is consistent with the rest of the codebase, where critical errors trigger a process exit:
- storage/pebble/db.go: log.Fatal().Err(err).Msg("failed to close batch")
- services/ingestion/sealing_verifier.go: v.logger.Fatal().Err(err)...
- services/ingestion/engine.go: e.log.Fatal().Err(err).Msg("failed to close batch")
- bootstrap/bootstrap.go: l.Fatal().Err(err).Msg("engine failed to run")
Terminate-on-error for the sealing verifier aligns with the intended failure semantics elsewhere.
```go
	for {
		evts, err = r.client.GetEventsForBlockHeader(
			ctx,
			eventType,
			blockHeader,
		)
		if err != nil {
			// retry after a short pause
			if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
				time.Sleep(200 * time.Millisecond)
				continue
			}

			return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
		}
		break
	}
```
🛠️ Refactor suggestion
Add maximum retry attempts to prevent infinite loops.
The retry loop could run indefinitely if NotFound or ResourceExhausted errors persist. Consider adding a maximum retry count with exponential backoff.
```diff
+const maxRetries = 10
+retryCount := 0
+retryDelay := 200 * time.Millisecond
+
 // retry until we get the block from an execution node that has the events
 for {
 	evts, err = r.client.GetEventsForBlockHeader(
 		ctx,
 		eventType,
 		blockHeader,
 	)
 	if err != nil {
 		// retry after a short pause
 		if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
-			time.Sleep(200 * time.Millisecond)
+			retryCount++
+			if retryCount > maxRetries {
+				return flow.BlockEvents{}, fmt.Errorf("max retries exceeded after %d attempts: %w", maxRetries, err)
+			}
+			time.Sleep(retryDelay)
+			retryDelay = retryDelay * 2 // exponential backoff
+			if retryDelay > 5*time.Second {
+				retryDelay = 5 * time.Second // cap at 5 seconds
+			}
 			continue
 		}
 		return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
 	}
 	break
 }
```
🤖 Prompt for AI Agents
In services/ingestion/block_tracking_subscriber.go around lines 331 to 347, the
retry loop for fetching events can run indefinitely on NotFound or
ResourceExhausted errors. To fix this, introduce a maximum retry count to limit
the number of attempts and implement exponential backoff by increasing the sleep
duration after each retry. This will prevent infinite loops and reduce load on
the system during persistent errors.
Improve release frequency of signing keys for soft-finality
Actionable comments posted: 0
♻️ Duplicate comments (1)
services/ingestion/block_tracking_subscriber.go (1)
Lines 343-359: Add maximum retry attempts to prevent infinite loops.

The retry loop could run indefinitely if NotFound or ResourceExhausted errors persist. This was mentioned in a previous review but wasn't addressed.

Consider implementing the suggested retry limit with exponential backoff:

```diff
+const maxRetries = 10
+retryCount := 0
+retryDelay := 200 * time.Millisecond
+
 // retry until we get the block from an execution node that has the events
 for {
 	evts, err = r.client.GetEventsForBlockHeader(
 		ctx,
 		eventType,
 		blockHeader,
 	)
 	if err != nil {
 		// retry after a short pause
 		if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
-			time.Sleep(200 * time.Millisecond)
+			retryCount++
+			if retryCount > maxRetries {
+				return flow.BlockEvents{}, fmt.Errorf("max retries exceeded after %d attempts: %w", maxRetries, err)
+			}
+			time.Sleep(retryDelay)
+			retryDelay = retryDelay * 2 // exponential backoff
+			if retryDelay > 5*time.Second {
+				retryDelay = 5 * time.Second // cap at 5 seconds
+			}
 			continue
 		}
 		return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
 	}
 	break
 }
```
🧹 Nitpick comments (1)
services/ingestion/block_tracking_subscriber.go (1)
Line 235: Consider making the block header queue size configurable.

The hardcoded value of 5 blocks for the notification delay might need adjustment based on network conditions or chain configuration. Consider making this a configurable parameter.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- services/ingestion/block_tracking_subscriber.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
services/ingestion/block_tracking_subscriber.go (9)
- Learnt from: peterargue (PR: #617, File: api/stream.go:62-67, Timestamp: 2024-10-18T19:26:37.579Z). Learning: In the flow-evm-gateway project, within the Go file api/stream.go, the prepareBlockResponse method includes the Bloom filter as the field LogsBloom in the returned Block struct.
- Learnt from: m-Peter (PR: #831, File: services/requester/pool.go:202-214, Timestamp: 2025-06-17T10:29:35.941Z). Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.
- Learnt from: m-Peter (PR: #831, File: services/requester/pool.go:202-214, Timestamp: 2025-06-17T10:29:35.941Z). Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.
- Learnt from: m-Peter (PR: #831, File: services/requester/batch_tx_pool.go:0-0, Timestamp: 2025-06-19T11:36:25.478Z). Learning: In Go, when copying maps that contain slices (like map[gethCommon.Address][]pooledEvmTx), perform deep copies by iterating over the map and copying each slice individually using make() and copy() to avoid shared references that could lead to race conditions and data corruption.
- Learnt from: peterargue (PR: #615, File: bootstrap/bootstrap.go:167-197, Timestamp: 2024-10-17T18:04:04.165Z). Learning: In the flow-evm-gateway Go project, the validation ensuring that startHeight is less than or equal to endHeight is performed before the StartTraceDownloader method in bootstrap/bootstrap.go, so additional checks in this method are unnecessary.
- Learnt from: peterargue (PR: #772, File: services/requester/keystore/key_store.go:50-62, Timestamp: 2025-03-07T01:35:09.751Z). Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).
- Learnt from: peterargue (PR: #682, File: services/ingestion/event_subscriber.go:85-90, Timestamp: 2025-01-24T20:15:10.908Z). Learning: When calculating blocks to catch up in Flow, initialize the counter to 0 and only set it when the latest chain height is greater than the current height (latestOnChainHeight > currentHeight). There's no need to handle the else case as 0 blocks to catch up is the correct default when we're at the same height or ahead.
- Learnt from: peterargue (PR: #615, File: services/traces/engine.go:175-178, Timestamp: 2024-10-17T18:05:27.084Z). Learning: In the trace backfill process, returning upon encountering an error when retrieving a block or its Cadence ID is intentional because future blocks are unlikely to exist if these blocks don't.
- Learnt from: peterargue (PR: #615, File: services/traces/engine.go:190-190, Timestamp: 2024-10-17T18:04:41.062Z). Learning: In services/traces/engine.go, the e.indexBlockTraces function limits the number of goroutines to a maximum of 5 and blocks until all are complete, so during backfill, concurrency is already managed to prevent resource exhaustion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test
🔇 Additional comments (3)
services/ingestion/block_tracking_subscriber.go (3)
Lines 28-65: Well-structured implementation with appropriate warnings.

The struct definition and constructor are clean and follow Go best practices. The extensive documentation warning about unsealed data limitations is excellent and helps users understand the risks involved.
Lines 67-129: Sound implementation of subscription with proper backfilling logic.

The method correctly handles backfilling from previous sporks before subscribing to live events. The verifier is appropriately started after backfilling since that data is already sealed. Error handling during backfilling is properly implemented.
Lines 286-332: Robust event retrieval with proper system transaction failure handling.

The method correctly handles the edge case where system transactions fail but EVM transactions may still exist. The logic for checking TransactionHashRoot and combining events is sound.
Give blocks a better chance to become sealed before calling `NotifyBlock`
Actionable comments posted: 0
♻️ Duplicate comments (1)
services/ingestion/block_tracking_subscriber.go (1)
Lines 343-359: Add maximum retry attempts to prevent infinite loops.

The retry loop could run indefinitely if NotFound or ResourceExhausted errors persist. This was previously flagged but not addressed.

```diff
+const maxRetries = 10
+retryCount := 0
+retryDelay := 200 * time.Millisecond
+
 // retry until we get the block from an execution node that has the events
 for {
 	evts, err = r.client.GetEventsForBlockHeader(
 		ctx,
 		eventType,
 		blockHeader,
 	)
 	if err != nil {
 		// retry after a short pause
 		if status.Code(err) == codes.NotFound || status.Code(err) == codes.ResourceExhausted {
-			time.Sleep(200 * time.Millisecond)
+			retryCount++
+			if retryCount > maxRetries {
+				return flow.BlockEvents{}, fmt.Errorf("max retries exceeded after %d attempts: %w", maxRetries, err)
+			}
+			time.Sleep(retryDelay)
+			retryDelay = retryDelay * 2 // exponential backoff
+			if retryDelay > 5*time.Second {
+				retryDelay = 5 * time.Second // cap at 5 seconds
+			}
 			continue
 		}
 		return flow.BlockEvents{}, fmt.Errorf("failed to get events from access node: %w", err)
 	}
 	break
 }
```
🧹 Nitpick comments (2)
services/ingestion/block_tracking_subscriber.go (2)
Lines 110-118: Consider using Error logging instead of Fatal for verifier failures.

Using Fatal logging will terminate the entire application if the verifier encounters an error. This might be too severe for a verification component that's optional and experimental.

```diff
 go func() {
 	r.verifier.SetStartHeight(r.height)
 	if err := r.verifier.Run(ctx); err != nil {
-		r.logger.Fatal().Err(err).Msg("failure running sealing verifier")
+		r.logger.Error().Err(err).Msg("failure running sealing verifier")
 		return
 	}
 }()
```
Lines 235-240: Consider making the block notification delay configurable.

The hardcoded value of 15 blocks for the notification delay might need adjustment based on network conditions or AN sync performance.

```diff
+const blockNotificationDelay = 15
+
 // ...
-if len(blockHeadersQueue) > 15 {
+if len(blockHeadersQueue) > blockNotificationDelay {
 	earliestBlockHeader := blockHeadersQueue[0]
 	r.keyLock.NotifyBlock(earliestBlockHeader)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- services/ingestion/block_tracking_subscriber.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test
- GitHub Check: Lint
🔇 Additional comments (1)
services/ingestion/block_tracking_subscriber.go (1)
Lines 28-45: Well-documented experimental feature with appropriate warnings.

The extensive documentation clearly outlines the limitations and risks of using unsealed finalized blocks. The warning about manual tracking requirements and Proof of Concept usage is appropriately prominent.
Back-port changes from `main` to soft-finality
Sync Soft Finality with changes from `main` branch
…lized-block-results
…lized-block-results
…lized-block-results
…lized-block-results
…finality [Back-port] Bump flow-go version to `v0.43.0`
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
go.mod (1)
Line 3: Invalid go directive: use major.minor (no patch).

The go version in go.mod must be like 1.25, not 1.25.0. This breaks tooling.

Apply:

```diff
-go 1.25.0
+go 1.25
```

tests/go.mod (1)
Line 3: Invalid go directive in tests module.

Same as root: use 1.25, not 1.25.0.

Apply:

```diff
-go 1.25.0
+go 1.25
```
🧹 Nitpick comments (4)
tests/go.mod (4)
Line 14: Version drift: zerolog differs between root (v1.33.0) and tests (v1.34.0).

Not harmful, but avoid dual versions unless needed. Consider aligning to one version across modules.

Apply (either direction):

```diff
-	github.com/rs/zerolog v1.34.0
+	github.com/rs/zerolog v1.33.0
```
Lines 181-184: Prometheus stack minor bumps only in tests: consider aligning with root.

To reduce duplicate versions and metric wiring surprises, align client_golang, common, and procfs with the root module.
Line 233: Duplicate YAML modules in graph.

Both go.yaml.in/yaml/v2 (v2.4.2) and gopkg.in/yaml.v2 (v2.4.0 at line 252) appear. Prefer a single path if possible; if transitive, consider bumping deps to converge, then go mod tidy.
Lines 249-250: gRPC/protobuf patch drift vs root.

Tests use grpc v1.75.1 and protobuf v1.36.8; root has v1.75.0 and v1.36.7. Harmless, but aligning reduces duplicate loads and avoids subtle ABI issues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- go.sum is excluded by `!**/*.sum`
- tests/go.sum is excluded by `!**/*.sum`
📒 Files selected for processing (2)
- go.mod (2 hunks)
- tests/go.mod (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint
- GitHub Check: Test
🔇 Additional comments (4)
go.mod (2)
Line 14: flow-go bump to v0.43.0: confirm API compatibility.

v0.43.0 likely changes Access API/proto shapes. Please confirm RPCBlockHeaderSubscriber/CrossSporkClient compile and run against this version in CI, and that protobuf-gen outputs were refreshed.
Lines 148-156: Contract/template/protobuf version sync looks good; verify event types.

Upgrading core-contracts, templates, NFT, and flow protobuf to newer versions is consistent with finalized-header indexing. Please verify emitted EVM event type identifiers and payload fields used by the indexer didn't change, and run go mod tidy -compat=1.25 to lock a coherent graph.

tests/go.mod (2)
Line 10: Emulator pseudo-version: does it expose finalized-header subscription?

You noted E2E is blocked because the Emulator lacks this endpoint. This update suggests a very recent pseudo-version; please confirm whether it now implements finalized-header subscription. If not, gate/skip tests that depend on it to keep CI green.
Line 12: Align flow-go version with root module (OK).

Tests now use v0.43.0, matching the root. Good for API parity. Run go mod tidy -compat=1.25 in both modules to ensure the graph is minimal and consistent.
…lized-block-results
…lized-block-results
…lized-block-results
Closes: #727
Description
For contributor use:
- master branch
- Files changed in the Github PR explorer

Summary by CodeRabbit
in the Github PR explorerSummary by CodeRabbit