Skip to content

Complete const ledger state refactor#5141

Open
SirTyson wants to merge 6 commits intostellar:masterfrom
SirTyson:complete-const-ledger-state-refactor
Open

Complete const ledger state refactor#5141
SirTyson wants to merge 6 commits intostellar:masterfrom
SirTyson:complete-const-ledger-state-refactor

Conversation

@SirTyson
Copy link
Copy Markdown
Contributor

Description

Resolves #5073

This is the 2nd PR in the ledger state factor. Specifically, this PR covers steps 2-4.

The change is a bit larger than I would have liked, but it was difficult to break this down into smaller chunks that were still safe for master. While the diff is large, I've tried to make the commits easy to follow along. The commits are as follows:

  • 3476e384c Remove LedgerHeader from BucketListSnapshotData

This reduces the duplication of LedgerHeader by removing it from the immutable BucketListSnapshotData struct. The header is moved to a centralized location in BucketSnapshotManager to prepare for a future commit, which will centralize the header to just CompleteConstLedgerState

  • 9bec0791f Consolidated state access to LedgerStateSnapshot

This introduces LedgerStateSnapshot as the unified, searchable ledger state snapshot struct. This is a lightweight wrapper over a centralized CompleteConstLedgerState, which provides BucketList search functionality via local file streams. BucketSnapshotManager has not yet been removed yet, but basically is just a wrapper around the LedgerManager to serve LedgerStateSnapshot.

  • 2b1957028 Make LedgerStateSnapshot threadsafe

Implements copy semantics for LedgerStateSnapshot and makes the default interface a value type. This removes footguns where two threads would have a pointer to the same snapshot, resulting in race conditions.

  • b77fb68fe Remove BucketSnapshotManager and manage snapshot from LedgerManager

BucketSnapshotManager is removed entirely, and the LedgerManager maintains the canonical lcl ledger state snapshot, as well as provides getters for that snapshot. This hardens the distinction between the "apply state" snapshot (used only during ledger apply) and the ledger snapshot published and maintained by the main thread. The apply state, and all apply time functions, must use the 'apply state" snapshot, since this may be ahead of the main thread's snapshot. The main thread snapshot is what is copied by other non-apply threads, such as the query server and overlay thread.

  • d143cc7f3 Fix minor race condition in invariant check

This has various small cleanups. It fixes a few dangling reference issues (that I introduced by removing a couple asserts) and adds a check back to a unit test I forgot I removed during development. Finally, this launches the snapshot invariant thread from the apply thread, not the main thread. This ensures that we always have consistent state between the in-memory state and the ledger state snapshot used for the invariant.

There are still some outstanding cleanups. For example, I still want to get rid of the weird mutable header copy in BucketSnapshotState (and maybe get rid of that class entirely). I also want to move historical snapshots outside of CompleteConstLedgerState and have the query server manage those directly, since it's the only subsystem that cares about it.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

Copilot AI review requested due to automatic review settings February 17, 2026 02:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the ledger state refactoring by removing BucketSnapshotManager and centralizing state management in LedgerManager. The refactor introduces LedgerStateSnapshot as a unified, thread-safe snapshot type with copy semantics, preventing accidental sharing across threads.

Changes:

  • Removed BucketSnapshotManager and moved snapshot management to LedgerManager
  • Introduced LedgerStateSnapshot (copyable value type) and ApplyLedgerStateSnapshot (strong typedef for apply-time snapshots)
  • Implemented copy semantics with fresh file stream caches per copy for thread safety
  • Fixed snapshot invariant race condition by launching it from apply thread instead of main thread

Reviewed changes

Copilot reviewed 62 out of 62 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/ledger/LedgerStateSnapshot.h/cpp New unified snapshot types with copy semantics
src/ledger/LedgerManagerImpl.h/cpp Centralized snapshot management with thread-safe access
src/bucket/BucketSnapshotManager.h/cpp Removed entirely
src/bucket/BucketListSnapshot.h/cpp Added copy constructors for thread-safe copying
src/transactions/ParallelApplyUtils.h/cpp Updated to use ApplyLedgerStateSnapshot
src/invariant/* Updated invariant interfaces to use new snapshot types
src/main/QueryServer.h/cpp Updated to use LedgerManager instead of BucketSnapshotManager
src/overlay/OverlayManagerImpl.h/cpp Updated to use LedgerStateSnapshot value type
Test files Updated to use new snapshot APIs

@SirTyson SirTyson force-pushed the complete-const-ledger-state-refactor branch 4 times, most recently from e38ed3f to 99bf7d0 Compare February 17, 2026 20:59
@SirTyson
Copy link
Copy Markdown
Contributor Author

I think the rest of @dmkozh comments are all regarding the LCL state getters, so I'll answer them all here. Previously, the LedgerManager maintained a main-thread only ledger snapshot, and had a collection of main-thread only getters into that snapshot. This was part of the fragmentation issue, where we had multiple different snapshots with multiple different interfaces depending on which subsystem was accessing the snapshot.

I've done a first path of unifying these by having the main thread getters all call into the LedgerStateSnapshot object. Given that we have a thread-safety model surrounding LedgerStateSnapshot, I've dropped the main thread assert from the getters and use locks instead (with the exception of the interfaces that return references. I've kept the assert and added a lock for completeness sake, which is redundant because only the main thread can update the snapshot).

Ideally, I'd like to get rid of these functions entirely. The unified interface involves callers getting/maintaining a snapshot and calling the snapshot geters. However, this is is a lot of LOC in a PR that already is pretty big, so I'm saving it for a future refactor. This is why the interface looks kinda weird and inconsistent. It was inconsistent before too, but now at least everything calls into the same snapshot interface internally.

Wrt to the transition period, we have a few options on what to do with the old main-thread only LedgerManager getters:

  1. Just keep the main thread assert with no lock. We'll have to do something about the thread safety static-analysis, but this is safe since these values can only be changed by the main thread.
  2. Remove the main thread assert and add locks to make these functions thread safe. (this is what I've already done). In my mind this keeps the outdated interface the "closest" semantically to our new snapshot interface. This may have a performance hit though. Additionally, if we want to get rid of these functions eventually, it may not be a good idea to make them more available to other threads, instead of less available.

I went with option 2 initially because there was nothing preventing these getters from not being thread safe in the model, so I just added it for completeness sake. This may not be worthwhile if we end up getting rid of them soon anyway though

@dmkozh
Copy link
Copy Markdown
Contributor

dmkozh commented Feb 24, 2026

Ideally, I'd like to get rid of these functions entirely. The unified interface involves callers getting/maintaining a snapshot and calling the snapshot geters. However, this is is a lot of LOC in a PR that already is pretty big, so I'm saving it for a future refactor.

Alright, I think it's ok to address this in the followup, as long as we don't keep this around for too long.

I went with option 2 initially because there was nothing preventing these getters from not being thread safe in the model, so I just added it for completeness sake.

Yeah, this seems fine to me for the transition period. I was mostly concerned about the references, but since we're hopefully not going to keep these around for a while there isn't much risk that these get misused due to incorrect thread safety assumptions.

marta-lokhova
marta-lokhova previously approved these changes Mar 4, 2026
Copy link
Copy Markdown
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, lots of good architectural changes. I agree follow-ups make sense. We should probably do some randomized testing (maybe with jitter as well), given that it's a massive change that is pretty high-risk.

@SirTyson SirTyson force-pushed the complete-const-ledger-state-refactor branch 2 times, most recently from ead2bc5 to 21e7e05 Compare March 19, 2026 17:48
@SirTyson SirTyson force-pushed the complete-const-ledger-state-refactor branch from 21e7e05 to 0cc0925 Compare April 6, 2026 20:55
@SirTyson SirTyson force-pushed the complete-const-ledger-state-refactor branch from 0cc0925 to 149364a Compare April 7, 2026 17:26
@SirTyson SirTyson enabled auto-merge April 7, 2026 21:34
@SirTyson SirTyson added this pull request to the merge queue Apr 7, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 7, 2026
@SirTyson SirTyson added this pull request to the merge queue Apr 7, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enforce copy semantics for BucketList snapshots

4 participants