Complete const ledger state refactor#5141
Conversation
There was a problem hiding this comment.
Pull request overview
This PR completes the ledger state refactoring by removing BucketSnapshotManager and centralizing state management in LedgerManager. The refactor introduces LedgerStateSnapshot as a unified, thread-safe snapshot type with copy semantics, preventing accidental sharing across threads.
Changes:
- Removed
BucketSnapshotManagerand moved snapshot management toLedgerManager - Introduced
LedgerStateSnapshot(copyable value type) andApplyLedgerStateSnapshot(strong typedef for apply-time snapshots) - Implemented copy semantics with fresh file stream caches per copy for thread safety
- Fixed snapshot invariant race condition by launching it from apply thread instead of main thread
Reviewed changes
Copilot reviewed 62 out of 62 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/ledger/LedgerStateSnapshot.h/cpp |
New unified snapshot types with copy semantics |
src/ledger/LedgerManagerImpl.h/cpp |
Centralized snapshot management with thread-safe access |
src/bucket/BucketSnapshotManager.h/cpp |
Removed entirely |
src/bucket/BucketListSnapshot.h/cpp |
Added copy constructors for thread-safe copying |
src/transactions/ParallelApplyUtils.h/cpp |
Updated to use ApplyLedgerStateSnapshot |
src/invariant/* |
Updated invariant interfaces to use new snapshot types |
src/main/QueryServer.h/cpp |
Updated to use LedgerManager instead of BucketSnapshotManager |
src/overlay/OverlayManagerImpl.h/cpp |
Updated to use LedgerStateSnapshot value type |
| Test files | Updated to use new snapshot APIs |
e38ed3f to
99bf7d0
Compare
|
I think the rest of @dmkozh comments are all regarding the LCL state getters, so I'll answer them all here. Previously, the LedgerManager maintained a main-thread only ledger snapshot, and had a collection of main-thread only getters into that snapshot. This was part of the fragmentation issue, where we had multiple different snapshots with multiple different interfaces depending on which subsystem was accessing the snapshot. I've done a first path of unifying these by having the main thread getters all call into the Ideally, I'd like to get rid of these functions entirely. The unified interface involves callers getting/maintaining a snapshot and calling the snapshot geters. However, this is is a lot of LOC in a PR that already is pretty big, so I'm saving it for a future refactor. This is why the interface looks kinda weird and inconsistent. It was inconsistent before too, but now at least everything calls into the same snapshot interface internally. Wrt to the transition period, we have a few options on what to do with the old main-thread only LedgerManager getters:
I went with option 2 initially because there was nothing preventing these getters from not being thread safe in the model, so I just added it for completeness sake. This may not be worthwhile if we end up getting rid of them soon anyway though |
Alright, I think it's ok to address this in the followup, as long as we don't keep this around for too long.
Yeah, this seems fine to me for the transition period. I was mostly concerned about the references, but since we're hopefully not going to keep these around for a while there isn't much risk that these get misused due to incorrect thread safety assumptions. |
marta-lokhova
left a comment
There was a problem hiding this comment.
LGTM, lots of good architectural changes. I agree follow-ups make sense. We should probably do some randomized testing (maybe with jitter as well), given that it's a massive change that is pretty high-risk.
ead2bc5 to
21e7e05
Compare
21e7e05 to
0cc0925
Compare
0cc0925 to
149364a
Compare
Description
Resolves #5073
This is the 2nd PR in the ledger state factor. Specifically, this PR covers steps 2-4.
The change is a bit larger than I would have liked, but it was difficult to break this down into smaller chunks that were still safe for master. While the diff is large, I've tried to make the commits easy to follow along. The commits are as follows:
3476e384cRemove LedgerHeader from BucketListSnapshotDataThis reduces the duplication of
LedgerHeaderby removing it from the immutableBucketListSnapshotDatastruct. The header is moved to a centralized location inBucketSnapshotManagerto prepare for a future commit, which will centralize the header to justCompleteConstLedgerState9bec0791fConsolidated state access to LedgerStateSnapshotThis introduces
LedgerStateSnapshotas the unified, searchable ledger state snapshot struct. This is a lightweight wrapper over a centralizedCompleteConstLedgerState, which provides BucketList search functionality via local file streams.BucketSnapshotManagerhas not yet been removed yet, but basically is just a wrapper around theLedgerManagerto serveLedgerStateSnapshot.2b1957028Make LedgerStateSnapshot threadsafeImplements copy semantics for
LedgerStateSnapshotand makes the default interface a value type. This removes footguns where two threads would have a pointer to the same snapshot, resulting in race conditions.b77fb68feRemove BucketSnapshotManager and manage snapshot from LedgerManagerBucketSnapshotManageris removed entirely, and theLedgerManagermaintains the canonical lcl ledger state snapshot, as well as provides getters for that snapshot. This hardens the distinction between the "apply state" snapshot (used only during ledger apply) and the ledger snapshot published and maintained by the main thread. The apply state, and all apply time functions, must use the 'apply state" snapshot, since this may be ahead of the main thread's snapshot. The main thread snapshot is what is copied by other non-apply threads, such as the query server and overlay thread.d143cc7f3Fix minor race condition in invariant checkThis has various small cleanups. It fixes a few dangling reference issues (that I introduced by removing a couple asserts) and adds a check back to a unit test I forgot I removed during development. Finally, this launches the snapshot invariant thread from the apply thread, not the main thread. This ensures that we always have consistent state between the in-memory state and the ledger state snapshot used for the invariant.
There are still some outstanding cleanups. For example, I still want to get rid of the weird mutable header copy in
BucketSnapshotState(and maybe get rid of that class entirely). I also want to move historical snapshots outside ofCompleteConstLedgerStateand have the query server manage those directly, since it's the only subsystem that cares about it.Checklist
clang-formatv8.0.0 (viamake formator the Visual Studio extension)