Commit e99111e

[replay] Refactored data store to a separate crate (#24436)
## Description

This PR refactors the data stores used by the replay tool into a separate crate so that other clients, in particular the upcoming forking tool, can use them more easily. It mostly shuffles files around, with dependency updates and some minor renamings. The one exception is that the file cache directory has been renamed from `~/.replay_data_store` to `~/.sui_data_store`.

While the forking tool will likely only need object retrieval, it seemed sensible to refactor all existing stores, as they may be of use to other clients (and the forking tool can simply skip the stores/queries it does not need).

## Test plan

Tested manually that the replay tool still builds and replays transactions correctly (as does the Sui CLI).
1 parent b154b6f commit e99111e

23 files changed

+354
-263
lines changed

Cargo.lock

Lines changed: 22 additions & 4 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -110,6 +110,7 @@ members = [
   "crates/sui-cost",
   "crates/sui-data-ingestion",
   "crates/sui-data-ingestion-core",
+  "crates/sui-data-store",
   "crates/sui-deepbook-indexer",
   "crates/sui-default-config",
   "crates/sui-display",
@@ -693,6 +694,7 @@ sui-core = { path = "crates/sui-core" }
 sui-cost = { path = "crates/sui-cost" }
 sui-data-ingestion = { path = "crates/sui-data-ingestion" }
 sui-data-ingestion-core = { path = "crates/sui-data-ingestion-core" }
+sui-data-store = { path = "crates/sui-data-store" }
 sui-default-config = { path = "crates/sui-default-config" }
 sui-display = { path = "crates/sui-display" }
 sui-e2e-tests = { path = "crates/sui-e2e-tests" }

crates/sui-data-store/Cargo.toml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+[package]
+name = "sui-data-store"
+version = "0.1.0"
+authors = ["Mysten Labs <[email protected]>"]
+license = "Apache-2.0"
+publish = false
+edition = "2024"
+
+[dependencies]
+anyhow.workspace = true
+bcs.workspace = true
+chrono.workspace = true
+csv.workspace = true
+cynic.workspace = true
+fastcrypto.workspace = true
+lru.workspace = true
+reqwest = { workspace = true, features = ["json"] }
+serde.workspace = true
+sui-config.workspace = true
+sui-types.workspace = true
+tokio = { workspace = true, features = ["full"] }
+tracing.workspace = true
+
+[build-dependencies]
+cynic-codegen.workspace = true

crates/sui-data-store/README.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
+# sui-data-store
+
+Multi-tier caching data store for Sui blockchain data.
+
+This crate provides a flexible data store abstraction for retrieving and caching
+Sui blockchain data (transactions, epochs, objects). The stores are loosely modeled
+after the GraphQL schema in `crates/sui-indexer-alt-graphql/schema.graphql`.
+
+## Core Traits
+
+- `TransactionStore` - Retrieve transaction data and effects by digest
+- `EpochStore` - Retrieve epoch information and protocol configuration
+- `ObjectStore` - Retrieve objects by their keys with flexible version queries
+
+The read traits above have corresponding writer traits (`TransactionStoreWriter`,
+`EpochStoreWriter`, `ObjectStoreWriter`) for stores that support write-back caching.
+
+## Store Implementations
+
+| Store | Description | Read | Write |
+|-------|-------------|------|-------|
+| `DataStore` | Remote GraphQL-backed store (mainnet/testnet) | Yes | No |
+| `FileSystemStore` | Persistent local disk cache | Yes | Yes |
+| `InMemoryStore` | Unbounded in-memory cache | Yes | Yes |
+| `LruMemoryStore` | Bounded LRU cache | Yes | Yes |
+| `ReadThroughStore` | Composable two-tier caching pattern | Yes | Yes* |
+
+\* `ReadThroughStore` delegates writes to its secondary (backing) store.
+
+## Architecture
+
+The typical 3-tier cache composition: Memory → FileSystem → GraphQL
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ Client Code │
+└─────────────────────────────────────────────────────────────────┘
+
+
+┌─────────────────────────────────────────────────────────────────┐
+│ ReadThroughStore<Memory, Inner> │
+│ Fast in-memory cache (LruMemoryStore or InMemoryStore) │
+└─────────────────────────────────────────────────────────────────┘
+│ cache miss
+
+┌─────────────────────────────────────────────────────────────────┐
+│ ReadThroughStore<FileSystem, Remote> │
+│ Persistent disk cache (FileSystemStore) │
+└─────────────────────────────────────────────────────────────────┘
+│ cache miss
+
+┌─────────────────────────────────────────────────────────────────┐
+│ DataStore (GraphQL) │
+│ Remote data source (mainnet/testnet) │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Composition Examples
+
+Use `ReadThroughStore<Primary, Secondary>` to compose cache layers:
+
+```rust
+use sui_data_store::{Node, stores::{DataStore, LruMemoryStore, ReadThroughStore, FileSystemStore}};
+
+// Full 3-tier: Memory → FileSystem → GraphQL (typical production setup)
+let graphql = DataStore::new(Node::Mainnet);
+let disk = FileSystemStore::new(Node::Mainnet)?;
+let disk_with_remote = ReadThroughStore::new(disk, graphql);
+let memory = LruMemoryStore::new(Node::Mainnet);
+let store = ReadThroughStore::new(memory, disk_with_remote);
+
+// 2-tier: Memory + FileSystem (e.g., CI testing with pre-populated disk cache)
+let disk = FileSystemStore::new(Node::Mainnet)?;
+let memory = LruMemoryStore::new(Node::Mainnet);
+let store = ReadThroughStore::new(memory, disk);
+```
+
+## Version Queries
+
+The `ObjectStore` trait supports three query modes via `VersionQuery`:
+
+- `Version(v)` - Request object at exact version `v`
+- `RootVersion(v)` - Request object at version `<= v` (for dynamic field roots)
+- `AtCheckpoint(c)` - Request object as it existed at checkpoint `c`
+
+## Network Configuration
+
+Use the `Node` enum to configure which network to connect to:
+
+```rust
+use sui_data_store::Node;
+
+let mainnet = Node::Mainnet;
+let testnet = Node::Testnet;
+let custom = Node::Custom("https://my-rpc.example.com".to_string());
+```
File renamed without changes.
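
Reviewer note: the read-through composition described in the README can be modeled without any Sui dependencies. The sketch below is only an illustration of the pattern, not the crate's implementation: `KvRead`, `KvWrite`, `Layer`, and `ReadThrough` are hypothetical names, and the real `TransactionStore`/`EpochStore`/`ObjectStore` traits have different signatures. It assumes that a miss in the primary layer falls through to the secondary layer and that the result is written back into the primary.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical read-side trait (stand-in for the crate's read traits).
trait KvRead {
    fn get(&self, key: &str) -> Option<String>;
}

/// Hypothetical write-side trait (stand-in for the `*StoreWriter` traits).
trait KvWrite {
    fn put(&self, key: &str, value: String);
}

/// A simple in-memory layer that counts how often it is read.
#[derive(Default)]
struct Layer {
    data: RefCell<HashMap<String, String>>,
    reads: Cell<usize>,
}

impl KvRead for Layer {
    fn get(&self, key: &str) -> Option<String> {
        self.reads.set(self.reads.get() + 1);
        self.data.borrow().get(key).cloned()
    }
}

impl KvWrite for Layer {
    fn put(&self, key: &str, value: String) {
        self.data.borrow_mut().insert(key.to_string(), value);
    }
}

/// Two-tier read-through cache: try `primary`, fall back to `secondary`,
/// and populate `primary` on a miss (assumed behavior, see note above).
struct ReadThrough<P, S> {
    primary: P,
    secondary: S,
}

impl<P: KvRead + KvWrite, S: KvRead> KvRead for ReadThrough<P, S> {
    fn get(&self, key: &str) -> Option<String> {
        if let Some(hit) = self.primary.get(key) {
            return Some(hit);
        }
        // Miss in the primary layer: ask the backing layer.
        let value = self.secondary.get(key)?;
        // Write back so the next lookup is served by the primary layer.
        self.primary.put(key, value.clone());
        Some(value)
    }
}
```

Because `ReadThrough` itself implements `KvRead`, it can serve as the secondary of another `ReadThrough`, which is how the three-tier Memory → FileSystem → GraphQL stack in the README is assembled from a two-tier building block.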

crates/sui-replay-2/src/data-stores/gql_queries.rs renamed to crates/sui-data-store/src/gql_queries.rs

Lines changed: 10 additions & 13 deletions
@@ -4,12 +4,12 @@
 //! GQL Queries
 //! Interface to the rpc for the gql schema defined in `crates\sui-indexer-alt-graphql/schema.graphql`.
 //! Built in 3 modules: epoch_query, txn_query, object_query.
-//! No GQL type escapes this module. From here we return structures defined by the replay tool
+//! No GQL type escapes this module. From here we return structures defined in this crate
 //! or bcs encoded data of runtime structures.
 //!
 //! This module is private to the `DataStore` and packcaged in its own module for convenience.

-use crate::{DataStore, replay_interface::EpochData};
+use crate::{EpochData, stores::DataStore};
 use anyhow::{Context, Error, anyhow};
 use cynic::QueryBuilder;
 use fastcrypto::encoding::{Base64 as CryptoBase64, Encoding};
@@ -204,7 +204,7 @@ pub(crate) mod object_query {
     use sui_types::object::Object;

     use super::*;
-    use crate::replay_interface;
+    use crate::{ObjectKey as GqlObjectKey, VersionQuery};

     #[derive(cynic::Scalar, Debug, Clone)]
     #[cynic(graphql_type = "SuiAddress")]
@@ -236,10 +236,7 @@ pub(crate) mod object_query {
     }

     #[derive(cynic::QueryFragment)]
-    #[cynic(
-        graphql_type = "Object",
-        schema_module = "crate::data_stores::gql_queries::schema"
-    )]
+    #[cynic(graphql_type = "Object", schema_module = "crate::gql_queries::schema")]
     pub(crate) struct ObjectFragment {
         #[allow(dead_code)]
         pub address: SuiAddress,
@@ -253,7 +250,7 @@ pub(crate) mod object_query {
     const MAX_KEYS_SIZE: usize = 30;

     pub(crate) async fn query(
-        keys: &[replay_interface::ObjectKey],
+        keys: &[GqlObjectKey],
         data_store: &DataStore,
     ) -> Result<Vec<Option<(Object, u64)>>, Error> {
         let mut keys = keys
@@ -306,20 +303,20 @@ pub(crate) mod object_query {
         Ok(objects)
     }

-    impl From<replay_interface::ObjectKey> for ObjectKey {
-        fn from(key: replay_interface::ObjectKey) -> Self {
+    impl From<GqlObjectKey> for ObjectKey {
+        fn from(key: GqlObjectKey) -> Self {
             ObjectKey {
                 address: SuiAddress(key.object_id.to_string()),
                 version: match key.version_query {
-                    replay_interface::VersionQuery::Version(v) => Some(v),
+                    VersionQuery::Version(v) => Some(v),
                     _ => None,
                 },
                 root_version: match key.version_query {
-                    replay_interface::VersionQuery::RootVersion(v) => Some(v),
+                    VersionQuery::RootVersion(v) => Some(v),
                     _ => None,
                 },
                 at_checkpoint: match key.version_query {
-                    replay_interface::VersionQuery::AtCheckpoint(v) => Some(v),
+                    VersionQuery::AtCheckpoint(v) => Some(v),
                     _ => None,
                 },
             }
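
Reviewer note: the `From` impl in the hunk above sets exactly one of `version`, `root_version`, and `at_checkpoint`, depending on the `VersionQuery` variant. The standalone sketch below shows the same mapping with a single `match`. It is an illustration only: `GqlKey` and `to_gql_key` are hypothetical names, the `u64` payloads are an assumption (the diff does not show the variant types), and the address is simplified to a plain `String`.

```rust
/// Stand-in for the crate's `VersionQuery`; `u64` payloads are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum VersionQuery {
    /// Exact version.
    Version(u64),
    /// Version `<= v` (used for dynamic field roots).
    RootVersion(u64),
    /// Object as it existed at a checkpoint.
    AtCheckpoint(u64),
}

/// Hypothetical stand-in for the GraphQL-side object key.
#[derive(Debug, PartialEq)]
struct GqlKey {
    address: String,
    version: Option<u64>,
    root_version: Option<u64>,
    at_checkpoint: Option<u64>,
}

/// Map a query mode onto the three optional key fields; exactly one is set.
fn to_gql_key(address: &str, query: VersionQuery) -> GqlKey {
    let (version, root_version, at_checkpoint) = match query {
        VersionQuery::Version(v) => (Some(v), None, None),
        VersionQuery::RootVersion(v) => (None, Some(v), None),
        VersionQuery::AtCheckpoint(c) => (None, None, Some(c)),
    };
    GqlKey {
        address: address.to_string(),
        version,
        root_version,
        at_checkpoint,
    }
}
```

A single `match` that returns a tuple keeps the "exactly one field is set" invariant visible in one place, whereas the three separate matches in the diff spread that invariant across three field initializers.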

crates/sui-replay-2/src/replay_interface.rs renamed to crates/sui-data-store/src/lib.rs

Lines changed: 38 additions & 20 deletions
@@ -1,20 +1,39 @@
 // Copyright (c) Mysten Labs, Inc.
 // SPDX-License-Identifier: Apache-2.0

-//! Logical stores needed by the replay tool.
-//! Those stores are loosely modeled after the GQL schema in
-//! `crates/sui-indexer-alt-graphql/schema.graphql`.
-//! A `TransactionStore` is used to retrieve transaction data and effects by digest.
-//! An `EpochStore` is used to retrieve epoch information and protocol configuration.
-//! An `ObjectStore` is used to retrieve objects by their keys, with different query options.
+//! Multi-tier caching data store for Sui blockchain data.
 //!
-//! Data is usually retrieved by getting BCS-encoded data rather than navigating the
-//! GQL schema.
-//! Essentially the code uses the schema to retrieve the data, deserializes it into runtime
-//! structures, and then operates on those.
+//! This crate provides a flexible data store abstraction for retrieving and caching
+//! Sui blockchain data (transactions, epochs, objects). The stores are loosely modeled
+//! after the GQL schema in `crates/sui-indexer-alt-graphql/schema.graphql`.
 //!
-//! A `DataStore` with reasonable defaults is provided for convenience (`data_store.rs`).
-//! Other styles of data stores are also provided in `data_stores` for different use cases.
+//! ## Core Traits
+//!
+//! - [`TransactionStore`] - Retrieve transaction data and effects by digest
+//! - [`EpochStore`] - Retrieve epoch information and protocol configuration
+//! - [`ObjectStore`] - Retrieve objects by their keys with flexible version queries
+//!
+//! ## Store Implementations
+//!
+//! - [`stores::DataStore`] - Remote GraphQL-backed store (mainnet/testnet)
+//! - [`stores::FileSystemStore`] - Persistent local disk cache
+//! - [`stores::InMemoryStore`] - Unbounded in-memory cache
+//! - [`stores::LruMemoryStore`] - Bounded LRU cache
+//! - [`stores::ReadThroughStore`] - Composable two-tier caching pattern
+//!
+//! ## Composition
+//!
+//! Use `ReadThroughStore<Primary, Secondary>` to compose cache layers:
+//! - `ReadThroughStore<LruMemoryStore, DataStore>` - LRU + remote
+//! - `ReadThroughStore<InMemoryStore, FileSystemStore>` - Memory + disk
+//! (e.g., for testing in CI with pre-populated disk cache)
+
+mod gql_queries;
+pub mod node;
+pub mod stores;
+
+// Re-export commonly used types
+pub use node::Node;

 use anyhow::{Error, Result};
 use std::io::Write;
@@ -27,7 +46,7 @@ use sui_types::{
 // Data store read traits
 // ============================================================================

-/// Transaction data with effects and checkpoint required to replay a transaction.
+/// Transaction data with effects and checkpoint.
 #[derive(Clone, Debug)]
 pub struct TransactionInfo {
     pub data: TransactionData,
@@ -36,10 +55,9 @@ pub struct TransactionInfo {
 }

 /// A `TransactionStore` has to be able to retrieve transaction data for a given digest.
-/// To replay a transaction the data provided to
-/// `sui_execution::executor::Executor::execute_transaction_to_effects` must be available.
-/// Some of that data is not provided by the user. It is naturally available at runtime on a
-/// live system and later saved in effects and in the context of a checkpoint.
+/// The data provided to `sui_execution::executor::Executor::execute_transaction_to_effects`
+/// must be available. Some of that data is not provided by the user. It is naturally available
+/// at runtime on a live system and later saved in effects and in the context of a checkpoint.
 pub trait TransactionStore {
     /// Given a transaction digest, return transaction info including data, effects,
     /// and the checkpoint that transaction was executed in.
@@ -50,7 +68,7 @@ pub trait TransactionStore {
     ) -> Result<Option<TransactionInfo>, Error>;
 }

-/// Epoch data required to reaplay a transaction.
+/// Epoch data.
 #[derive(Clone, Debug)]
 pub struct EpochData {
     pub epoch_id: u64,
@@ -95,7 +113,7 @@ pub enum VersionQuery {
 ///
 /// This trait can execute a subset of what is allowed by
 /// `crates/sui-indexer-alt-graphql/schema.graphql::multiGetObjects`.
-/// That query likely allows more than what the replay tool needs, which is fairly limited in
+/// That query likely allows more than what most clients need, which is fairly limited in
 /// its usage.
 pub trait ObjectStore {
     /// Retrieve objects by their keys, with different query options.
@@ -116,7 +134,7 @@ pub trait ObjectStore {
 // we want to revisit in the future.

 /// A trait to set up the data store.
-/// This is used to setup internal state of the data store before starting the replay.
+/// This is used to setup internal state of the data store before use.
 /// At the moment is exclusively used by the FileSystemStore to map network to chain id.
 pub trait SetupStore {
     /// Set up the data store.
0 commit comments

Comments
 (0)