Description
Currently, py-libp2p only provides an in-memory peerstore implementation that loses all peer data (addresses, keys, metadata, protocols) when the process restarts. This limits the resilience and performance of py-libp2p nodes, especially for long-running applications.
This feature request proposes implementing persistent peer storage similar to the pstoreds (datastore-backed peerstore) implementation in go-libp2p, which allows peer information to survive process restarts and provides better performance through caching and optimized storage strategies.
Ref: #134
Motivation
Currently, py-libp2p utilizes an in-memory peerstore, which means all peer-related data (addresses, public keys, private keys, and metadata) is lost when the py-libp2p process restarts. This limitation prevents py-libp2p nodes from retaining knowledge of previously connected peers, their addresses, and other crucial information across sessions.
Implementing persistent peer storage, similar to the pstoreds (datastore-backed peerstore) in go-libp2p, is essential for:
- Improved Node Resilience: Nodes can restart without losing their peer graph, allowing for faster reconnection and network bootstrapping.
- Enhanced Performance: Reduces the need for repeated peer discovery and address resolution, as known peer information is readily available.
- Feature Parity: Brings `py-libp2p` closer to the capabilities of other `libp2p` implementations, particularly `go-libp2p`.
- Support for Long-Running Nodes: Critical for applications where `py-libp2p` nodes are expected to operate continuously and maintain state.
- Better User Experience: Users don't need to rebuild their peer connections from scratch after restarts.
Requirements
The persistent peer storage feature should meet the following requirements:
- Datastore Agnostic Interface: Introduce an abstract interface for datastore operations, allowing different storage backends to be plugged in (e.g., SQLite, LevelDB, RocksDB, custom file-based storage).
- Modular Persistent Components: Implement persistent versions of `AddrBook`, `KeyBook`, `MetadataBook`, and `ProtoBook` that utilize the datastore interface for storage.
- Persistent `PeerStore` Implementation: Create a new `PersistentPeerStore` class that orchestrates the persistent components while maintaining the same `IPeerStore` interface.
- Functional Parity with Go `pstoreds`: The persistent `PeerStore` should offer similar functionalities and data storage capabilities as the `go-libp2p` `pstoreds` implementation, including:
  - Storing and retrieving `PeerID`s, `Multiaddr`s, `PublicKey`s, `PrivateKey`s
  - Storing and retrieving arbitrary metadata associated with peers
  - Managing address expiration and garbage collection
  - Supporting signed peer records (Envelope) persistence
  - Protocol support tracking
- Caching Layer: Implement an in-memory cache (similar to Go's ARC cache) to reduce disk I/O and improve performance.
- Garbage Collection: Implement automatic cleanup of expired peer data with configurable intervals and strategies.
- Batched Operations: Support batched write operations for improved performance.
- Backward Compatibility: Maintain compatibility with existing in-memory `PeerStore` implementation.
- Configuration Options: Allow users to configure:
  - Datastore backend selection
  - Cache size
  - Garbage collection intervals
  - Maximum number of records
  - Storage path
- Serialization: Implement efficient serialization for Python objects (PeerData, Multiaddr, keys, etc.).
- Integration: Seamlessly integrate with existing `new_host()` and `new_swarm()` functions.
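
To make the datastore-agnostic interface and batched-write requirements more concrete, here is a minimal sketch of what such an abstraction might look like. The names `Datastore`, `Batch`, and `MemoryDatastore`, their method signatures, and the `/peers/...` key layout are all assumptions made for illustration. None of them exist in py-libp2p today, and the final design may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Tuple


class Datastore(ABC):
    """Hypothetical key-value interface (not an existing py-libp2p API)."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, overwriting any previous value."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove key if present; do nothing otherwise."""

    @abstractmethod
    def query(self, prefix: str) -> Iterator[Tuple[str, bytes]]:
        """Yield (key, value) pairs whose key starts with prefix."""

    def batch(self) -> "Batch":
        """Return a write buffer that is applied on commit()."""
        return Batch(self)


class Batch:
    """Buffers puts and deletes, then applies them in order on commit()."""

    def __init__(self, store: Datastore) -> None:
        self._store = store
        # A value of None marks a pending delete.
        self._ops: List[Tuple[str, Optional[bytes]]] = []

    def put(self, key: str, value: bytes) -> None:
        self._ops.append((key, value))

    def delete(self, key: str) -> None:
        self._ops.append((key, None))

    def commit(self) -> None:
        for key, value in self._ops:
            if value is None:
                self._store.delete(key)
            else:
                self._store.put(key, value)
        self._ops.clear()


class MemoryDatastore(Datastore):
    """Dict-backed backend, useful for tests and as a reference."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def query(self, prefix: str) -> Iterator[Tuple[str, bytes]]:
        # sorted() copies the keys, so callers may mutate while iterating.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]
```

In this sketch the generic `Batch` simply replays operations one by one. A real backend would likely override `batch()` to map onto its native transaction or write-batch mechanism, which is where the performance benefit would come from.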
Open questions
- Datastore Backend Priority: What specific Python datastore library should be prioritized for the initial implementation? Options include:
  - `sqlite3` (built-in, simple, widely available)
  - `plyvel` (LevelDB compatibility with go-libp2p)
  - `lmdb` (high performance, memory-mapped)
  - Custom file-based storage
- Integration Strategy: How should the `PersistentPeerStore` be integrated into the existing `py-libp2p` initialization process? Should it be:
  - A new parameter in `new_host()` function?
  - A separate factory function?
  - A configuration option in existing functions?
- Migration Path: Should we provide utilities to migrate from in-memory peerstore to persistent storage, or is this a future consideration?
- Default Behavior: Should persistent storage be the default, or should it remain opt-in to maintain backward compatibility?
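
As a rough illustration of the `sqlite3` option from the backend question, the sketch below shows how little code a standard-library key-value backend might need. The class name `SQLiteDatastore`, the `kv` table schema, and the method names are assumptions for discussion only. They are not part of py-libp2p.

```python
import sqlite3
from typing import Optional


class SQLiteDatastore:
    """Hypothetical sqlite3-backed key-value store (illustration only)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def put(self, key: str, value: bytes) -> None:
        # Using the connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def delete(self, key: str) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))

    def close(self) -> None:
        self._conn.close()
```

Opening the store with a file path instead of the default `":memory:"` keeps the data on disk, so a value written before `close()` can be read back by a new `SQLiteDatastore` instance pointed at the same path.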
Are you planning to do it yourself in a pull request ?
Yes