Implement Persistent Storage for PeerStore

### Description

Currently, py-libp2p only provides an in-memory peerstore implementation that loses all peer data (addresses, keys, metadata, protocols) when the process restarts. This limits the resilience and performance of py-libp2p nodes, especially for long-running applications.

This feature request proposes implementing persistent peer storage similar to the `pstoreds` (datastore-backed peerstore) implementation in go-libp2p, which allows peer information to survive process restarts and provides better performance through caching and optimized storage strategies.

Ref: #134 

### Motivation

Because the `py-libp2p` peerstore is held only in memory, all peer-related data (addresses, public keys, private keys, and metadata) is lost when the process restarts. A node therefore cannot retain knowledge of previously connected peers, their addresses, or other crucial information across sessions.

Implementing persistent peer storage, similar to the `pstoreds` (datastore-backed peerstore) in `go-libp2p`, is essential for:

1. **Improved Node Resilience**: Nodes can restart without losing their peer graph, allowing for faster reconnection and network bootstrapping.
2. **Enhanced Performance**: Reduces the need for repeated peer discovery and address resolution, as known peer information is readily available.
3. **Feature Parity**: Brings `py-libp2p` closer to the capabilities of other `libp2p` implementations, particularly `go-libp2p`.
4. **Support for Long-Running Nodes**: Critical for applications where `py-libp2p` nodes are expected to operate continuously and maintain state.
5. **Better User Experience**: Users don't need to rebuild their peer connections from scratch after restarts.

### Requirements

The persistent peer storage feature should meet the following requirements:

1. **Datastore Agnostic Interface**: Introduce an abstract interface for datastore operations, allowing different storage backends to be plugged in (e.g., SQLite, LevelDB, RocksDB, custom file-based storage).

2. **Modular Persistent Components**: Implement persistent versions of `AddrBook`, `KeyBook`, `MetadataBook`, and `ProtoBook` that utilize the datastore interface for storage.

3. **Persistent `PeerStore` Implementation**: Create a new `PersistentPeerStore` class that orchestrates the persistent components while maintaining the same `IPeerStore` interface.

4. **Functional Parity with Go `pstoreds`**: The persistent `PeerStore` should offer functionality and data storage capabilities similar to those of the `go-libp2p` `pstoreds` implementation, including:
   - Storing and retrieving `PeerID`s, `Multiaddr`s, `PublicKey`s, and `PrivateKey`s
   - Storing and retrieving arbitrary metadata associated with peers
   - Managing address expiration and garbage collection
   - Supporting signed peer records (Envelope) persistence
   - Protocol support tracking

5. **Caching Layer**: Implement an in-memory cache (similar to Go's ARC cache) to reduce disk I/O and improve performance.

6. **Garbage Collection**: Implement automatic cleanup of expired peer data with configurable intervals and strategies.

7. **Batched Operations**: Support batched write operations for improved performance.

8. **Backward Compatibility**: Maintain compatibility with the existing in-memory `PeerStore` implementation.

9. **Configuration Options**: Allow users to configure:
   - Datastore backend selection
   - Cache size
   - Garbage collection intervals
   - Maximum number of records
   - Storage path

10. **Serialization**: Implement efficient serialization for Python objects (PeerData, Multiaddr, keys, etc.).

11. **Integration**: Integrate seamlessly with the existing `new_host()` and `new_swarm()` functions.
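
As a rough illustration of requirements 1 and 7, the datastore-agnostic interface and batched writes could look something like the sketch below. Every name in it (`Datastore`, `Batch`, `MemoryDatastore`, and the `/peers/...` key layout) is hypothetical and chosen only for this example; none of it is an existing `py-libp2p` API, and a real design may well differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Iterator, Optional


class Datastore(ABC):
    """Hypothetical key-value interface; not an existing py-libp2p API."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, overwriting any previous value."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove key if present; a missing key is not an error."""

    @abstractmethod
    def query(self, prefix: str) -> Iterator[tuple[str, bytes]]:
        """Yield (key, value) pairs whose key starts with prefix."""

    def batch(self) -> Batch:
        """Return a write buffer bound to this store."""
        return Batch(self)


class Batch:
    """Buffers puts and deletes, then applies them in order on commit()."""

    def __init__(self, store: Datastore) -> None:
        self._store = store
        # A value of None marks a pending delete.
        self._ops: list[tuple[str, Optional[bytes]]] = []

    def put(self, key: str, value: bytes) -> None:
        self._ops.append((key, value))

    def delete(self, key: str) -> None:
        self._ops.append((key, None))

    def commit(self) -> None:
        for key, value in self._ops:
            if value is None:
                self._store.delete(key)
            else:
                self._store.put(key, value)
        self._ops.clear()


class MemoryDatastore(Datastore):
    """Dict-backed stand-in used here only to exercise the interface."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def query(self, prefix: str) -> Iterator[tuple[str, bytes]]:
        # Sorted iteration gives callers a deterministic order.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]
```

Under this sketch, a persistent `AddrBook` or `KeyBook` would depend only on `Datastore`, so moving from SQLite to LevelDB would mean supplying another subclass. A real backend would likely override `batch()` to wrap the buffered writes in a single native transaction.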

### Open questions

1. **Datastore Backend Priority**: What specific Python datastore library should be prioritized for the initial implementation? Options include:
   - `sqlite3` (built-in, simple, widely available)
   - `plyvel` (LevelDB bindings; LevelDB is a common datastore backend for go-libp2p)
   - `lmdb` (high performance, memory-mapped)
   - Custom file-based storage

2. **Integration Strategy**: How should the `PersistentPeerStore` be integrated into the existing `py-libp2p` initialization process? Should it be:
   - A new parameter in `new_host()` function?
   - A separate factory function?
   - A configuration option in existing functions?

3. **Migration Path**: Should we provide utilities to migrate from in-memory peerstore to persistent storage, or is this a future consideration?

4. **Default Behavior**: Should persistent storage be the default, or should it remain opt-in to maintain backward compatibility?
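
To make the `sqlite3` option from question 1 more concrete, here is a minimal sketch of an address table with per-record expiry and a garbage-collection pass, using only the standard library. The class name `SQLiteAddrStore`, the table schema, and the `now` parameter (added so expiry can be tested deterministically) are assumptions made for illustration; addresses are stored as plain strings rather than `Multiaddr` objects to keep the example self-contained.

```python
import sqlite3
import time
from typing import Iterable, List, Optional


class SQLiteAddrStore:
    """Hypothetical address store with expiry; illustration only."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS addrs ("
            "peer_id TEXT NOT NULL, "
            "addr TEXT NOT NULL, "
            "expires_at REAL NOT NULL, "
            "PRIMARY KEY (peer_id, addr))"
        )
        self._conn.commit()

    def add_addrs(
        self,
        peer_id: str,
        addrs: Iterable[str],
        ttl: float,
        now: Optional[float] = None,
    ) -> None:
        """Insert or refresh addresses so that they expire ttl seconds from now."""
        now = time.time() if now is None else now
        rows = [(peer_id, addr, now + ttl) for addr in addrs]
        # The connection context manager commits all rows as one transaction.
        with self._conn:
            self._conn.executemany(
                "INSERT OR REPLACE INTO addrs VALUES (?, ?, ?)", rows
            )

    def addrs(self, peer_id: str, now: Optional[float] = None) -> List[str]:
        """Return the unexpired addresses for peer_id, sorted by text."""
        now = time.time() if now is None else now
        cur = self._conn.execute(
            "SELECT addr FROM addrs "
            "WHERE peer_id = ? AND expires_at > ? ORDER BY addr",
            (peer_id, now),
        )
        return [row[0] for row in cur]

    def gc(self, now: Optional[float] = None) -> int:
        """Delete expired rows and return how many were removed."""
        now = time.time() if now is None else now
        with self._conn:
            cur = self._conn.execute(
                "DELETE FROM addrs WHERE expires_at <= ?", (now,)
            )
        return cur.rowcount

    def close(self) -> None:
        self._conn.close()
```

Opening the store with a file path instead of `:memory:` makes the records survive a restart: a second `SQLiteAddrStore(path)` created after `close()` sees the same rows. In a full implementation, `gc()` would presumably run on the configurable interval listed under the requirements.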


### Are you planning to do it yourself in a pull request ?

Yes

Implement Persistent Storage for PeerStore #945
