Implement Persistent Storage for PeerStore #945

@Winter-Soren

Description

Currently, py-libp2p only provides an in-memory peerstore implementation that loses all peer data (addresses, keys, metadata, protocols) when the process restarts. This limits the resilience and performance of py-libp2p nodes, especially for long-running applications.

This feature request proposes implementing persistent peer storage similar to the pstoreds (datastore-backed peerstore) implementation in go-libp2p, which lets peer information survive process restarts and improves performance through in-memory caching and configurable garbage-collection strategies.

Ref: #134

Motivation

Currently, py-libp2p uses an in-memory peerstore, so all peer-related data (addresses, public keys, private keys, and metadata) is lost whenever the process restarts. This prevents py-libp2p nodes from retaining knowledge of previously connected peers, their addresses, and other crucial information across sessions.

Implementing persistent peer storage, similar to the pstoreds (datastore-backed peerstore) in go-libp2p, is essential for:

  1. Improved Node Resilience: Nodes can restart without losing their peer graph, allowing for faster reconnection and network bootstrapping.
  2. Enhanced Performance: Reduces the need for repeated peer discovery and address resolution, as known peer information is readily available.
  3. Feature Parity: Brings py-libp2p closer to the capabilities of other libp2p implementations, particularly go-libp2p.
  4. Support for Long-Running Nodes: Critical for applications where py-libp2p nodes are expected to operate continuously and maintain state.
  5. Better User Experience: Users don't need to rebuild their peer connections from scratch after restarts.

Requirements

The persistent peer storage feature should meet the following requirements:

  1. Datastore-Agnostic Interface: Introduce an abstract interface for datastore operations, allowing different storage backends to be plugged in (e.g., SQLite, LevelDB, RocksDB, custom file-based storage).

  2. Modular Persistent Components: Implement persistent versions of AddrBook, KeyBook, MetadataBook, and ProtoBook that utilize the datastore interface for storage.

  3. Persistent PeerStore Implementation: Create a new PersistentPeerStore class that orchestrates the persistent components while maintaining the same IPeerStore interface.

  4. Functional Parity with Go pstoreds: The persistent PeerStore should offer functionality and data storage capabilities similar to those of the go-libp2p pstoreds implementation, including:

    • Storing and retrieving PeerIDs, Multiaddrs, PublicKeys, and PrivateKeys
    • Storing and retrieving arbitrary metadata associated with peers
    • Managing address expiration and garbage collection
    • Persisting signed peer records (Envelope)
    • Protocol support tracking
  5. Caching Layer: Implement an in-memory cache (similar to Go's ARC cache) to reduce disk I/O and improve performance.

  6. Garbage Collection: Implement automatic cleanup of expired peer data with configurable intervals and strategies.

  7. Batched Operations: Support batched write operations for improved performance.

  8. Backward Compatibility: Maintain compatibility with the existing in-memory PeerStore implementation.

  9. Configuration Options: Allow users to configure:

    • Datastore backend selection
    • Cache size
    • Garbage collection intervals
    • Maximum number of records
    • Storage path
  10. Serialization: Implement efficient serialization for Python objects (PeerData, Multiaddr, keys, etc.).

  11. Integration: Integrate seamlessly with the existing new_host() and new_swarm() functions.
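To make requirements 2, 6, and 10 more concrete, here is a minimal sketch of a persistent address book with TTL-based expiry and garbage collection. It is not py-libp2p's API: PersistentAddrBook, DictDatastore, keys_with_prefix, the /peers/addrs/<peer_id> key layout, and JSON serialization are all assumptions for illustration. It also assumes multiaddrs are stored as plain strings and expiry times as Unix timestamps.

```python
from __future__ import annotations

import json
import time
from collections.abc import Callable


class DictDatastore:
    """Hypothetical in-memory stand-in for a real key-value backend."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes | None:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)

    def keys_with_prefix(self, prefix: str) -> list[str]:
        return [k for k in self.data if k.startswith(prefix)]


class PersistentAddrBook:
    """Hypothetical address book storing {multiaddr: expiry} per peer."""

    PREFIX = "/peers/addrs/"  # illustrative key layout: /peers/addrs/<peer_id>

    def __init__(
        self, store: DictDatastore, clock: Callable[[], float] = time.time
    ) -> None:
        self._store = store
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def _load(self, peer_id: str) -> dict[str, float]:
        raw = self._store.get(self.PREFIX + peer_id)
        return {} if raw is None else json.loads(raw)

    def _save(self, peer_id: str, entries: dict[str, float]) -> None:
        key = self.PREFIX + peer_id
        if entries:
            self._store.put(key, json.dumps(entries, sort_keys=True).encode())
        else:
            self._store.delete(key)  # drop peers with no remaining addresses

    def add_addrs(self, peer_id: str, addrs: list[str], ttl: float) -> None:
        entries = self._load(peer_id)
        expiry = self._clock() + ttl
        for addr in addrs:
            # In this sketch, re-adding an address never shortens its lifetime.
            entries[addr] = max(entries.get(addr, 0.0), expiry)
        self._save(peer_id, entries)

    def addrs(self, peer_id: str) -> list[str]:
        now = self._clock()
        return sorted(a for a, exp in self._load(peer_id).items() if exp > now)

    def gc(self) -> int:
        """Delete expired addresses for all peers; return how many were removed."""
        now = self._clock()
        removed = 0
        for key in self._store.keys_with_prefix(self.PREFIX):
            peer_id = key[len(self.PREFIX):]
            entries = self._load(peer_id)
            live = {a: exp for a, exp in entries.items() if exp > now}
            if len(live) != len(entries):  # only rewrite records that changed
                removed += len(entries) - len(live)
                self._save(peer_id, live)
        return removed
```

A real implementation would likely serialize Multiaddr objects to their binary form and use a more compact encoding than JSON; the periodic gc() call is where the configurable garbage-collection interval from requirement 9 would plug in.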
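For the caching layer in requirement 5, Python's standard library has no ARC cache, so the sketch below swaps in a simpler LRU policy built on collections.OrderedDict. It wraps any store exposing get/put/delete; CachedDatastore and CountingStore (a dict-backed backend that counts reads, used only to show cache hits) are hypothetical names, not part of py-libp2p.

```python
from __future__ import annotations

from collections import OrderedDict


class CountingStore:
    """Hypothetical dict-backed backend that counts reads, to show cache hits."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}
        self.reads = 0

    def get(self, key: str) -> bytes | None:
        self.reads += 1
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


class CachedDatastore:
    """Read-through, write-through LRU cache (a simpler stand-in for ARC)."""

    def __init__(self, backend: CountingStore, capacity: int = 1024) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._backend = backend
        self._capacity = capacity
        self._cache: OrderedDict[str, bytes | None] = OrderedDict()

    def get(self, key: str) -> bytes | None:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._backend.get(key)  # miss: read through to the backend
        self._remember(key, value)  # None is cached too, so unknown keys stay cheap
        return value

    def put(self, key: str, value: bytes) -> None:
        self._backend.put(key, value)  # write through: backend is always up to date
        self._remember(key, value)

    def delete(self, key: str) -> None:
        self._backend.delete(key)
        self._cache.pop(key, None)

    def _remember(self, key: str, value: bytes | None) -> None:
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
```

Writing through to the backend means the cache never holds data the datastore lacks, so the cache can be dropped or resized at any time without losing state; the capacity argument corresponds to the "Cache size" option in requirement 9.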

Open questions

  1. Datastore Backend Priority: What specific Python datastore library should be prioritized for the initial implementation? Options include:

    • sqlite3 (built-in, simple, widely available)
    • plyvel (LevelDB compatibility with go-libp2p)
    • lmdb (high performance, memory-mapped)
    • Custom file-based storage
  2. Integration Strategy: How should the PersistentPeerStore be integrated into the existing py-libp2p initialization process? Should it be:

    • A new parameter in new_host() function?
    • A separate factory function?
    • A configuration option in existing functions?
  3. Migration Path: Should we provide utilities to migrate from in-memory peerstore to persistent storage, or is this a future consideration?

  4. Default Behavior: Should persistent storage be the default, or should it remain opt-in to maintain backward compatibility?
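If sqlite3 were chosen as the first backend (open question 1), the datastore-agnostic interface from requirement 1 and the batched writes from requirement 7 could be sketched as below. The Datastore interface, its method names (get, put, delete, query_prefix, put_many), and the single kv table are assumptions for illustration, not an existing py-libp2p or go-datastore API.

```python
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod


class Datastore(ABC):
    """Hypothetical backend-agnostic key-value interface: str keys, bytes values."""

    @abstractmethod
    def get(self, key: str) -> bytes | None: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def query_prefix(self, prefix: str) -> list[tuple[str, bytes]]: ...

    @abstractmethod
    def put_many(self, items: list[tuple[str, bytes]]) -> None: ...


class SQLiteDatastore(Datastore):
    """Datastore backed by a single SQLite table; pass a file path to persist."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def get(self, key: str) -> bytes | None:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def put(self, key: str, value: bytes) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )

    def delete(self, key: str) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))

    def query_prefix(self, prefix: str) -> list[tuple[str, bytes]]:
        # Keys sharing a prefix are contiguous in sorted order, so scan
        # forward from the prefix and stop at the first non-matching key.
        results = []
        cursor = self._conn.execute(
            "SELECT key, value FROM kv WHERE key >= ? ORDER BY key", (prefix,)
        )
        for key, value in cursor:
            if not key.startswith(prefix):
                break
            results.append((key, value))
        return results

    def put_many(self, items: list[tuple[str, bytes]]) -> None:
        # One transaction for the whole batch: all writes land or none do.
        with self._conn:
            self._conn.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", items
            )

    def close(self) -> None:
        self._conn.close()
```

Passing a file path instead of ":memory:" is what makes the data survive restarts. Other candidates from the list (plyvel, lmdb) would implement the same abstract methods, which is the point of keeping the persistent books written against the interface rather than against SQLite.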

Are you planning to do it yourself in a pull request?

Yes

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests