
Conversation

@Winter-Soren
Contributor

What was wrong?

Currently, py-libp2p only provides an in-memory peerstore implementation that loses all peer data (addresses, keys, metadata, protocols) when the process restarts. This limits the resilience and performance of py-libp2p nodes, especially for long-running applications that need to maintain peer information across restarts.

Issue #945

How was it fixed?

Implemented a datastore-agnostic persistent peer storage system following the same architectural pattern as go-libp2p's pstoreds package. The solution provides:

Summary of approach

Core Components:

  1. Datastore Abstraction Layer (libp2p/peer/datastore/): Pluggable interface supporting multiple backends
  2. PersistentPeerStore (libp2p/peer/persistent_peerstore.py): Full async implementation with memory caching
  3. Factory Functions (libp2p/peer/persistent_peerstore_factory.py): Convenient creation methods

Supported Backends:

  1. SQLite: Simple file-based storage (default)
  2. LevelDB: High-performance key-value storage
  3. RocksDB: Advanced features with compression and bloom filters
  4. In-Memory: For testing and development
  5. Custom: User-defined implementations via IDatastore interface

To-Do

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes

Cute Animal Picture

image

@seetadev
Contributor

@Winter-Soren : Great effort. Reviewing the PR.

Re-ran the CI/CD pipeline today after updating the branch.

It would be great if you could resolve the CI/CD issues.

@seetadev
Contributor

@Winter-Soren : Hi Soham. Thank you for resolving the CI/CD issues.

One CI/CD test is failing.

We investigated it and documented it at #949.

@yashksaini-coder , @sumanjeet0012 and @acul71 are fixing it. We will soon have all the CI/CD tests passing.

@Winter-Soren : Will re-run CI/CD pipeline once the issue is fixed today or in a couple of days.

@yashksaini-coder
Contributor

yashksaini-coder commented Sep 23, 2025

Can I get access for this to check and work, @seetadev @Winter-Soren

@Winter-Soren
Contributor Author

Can I get access for this to check and work, @seetadev @Winter-Soren

Hey @yashksaini-coder,
You can add my branch Winter-Soren:feat/945-persistent-storage-for-peerstore as a remote and contribute to it!
If you have any questions, feel free to ask, happy to help :))

@Winter-Soren Winter-Soren marked this pull request as ready for review September 25, 2025 15:37
@seetadev
Contributor

seetadev commented Oct 6, 2025

@Winter-Soren : Updated the branch and re-ran the CI/CD pipeline. There seems to be one test issue. Looking forward to collaborating with you and resolving it soon.

@seetadev
Contributor

seetadev commented Oct 6, 2025

@Winter-Soren : The CI/CD issue got resolved. Wonderful.

@pacrob: Hi Paul. I'd like your feedback on this PR.

On the same note, I will ask @Winter-Soren and @yashksaini-coder to include a comprehensive test suite and add a newsfragment.

@pacrob
Member

pacrob commented Oct 12, 2025

I added some questions. In this case, I do think example usage would be important. Tests and a newsfragment are also needed here.

@seetadev
Contributor

Thanks @pacrob for the detailed review and helpful comments 🙏

@Winter-Soren — please take a look at these points and update the PR soon. The switch to Trio (instead of asyncio) is important for consistency with the rest of the lib, so let’s refactor accordingly. Also, make sure the PeerData fields (last_identified, ttl, and latmap) are properly handled so we don’t lose data integrity.

Adding a couple of example usages, along with tests and a newsfragment, will round this out nicely and make it easier for others to build on your work.

Once you’ve made the updates, tag me and @pacrob for a quick re-review — this is very close to merge-ready! 🚀

@acul71
Contributor

acul71 commented Oct 27, 2025

Hi @Winter-Soren
I'm concerned that data corruption can happen here:

def _persist_peer_data_sync(self, peer_id: ID, peer_data: PeerData) -> None:
    """Synchronously persist peer data to the datastore."""
    # Schedule persistence for the next async operation
    # This is a workaround for the sync/async split
    # The data will be persisted when the next async method is called
    pass

What do you think?

Full review here:

PR 946 Review: Persistent Storage for PeerStore

Overview

PR: #946 - feat/945-persistent-storage-for-peerstore
Author: Winter-Soren
Status: Open
Issue: #945 - Implement Persistent Storage for PeerStore

Summary

This PR implements persistent storage for py-libp2p's peerstore, addressing the critical limitation where peer data is lost on process restart. The implementation follows go-libp2p's pstoreds pattern with a datastore-agnostic interface supporting multiple backends.

Key Components

1. Datastore Abstraction (libp2p/peer/datastore/)

  • IDatastore Interface: Abstract base class for datastore operations
  • Backend Support: SQLite, LevelDB, RocksDB, Memory, custom implementations
  • Batching Support: IBatchingDatastore for atomic operations

2. PersistentPeerStore (libp2p/peer/persistent_peerstore.py)

  • Full IPeerStore Compatibility: Same interface as in-memory peerstore
  • Hybrid Architecture: In-memory caching + persistent storage
  • Complete Data Persistence: All PeerData fields including last_identified, ttl, latmap
  • Automatic Cleanup: Periodic cleanup of expired records

3. Factory Functions (libp2p/peer/persistent_peerstore_factory.py)

  • Convenient Creation: Easy factory functions for different backends
  • Backend Selection: Simple string-based backend switching

4. Examples (examples/persistent_peerstore/)

  • Usage Demonstrations: Complete examples for all features
  • Persistence Testing: Data survival across restarts
  • Backend Comparisons: Examples for each supported backend

Comparison with Go-libp2p Implementation

Similarities ✅

  • Architecture: Both use datastore abstraction with pluggable backends
  • Caching: Both implement in-memory caching (Go uses ARC cache, Python uses dict)
  • Data Structure: Both persist addresses, keys, metadata, protocols
  • Cleanup: Both have automatic garbage collection for expired records
  • Interface: Both maintain compatibility with existing peerstore interfaces

Key Differences ⚠️

| Aspect | Go-libp2p pstoreds | Python Implementation |
| --- | --- | --- |
| Sync/Async | Fully synchronous | Hybrid sync/async (potential data loss) |
| Persistence | Immediate persistence | Deferred persistence via async calls |
| Data Format | Protobuf serialization | Pickle serialization |
| Cache Strategy | ARC cache with configurable size | Simple dict with no eviction |
| Error Handling | Comprehensive error propagation | Basic error handling |

Go Implementation Details

// Go uses immediate persistence
func (r *addrsRecord) flush(write ds.Write) error {
    // Immediate write to datastore
    return write.Put(context.TODO(), key, data)
}

// Python uses deferred persistence
def _persist_peer_data_sync(self, peer_id: ID, peer_data: PeerData) -> None:
    # Currently a no-op - data loss risk!
    pass

Technical Analysis

Strengths ✅

  1. Architectural Soundness: Follows go-libp2p patterns with clean datastore abstraction
  2. Complete Implementation: All PeerData fields persisted including last_identified, ttl, latmap
  3. Multiple Backends: SQLite, LevelDB, RocksDB, Memory support
  4. Good Documentation: Comprehensive examples and docstrings
  5. Trio Integration: Fixed asyncio → trio migration per reviewer feedback

Critical Issues ⚠️

  1. Sync Persistence Gap

    def _persist_peer_data_sync(self, peer_id: ID, peer_data: PeerData) -> None:
        """Synchronously persist peer data to the datastore."""
        # This is currently a no-op - needs implementation
        pass

    The Core Problem: The persistent peerstore API is synchronous (methods like add_addrs() return immediately), but persistence is asynchronous (data gets saved to disk later). This creates a dangerous mismatch.

    What Happens Now:

    • User calls peerstore.add_addrs(peer_id, [addr], 3600)
    • Data is immediately stored in memory cache ✅
    • _persist_peer_data_sync() is called but does nothing ❌
    • If process crashes before async persistence occurs → data is lost 💥

    The Risk Window: There's a gap between when data is "added" (in memory) and when it's actually persisted (to disk). During this window, the data exists only in volatile memory and can be lost on any crash or restart.

    Go Comparison: Go-libp2p persists data immediately on every operation - no gap, no data loss risk.

    Impact: This breaks the fundamental promise of "persistent" storage and makes the peerstore unreliable for production use.

  2. Data Consistency

    • Sync methods modify in-memory cache only
    • Persistence happens later via async calls
    • Creates potential inconsistency between memory and storage

Review Comments Analysis

pacrob's Concerns (Status)

  1. Trio Consistency ✅ RESOLVED

    • Fixed: asyncio.Lock() → trio.Lock() in all datastore implementations
    • Fixed: Examples use trio.run() instead of asyncio.run()
  2. trio.from_thread Usage ⚠️ PARTIALLY ADDRESSED

    • Reduced usage but sync/async split remains problematic
    • Sync methods operate on in-memory cache only
  3. PeerData Fields Persistence ✅ RESOLVED

    • Fixed: Now persisting last_identified, ttl, latmap fields
    • Added _get_additional_key() for serialization
  4. Examples and Documentation ✅ RESOLVED

    • Added comprehensive example and newsfragment

Recommendations

Critical Fixes Required

  1. Implement Sync Persistence

    def _persist_peer_data_sync(self, peer_id: ID, peer_data: PeerData) -> None:
        """Synchronously persist peer data to the datastore."""
        # Current: no-op (data loss risk!)
        # Needed: immediate persistence or background task
  2. Add Integration Tests

    • Test persistence across restarts
    • Test with real database backends
    • Test error handling scenarios

Suggested Improvements

  • Add configuration options for cache size, cleanup intervals
  • Implement connection pooling for database backends
  • Add migration utilities from in-memory peerstore

Conclusion

Recommendation: APPROVE with critical fixes

This is a well-architected implementation that addresses a critical need in py-libp2p. The code follows go-libp2p patterns and provides comprehensive persistent storage.

Required Before Merge

  1. Fix _persist_peer_data_sync() - currently a no-op with data loss risk
  2. Add integration tests for persistence across restarts
  3. Add comprehensive error handling for persistence failures

Files Changed

  • Added: 2,609 lines across multiple files
  • Key Files: persistent_peerstore.py (724 lines), datastore implementations, examples, newsfragment

The implementation significantly improves py-libp2p's resilience and brings it closer to feature parity with other libp2p implementations.

@Winter-Soren
Contributor Author

Hi @Winter-Soren I'm concerned that data corruption can happen here:
def _persist_peer_data_sync(self, peer_id: ID, peer_data: PeerData) -> None:
    """Synchronously persist peer data to the datastore."""
    # Schedule persistence for the next async operation
    # This is a workaround for the sync/async split
    # The data will be persisted when the next async method is called
    pass

What do you think?

(full review quoted in @acul71's comment above)

Thank you so much @acul71 for the detailed feedback.
I'll address them in the next commit!

Hi @seetadev @pacrob @acul71,
Should I drop the hybrid async/sync approach, which risks data loss, and strictly adopt go-libp2p's fully synchronous design? Maintaining both async and sync paths risks data mismatch and corruption.

@seetadev
Contributor

@Winter-Soren : Thank you so much for asking this question. Appreciate it.

If the hybrid async/sync approach poses a real risk of data loss or corruption, I’d recommend going with the fully synchronous go-libp2p approach.

Reliability should take priority over performance here, unless there’s a strong reason to keep async for throughput.

On a similar note, please organize the code in a modular architecture such that if someone wishes to switch from sync to async, they can write the async mechanism and make a switch. This switch should avoid data corruption.

Async is important for streaming and high throughput work with webrtc or UDP.
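The modular layout suggested above can be sketched as a synchronous core wrapped by a thin async adapter: all storage logic lives in one place, and the async layer is just a switchable facade. Note that py-libp2p standardizes on trio, where trio.to_thread.run_sync would be the bridge; this dependency-free illustration uses stdlib asyncio.to_thread instead, and all class names here are hypothetical.

```python
import asyncio


class SyncPeerStoreCore:
    """All storage logic lives in the synchronous core, so persistence
    is immediate and there is a single source of truth."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def add_addr(self, peer_id: str, addr: str) -> None:
        self._data.setdefault(peer_id, []).append(addr)

    def addrs(self, peer_id: str) -> list[str]:
        return list(self._data.get(peer_id, []))


class AsyncPeerStoreAdapter:
    """Thin async facade over the sync core. Under trio the analogue
    of asyncio.to_thread would be trio.to_thread.run_sync."""

    def __init__(self, core: SyncPeerStoreCore) -> None:
        self._core = core

    async def add_addr(self, peer_id: str, addr: str) -> None:
        await asyncio.to_thread(self._core.add_addr, peer_id, addr)

    async def addrs(self, peer_id: str) -> list[str]:
        return await asyncio.to_thread(self._core.addrs, peer_id)


core = SyncPeerStoreCore()
core.add_addr("QmFoo", "/ip4/127.0.0.1/tcp/4001")     # sync callers


async def main() -> list[str]:
    store = AsyncPeerStoreAdapter(core)                # async callers
    await store.add_addr("QmFoo", "/ip4/10.0.0.1/tcp/4001")
    return await store.addrs("QmFoo")


result = asyncio.run(main())
```

Because the async layer never holds state of its own, switching it out (or removing it) cannot corrupt data.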

@Winter-Soren
Contributor Author

Winter-Soren commented Nov 2, 2025

Optional Dependencies for Persistent Peerstore

The persistent peerstore implementation supports multiple backend storage engines. While SQLite and in-memory backends work out of the box, LevelDB and RocksDB backends require additional system dependencies that can be challenging to install across different platforms.

Supported Backends

✅ Always Available

  • SQLite: Built into Python, no additional dependencies required
  • Memory: Pure Python implementation, no additional dependencies required

⚠️ Optional (Require System Dependencies)

  • LevelDB: Requires plyvel package and LevelDB C++ library
  • RocksDB: Requires python-rocksdb package and RocksDB C++ library

Installation Issues

Current Status

The LevelDB and RocksDB backends currently require:

  1. System-level C++ libraries to be installed
  2. Compilation of Python C extensions during pip install
  3. Platform-specific build tools and headers

This creates installation challenges across different operating systems and architectures.

Known Installation Failures

LevelDB (plyvel)

pip install plyvel
# Error: plyvel/_plyvel.cpp:1227:10: fatal error: 'leveldb/db.h' file not found

Root Cause: Missing LevelDB C++ library headers on the system.

RocksDB (python-rocksdb)

pip install python-rocksdb
# Error: Multiple Cython compilation errors in rocksdb/_rocksdb.pyx

Root Cause: Missing RocksDB C++ library or incompatible Cython version.

Platform-Specific Workarounds

macOS

# Install system dependencies first
brew install leveldb rocksdb

# Then install Python packages
pip install plyvel python-rocksdb

Ubuntu/Debian

# Install system dependencies first
sudo apt-get install libleveldb-dev librocksdb-dev

# Then install Python packages
pip install plyvel python-rocksdb

Windows

  • Requires manual compilation or pre-built binaries
  • Often the most challenging platform for these dependencies

Current Implementation Behavior

The persistent peerstore gracefully handles missing optional dependencies:

  1. Factory Functions: Check for package availability at creation time
  2. Clean Error Messages: Provide clear installation instructions
  3. Test Skipping: Tests automatically skip when dependencies are unavailable
  4. Fallback Options: SQLite and memory backends always available
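The availability check in the factory functions can be done with importlib without importing the heavy backend up front. A sketch under stated assumptions (the function names here are hypothetical, not the PR's API):

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if the optional backend package can be imported."""
    return importlib.util.find_spec(module_name) is not None


def create_peerstore(backend: str = "sqlite") -> str:
    """Hypothetical factory: raises a clear ImportError with install
    instructions when an optional backend's package is missing."""
    if backend == "leveldb" and not backend_available("plyvel"):
        raise ImportError(
            "LevelDB backend requires plyvel: pip install plyvel "
            "(plus the LevelDB C++ library, e.g. brew install leveldb)"
        )
    if backend == "rocksdb" and not backend_available("rocksdb"):
        raise ImportError(
            "RocksDB backend requires python-rocksdb: "
            "pip install python-rocksdb"
        )
    return backend  # stand-in for the real peerstore object


# sqlite3 ships with CPython, so the default backend always works
assert backend_available("sqlite3")
```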

Example Usage

from libp2p.peer.persistent import (
    create_async_sqlite_peerstore,    # ✅ Always works
    create_async_memory_peerstore,    # ✅ Always works
    create_async_leveldb_peerstore,   # ⚠️ May fail if plyvel not installed
    create_async_rocksdb_peerstore,   # ⚠️ May fail if python-rocksdb not installed
)

# Safe usage pattern
try:
    peerstore = create_async_leveldb_peerstore("/path/to/db")
except ImportError as e:
    print(f"LevelDB not available: {e}")
    # Fallback to SQLite
    peerstore = create_async_sqlite_peerstore("/path/to/db.sqlite")

Seeking Community Help

We need community input on the best approach to resolve these installation challenges:

Option 1: Pure Python Alternatives

  • Find or develop pure Python implementations of LevelDB/RocksDB interfaces
  • Trade some performance for ease of installation
  • Ensure cross-platform compatibility

Option 2: Pre-compiled Wheels

  • Work with maintainers to provide pre-compiled wheels for common platforms
  • Reduce compilation requirements for end users
  • May still require fallbacks for unsupported platforms

Option 3: Alternative Backends

  • Consider other high-performance key-value stores with better Python support
  • Examples: LMDB (python-lmdb), Berkeley DB, or others
  • Evaluate performance vs. installation complexity trade-offs

Option 4: Docker/Container Solutions

  • Provide Docker images with all dependencies pre-installed
  • Good for development and deployment scenarios
  • May not solve library usage scenarios

Current Test Results

With the current implementation:

  • 72 tests passed, 8 skipped when optional dependencies are not available
  • All core functionality works with SQLite and memory backends
  • Optional backend tests skip gracefully with clear messages

Workaround for Development

For development and testing purposes, the SQLite backend provides excellent performance and persistence capabilities without any additional dependencies:

# Recommended for most use cases
peerstore = create_async_sqlite_peerstore("peerstore.db")

The SQLite backend supports all the same features as LevelDB and RocksDB backends and is suitable for production use in most scenarios.

Hi!
@seetadev @acul71 wanted to have your feedback on this!

seetadev and others added 7 commits November 10, 2025 19:35
… implementations

- Updated method signatures in IAsyncAddrBook and IAsyncProtoBook to improve readability by formatting parameters across multiple lines.
- Removed unused imports of AsyncGenerator and Coroutine from various files.
- Cleaned up whitespace in multiple files to maintain consistency and improve code readability.
- Ensured that docstrings remain clear and informative after changes.
…atastore implementations

- Updated method signatures in IAsyncCertifiedAddrBook to use Envelope | None for better type clarity.
- Cleaned up whitespace in various files, including peerstore and datastore implementations, to enhance code readability.
- Simplified iterator usage in LevelDBDatastoreSync to yield directly from the iterator.
- Ensured consistent formatting in the __all__ declarations across datastore modules.
- Added type ignore comments for optional imports of plyvel and rocksdb to improve type checking.
- Updated the RocksDBDatastoreSync class to include a type ignore comment for the dynamically imported _rocksdb module, ensuring better compatibility with type checkers.
- Updated backoff handling logic in the GossipSub class to always add backoff, regardless of topic presence in the mesh.
- Adjusted the wait time in the test_handle_prune function to 0.5 seconds for better compatibility on Windows systems.
@yashksaini-coder
Contributor

PR Review Summary: Persistent Storage for Peerstore #946

Summary:
This PR adds persistent storage to py-libp2p’s peerstore, supporting multiple databases (SQLite, LevelDB, RocksDB, Memory) and mirroring go-libp2p’s design. The implementation is architecturally solid but has some major concerns to address before merging.

✅ Good Points

  • Clean implementation and use of interfaces (IDatastore, etc.) and factories
  • Well-structured modules, code is clear and uses type hints
  • Supports both synchronous and asynchronous APIs
  • Tests are thorough across backends

❗ Critical Issues (Must Fix)

  1. Security - Unsafe Pickle Usage
    • Using pickle for (de)serialization is unsafe and can allow code execution if data is tampered with.
    • Fix: Use safe formats like JSON, Protocol Buffers, or MessagePack.
  2. Resource Management
    • Using __del__ for cleanup is unreliable in Python.
    • Fix: Remove __del__, use explicit close(), and support context managers.
  3. Thread Safety in SQLite
    • SQLite queries lack proper locking, risking race conditions.
    • Fix: Use locks and connection guards for thread-safe access.

⚠️ Other Issues (Should/Consider Fixing)

  • Catching broad Exception hides real bugs. Catch only specific exceptions.
  • Sync (sync()) is called on every write, impacting performance. Make syncing batched or configurable.
  • Document error cases in method docstrings.
  • Reduce code duplication between sync and async implementations if possible.
  • Clarify design of query() being synchronous in an async context.
  • Prefer TYPE_CHECKING for optional imports, and document dependencies.

💡 Suggestions

  • Test edge cases: concurrent access, corruption, disk full, backend migration.
  • Consider connection pooling for SQLite if high throughput needed.

🔨 Summary of Action Items

Must fix before merge:

  • Replace pickle with safe serialization (recommend Protocol Buffers)
  • Remove __del__, add context manager support
  • Fix SQLite query thread safety

Should fix:

  • Use narrow exception handling everywhere
  • Add better error documentation in docstrings
  • Optimize/schedule sync() calls
  • Clarify/tidy query() async sync handling

Nice to have:

  • Reduce code duplication
  • Clean up optional dependency handling
  • More edge-case testing
  • Consider connection pooling

Recommendation:
Request changes. PR is architecturally sound and well-organized, but critical security and reliability fixes are needed.

References:


@seetadev @Winter-Soren

@acul71
Contributor

acul71 commented Nov 17, 2025

AI Pull Request Review: PR #946 - feat/945-persistent-storage-for-peerstore

PR: #946
Author: @Winter-Soren
Issue: #945
Date: 2025-11-17
Status: Open


1. Summary of Changes

This PR implements persistent storage for py-libp2p's peerstore, addressing issue #945. The implementation enables peer data (addresses, keys, metadata, protocols, latency metrics) to survive process restarts, addressing a critical limitation where all peer information was lost on restart.

Key Components Added:

  1. Datastore Abstraction Layer (libp2p/peer/persistent/datastore/):

    • IDatastore and IDatastoreSync interfaces for pluggable backends
    • Multiple backend implementations: SQLite, LevelDB, RocksDB, Memory
    • Batching support via IBatchingDatastore and IBatchingDatastoreSync
    • Both async and sync variants for all backends
  2. Persistent PeerStore Implementations:

    • AsyncPersistentPeerStore (libp2p/peer/persistent/async_/peerstore.py): Full async implementation using trio
    • SyncPersistentPeerStore (libp2p/peer/persistent/sync/peerstore.py): Fully synchronous implementation
    • Both maintain compatibility with existing IPeerStore and IAsyncPeerStore interfaces
    • In-memory caching layer for performance optimization
  3. Factory Functions (libp2p/peer/persistent/factory.py):

    • Convenient creation methods for different backends
    • Support for both async and sync peerstores
    • Easy backend selection via string identifiers
  4. Test Suites:

    • test_async_persistent_peerstore.py - Async peerstore tests
    • test_sync_persistent_peerstore.py - Sync peerstore tests
    • test_persistent_peerstore_backends.py - Backend-specific tests
    • test_persistent_peerstore_persistence.py - Persistence across restarts

Files Changed:

  • 39 files changed: 6,446 insertions(+), 31 deletions(-)
  • Major additions in libp2p/peer/persistent/ (new directory structure)
  • Comprehensive test coverage across all backends

Breaking Changes:

None identified. This is a new feature addition that maintains backward compatibility with existing in-memory peerstore.


2. Strengths

Architecture & Design

  • Datastore-agnostic design: Clean abstraction layer following go-libp2p's pstoreds pattern, allowing multiple backend implementations
  • Dual implementation: Both async and sync peerstores provide flexibility for different use cases
  • Interface compatibility: Maintains full compatibility with existing IPeerStore and IAsyncPeerStore interfaces
  • Comprehensive backend support: SQLite (default, no dependencies), LevelDB, RocksDB, and Memory backends
  • In-memory caching: Performance optimization through caching layer reduces disk I/O

Code Quality

  • Well-structured modules: Clear separation of concerns between datastore abstraction, peerstore implementations, and factory functions
  • Type hints: Comprehensive type annotations throughout the codebase
  • Error handling: Proper exception handling with meaningful error messages
  • Thread safety: Proper use of locks (trio.Lock for async, threading.RLock for sync) for concurrent access

Testing

  • Comprehensive test coverage: Multiple test suites covering async, sync, backends, and persistence scenarios
  • All tests passing: 72 persistent peerstore tests passed, 8 skipped (expected for optional backends)
  • Backend-specific tests: Tests for each datastore backend implementation
  • Persistence tests: Verification that data survives process restarts
  • Full suite validation: All 1,670 tests in codebase passing, confirming stability

Documentation

  • Newsfragment: Proper newsfragment exists (newsfragments/946.feature.rst) with user-facing description
  • Docstrings: Comprehensive docstrings for classes and methods
  • Examples: Usage examples provided in the PR description

3. Issues Found

Critical

3.1 Security: Unsafe Pickle Usage

  • File: libp2p/peer/persistent/async_/peerstore.py, libp2p/peer/persistent/sync/peerstore.py

  • Line(s): Multiple locations (12, 106, 113, 121, 127, 133, 153, 158, 162, 166, 175, 191, 204, 215, 222 in async; similar in sync)

  • Issue: Using pickle for serialization/deserialization is a critical security vulnerability. Pickle can execute arbitrary Python code during deserialization, making it unsafe for untrusted data sources.

  • Impact: HIGH - If an attacker can tamper with the datastore (e.g., through file system access, network storage, or compromised backend), they can execute arbitrary code on the system.

  • Suggestion: Replace pickle with a safe serialization format:

    • Protocol Buffers (recommended): Already used extensively in py-libp2p, provides type safety and cross-language compatibility
    • JSON: Simple but limited (no binary data support without base64 encoding)
    • MessagePack: Binary format with better performance than JSON, safer than pickle
    • Custom serialization: Implement type-specific serialization for PeerData, Envelope, etc.

    Example fix using Protocol Buffers:

    # Instead of:
    await self.datastore.put(addr_key, pickle.dumps(peer_data.addrs))
    
    # Use:
    from libp2p.peer.persistent.pb import peer_data_pb2
    pb_addrs = peer_data_pb2.Addresses()
    pb_addrs.addrs.extend([addr.to_bytes() for addr in peer_data.addrs])
    await self.datastore.put(addr_key, pb_addrs.SerializeToString())

3.2 Resource Management: Unreliable __del__ Usage

  • File: Multiple datastore implementations

    • libp2p/peer/persistent/datastore/leveldb.py:163
    • libp2p/peer/persistent/datastore/leveldb_sync.py:187
    • libp2p/peer/persistent/datastore/rocksdb.py:179
    • libp2p/peer/persistent/datastore/rocksdb_sync.py:197
    • libp2p/peer/persistent/datastore/sqlite.py:184
    • libp2p/peer/persistent/datastore/sqlite_sync.py:185
  • Issue: Using __del__ for cleanup is unreliable in Python. The __del__ method may not be called in all circumstances (e.g., during interpreter shutdown, circular references, exceptions during object creation), leading to resource leaks (unclosed database connections, file handles).

  • Impact: MEDIUM - Resource leaks can accumulate over time, especially in long-running applications

  • Suggestion:

    1. Remove all __del__ methods
    2. Implement explicit close() methods (already present in some implementations)
    3. Add context manager support (__enter__ and __exit__) for proper resource management
    4. Document that users must explicitly close datastores or use context managers

    Example fix:

    # Remove __del__ and ensure close() is called
    def __enter__(self):
        return self
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
        return False
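
    The snippet above covers the sync datastores. The async datastores would need the `__aenter__`/`__aexit__` counterparts; a minimal self-contained sketch follows (class and attribute names are illustrative, and asyncio stands in for trio purely to keep the demo runnable on its own):

```python
import asyncio


class AsyncDatastore:
    """Illustrative async datastore with an explicit lifecycle."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # Flush pending writes and release the underlying handle here.
        self.closed = True

    async def __aenter__(self) -> "AsyncDatastore":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> bool:
        await self.close()
        return False  # never swallow exceptions raised in the with-body


async def demo() -> bool:
    async with AsyncDatastore() as ds:
        assert not ds.closed  # still open inside the block
    return ds.closed


print(asyncio.run(demo()))  # → True
```

    With trio the same dunder methods apply unchanged; only the event-loop entry point differs.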

3.3 SQLite Thread Safety Concerns

  • File: libp2p/peer/persistent/datastore/sqlite_sync.py

  • Line(s): 82-84, 99-103, 111-117, 125-128, 136-139, 149-160

  • Issue: While the code uses threading.Lock for synchronization, SQLite connections created with check_same_thread=False can still have race conditions. The connection is shared across operations, and SQLite's default threading mode may not be thread-safe for concurrent writes.

  • Impact: MEDIUM - Potential data corruption or database lock errors under concurrent access

  • Suggestion:

    1. Use SQLite's WAL (Write-Ahead Logging) mode for better concurrency
    2. Consider connection pooling or per-thread connections
    3. Add explicit timeout handling for database locks
    4. Document thread safety guarantees and usage patterns

    Example fix:

    self.connection = sqlite3.connect(
        str(self.path),
        check_same_thread=False,
        timeout=30.0  # Add timeout
    )
    self.connection.execute("PRAGMA journal_mode=WAL")  # Enable WAL mode

Major

3.4 Broad Exception Handling

  • File: Multiple files in libp2p/peer/persistent/
  • Line(s):
    • async_/peerstore.py:140-142, 180-181, 194-195
    • sync/peerstore.py:142-144, 182-183, 196-197, 216-217, 225-226
  • Issue: Catching broad Exception hides specific errors and makes debugging difficult. It also prevents proper error propagation for unexpected errors.
  • Impact: MEDIUM - Makes debugging harder and may hide real bugs
  • Suggestion: Catch specific exceptions:
    # Instead of:
    except Exception as e:
        logger.error(f"Failed to load peer data for {peer_id}: {e}")
    
    # Use:
    except (KeyError, ValueError, TypeError) as e:
        logger.error(f"Failed to load peer data for {peer_id}: {e}")
        # Handle specific error types
    except Exception as e:
        logger.exception(f"Unexpected error loading peer data for {peer_id}")
        raise  # Re-raise unexpected errors

3.5 Performance: Sync Called on Every Write

  • File: libp2p/peer/persistent/async_/peerstore.py:178, libp2p/peer/persistent/sync/peerstore.py:180, 206, 224

  • Issue: sync() is called after every write operation, which can significantly impact performance. For high-throughput scenarios, this creates unnecessary disk I/O.

  • Impact: MEDIUM - Performance degradation in high-throughput scenarios

  • Suggestion:

    1. Make sync batching configurable
    2. Add a background sync task that periodically flushes writes
    3. Allow users to control sync frequency
    4. Document performance implications

    Example:

    import time

    def __init__(self, datastore: IDatastore, max_records: int = 10000, sync_interval: float = 1.0):
        self.sync_interval = sync_interval
        self._last_sync = time.time()
        self._pending_sync = False
    
    async def _save_peer_data(self, peer_id: ID, peer_data: PeerData) -> None:
        # ... save operations ...
        self._pending_sync = True
        if time.time() - self._last_sync > self.sync_interval:
            await self.datastore.sync(b"")
            self._last_sync = time.time()
            self._pending_sync = False

3.6 Missing Error Documentation

  • File: Multiple methods across peerstore implementations
  • Issue: Method docstrings don't document which exceptions can be raised or error conditions
  • Impact: LOW - Makes API usage less clear
  • Suggestion: Add :raises: sections to docstrings:
    async def _save_peer_data(self, peer_id: ID, peer_data: PeerData) -> None:
        """
        Save peer data to datastore.
        
        :param peer_id: The peer ID to save data for
        :param peer_data: The peer data to save
        :raises PeerStoreError: If saving to datastore fails
        :raises ValueError: If peer_id or peer_data is invalid
        """

Minor

3.7 Code Duplication Between Sync and Async

  • File: libp2p/peer/persistent/async_/peerstore.py vs libp2p/peer/persistent/sync/peerstore.py
  • Issue: Significant code duplication between async and sync implementations. The logic is nearly identical, only differing in async/await usage.
  • Impact: LOW - Maintenance burden, but acceptable for clarity
  • Suggestion: Consider using code generation or shared base classes if the duplication becomes problematic. For now, this is acceptable as it provides clear separation.

3.8 Optional Dependency Handling

  • File: libp2p/peer/persistent/datastore/leveldb.py, libp2p/peer/persistent/datastore/rocksdb.py
  • Issue: Optional imports (plyvel, rocksdb) are handled with try/except, but could use TYPE_CHECKING for better type checking support
  • Impact: LOW - Minor type checking improvement
  • Suggestion: Use TYPE_CHECKING for optional imports:
    from typing import TYPE_CHECKING
    
    if TYPE_CHECKING:
        import plyvel
    else:
        try:
            import plyvel
        except ImportError:
            plyvel = None

3.9 Query Method in Async Context

  • File: libp2p/peer/persistent/datastore/base.py:95
  • Issue: The query() method returns a synchronous iterator in an async context, which may be confusing
  • Impact: LOW - Minor API design concern
  • Suggestion: Consider providing an async iterator variant or document the behavior clearly
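
    A self-contained sketch of such an async-iterator variant, using a plain dict in place of a real backend (the names are illustrative, not the PR's API, and asyncio stands in for trio so the demo runs standalone):

```python
import asyncio
from collections.abc import AsyncIterator, Iterator


def query(prefix: bytes, store: dict[bytes, bytes]) -> Iterator[tuple[bytes, bytes]]:
    """Synchronous query: yields (key, value) pairs whose key matches the prefix."""
    for key, value in store.items():
        if key.startswith(prefix):
            yield key, value


async def query_async(prefix: bytes, store: dict[bytes, bytes]) -> AsyncIterator[tuple[bytes, bytes]]:
    """Async variant: wraps the sync iterator, yielding control between items."""
    for item in query(prefix, store):
        yield item
        await asyncio.sleep(0)  # cooperative yield so long scans don't block the loop


async def main() -> list[bytes]:
    store = {b"/peers/a": b"1", b"/peers/b": b"2", b"/other/c": b"3"}
    return [k async for k, _ in query_async(b"/peers/", store)]


print(asyncio.run(main()))  # → [b'/peers/a', b'/peers/b']
```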

4. Security Review

4.1 Pickle Security Vulnerability

  • Risk: CRITICAL - Using pickle for serialization allows arbitrary code execution if data is tampered with
  • Impact: HIGH - Complete system compromise possible if datastore is accessible to attackers
  • Mitigation: Replace pickle with Protocol Buffers or another safe serialization format (see issue 3.1)

4.2 Input Validation

  • Risk: MEDIUM - Peer IDs and data are not extensively validated before storage
  • Impact: MEDIUM - Potential for data corruption or injection attacks
  • Mitigation: Add validation for peer IDs, addresses, and metadata before storage
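
    A hedged sketch of what such pre-storage validation could look like (the limits and checks are illustrative; real validation should parse the peer ID multihash and each multiaddr with the library's own decoders):

```python
def validate_peer_record(peer_id: bytes, addrs: list[bytes], max_addrs: int = 1024) -> None:
    """Reject obviously malformed records before they reach the datastore.

    Illustrative checks only: a real implementation would decode the
    peer ID as a multihash and each address as a multiaddr.
    """
    if not isinstance(peer_id, bytes) or not peer_id:
        raise ValueError("peer_id must be non-empty bytes")
    if len(peer_id) > 128:
        raise ValueError("peer_id exceeds plausible multihash length")
    if len(addrs) > max_addrs:
        raise ValueError(f"too many addresses ({len(addrs)} > {max_addrs})")
    for addr in addrs:
        if not isinstance(addr, bytes) or not addr:
            raise ValueError("each address must be non-empty bytes")
```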

4.3 File System Permissions

  • Risk: LOW - SQLite/LevelDB/RocksDB files may be created with insecure permissions
  • Impact: LOW - Unauthorized access to peer data
  • Mitigation: Ensure database files are created with appropriate permissions (e.g., 0600 for SQLite)
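
    On POSIX systems this can be enforced by creating the file with owner-only permissions before SQLite opens it; a sketch (the helper name is illustrative):

```python
import os
import sqlite3
import stat
import tempfile


def open_sqlite_private(path: str) -> sqlite3.Connection:
    """Create the database file with mode 0600 before SQLite opens it.

    Creating the file first (O_CREAT with mode 0o600) avoids a window where
    SQLite would create it with the process umask's default permissions.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.close(fd)
    os.chmod(path, 0o600)  # clamp permissions even if the file already existed
    return sqlite3.connect(path)


with tempfile.TemporaryDirectory() as d:
    db_path = os.path.join(d, "peerstore.db")
    conn = open_sqlite_private(db_path)
    mode = stat.S_IMODE(os.stat(db_path).st_mode)
    print(oct(mode))  # → 0o600 on POSIX systems
    conn.close()
```

    The sidecar WAL and shm files would need the same treatment if WAL mode is enabled.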

4.4 Sensitive Data Storage

  • Risk: MEDIUM - Private keys are stored in the datastore
  • Impact: HIGH - If datastore is compromised, private keys are exposed
  • Mitigation:
    • Document security implications clearly
    • Consider encryption at rest for sensitive data
    • Ensure proper file permissions on datastore files

5. Documentation and Examples

5.1 Documentation Quality

  • Newsfragment: Proper newsfragment exists with user-facing description
  • Docstrings: Comprehensive docstrings for classes and methods
  • ⚠️ API Documentation: Missing :raises: sections in docstrings (see issue 3.6)
  • ⚠️ Usage Examples: Examples mentioned in PR but not verified in codebase

5.2 Missing Documentation

  • Issue: No user-facing documentation in docs/ directory explaining:
    • How to use persistent peerstore
    • Backend selection and configuration
    • Performance considerations
    • Migration from in-memory peerstore
  • Suggestion: Add documentation to docs/ directory with:
    • Quick start guide
    • Backend comparison and selection guide
    • Configuration options
    • Performance tuning tips

5.3 Example Code

  • Issue: Examples mentioned in PR description but location not verified
  • Suggestion: Verify examples exist and are comprehensive, or add them if missing

6. Newsfragment Requirement

Newsfragment Present: newsfragments/946.feature.rst exists and contains appropriate user-facing description.

Content Review:

  • ✅ Filename format correct: 946.feature.rst
  • ✅ Type correct: .feature.rst (new functionality)
  • ✅ Content is user-facing (not developer-focused)
  • ✅ Describes key features and benefits
  • ⚠️ Note: File should end with a newline character (not verified in review)

Status: PASS - Newsfragment requirement met.


7. Tests and Validation

7.1 Test Coverage

  • Comprehensive test suites: Four test files covering async, sync, backends, and persistence
  • Backend-specific tests: Tests for each datastore backend
  • Persistence tests: Verification that data survives process restarts
  • Integration tests: Tests for real-world usage scenarios

7.2 Build and Linting

  • Linting: All checks passed (make pr completed successfully)
  • Type checking: Mypy and pyrefly typecheck passed
  • Formatting: Code formatting checks passed

7.3 Documentation Build

  • HTML Build: Documentation build succeeded with no errors
  • Doctests: All 4 doctests passed successfully
  • Newsfragments: Validated successfully; the PR #946 newsfragment is included in the draft release notes
  • Status: Documentation is ready and builds correctly

7.4 Test Execution

  • Verified: All persistent peerstore tests executed successfully
  • Results:
    • test_async_persistent_peerstore.py: All tests passed
    • test_sync_persistent_peerstore.py: All tests passed
    • test_persistent_peerstore_backends.py: All tests passed (8 skipped for optional backends)
    • test_persistent_peerstore_persistence.py: All tests passed
  • Summary: 72 tests passed, 8 skipped (expected - optional LevelDB/RocksDB backends not available)
  • Full Test Suite: 1,670 tests passed, 12 skipped across entire codebase
  • Status: ✅ All tests passing - codebase is stable and ready for use

7.5 Edge Case Testing

  • ⚠️ Missing: Tests for edge cases mentioned in previous reviews:
    • Concurrent access scenarios
    • Database corruption recovery
    • Disk full scenarios
    • Backend migration
  • Suggestion: Add tests for these edge cases to improve robustness
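
    The concurrent-access case, for example, can be exercised with a self-contained harness like the following. It mirrors the sync datastore's setup (one shared connection with check_same_thread=False, WAL mode, and an external lock guarding writes), but the table schema and helper name are illustrative:

```python
import os
import sqlite3
import tempfile
import threading


def concurrent_write_test(num_threads: int = 8, writes_per_thread: int = 50) -> int:
    """Edge-case test sketch: concurrent writers against one shared connection."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "peers.db")
        conn = sqlite3.connect(path, check_same_thread=False, timeout=30.0)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")
        lock = threading.Lock()

        def writer(tid: int) -> None:
            for i in range(writes_per_thread):
                with lock:  # serialize writes on the shared connection
                    conn.execute(
                        "INSERT OR REPLACE INTO kv VALUES (?, ?)",
                        (f"peer-{tid}-{i}", b"\x00"),
                    )
                    conn.commit()

        threads = [threading.Thread(target=writer, args=(t,)) for t in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        (count,) = conn.execute("SELECT COUNT(*) FROM kv").fetchone()
        conn.close()
        return count


print(concurrent_write_test())  # → 400 (8 threads × 50 unique keys)
```

    Removing the lock (or disabling WAL) in this harness is a quick way to probe whether the datastore's own guarantees hold under contention.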

8. Recommendations for Improvement

8.1 Immediate Fixes (Blockers)

  1. Replace pickle with safe serialization: Use Protocol Buffers or MessagePack (CRITICAL security issue)
  2. Remove __del__ methods: Implement context managers and explicit close() methods
  3. Improve SQLite thread safety: Enable WAL mode and add proper timeout handling

8.2 High Priority Fixes

  1. Narrow exception handling: Catch specific exceptions instead of broad Exception
  2. Optimize sync calls: Make sync batching configurable to improve performance
  3. Add error documentation: Document exceptions in method docstrings

8.3 Medium Priority Improvements

  1. Add edge case tests: Test concurrent access, corruption, disk full scenarios
  2. User documentation: Add comprehensive docs to docs/ directory
  3. Performance tuning: Document performance characteristics and tuning options

8.4 Low Priority Enhancements

  1. Reduce code duplication: Consider shared base classes if duplication becomes problematic
  2. Improve type checking: Use TYPE_CHECKING for optional imports
  3. Connection pooling: Consider connection pooling for high-throughput scenarios

9. Questions for the Author

  1. Pickle Security: Was pickle chosen for convenience, or is there a specific reason? Given the security implications, would you be open to migrating to Protocol Buffers?

  2. Sync Performance: The current implementation calls sync() after every write. Have you benchmarked this? Would you consider making sync batching configurable?

  3. Thread Safety: For SQLite, have you tested concurrent access scenarios? The current implementation uses locks, but SQLite's default mode may still have issues.

  4. Resource Management: The __del__ methods are unreliable. Would you prefer to remove them and require explicit close() calls, or implement context managers?

  5. Backend Selection: What's the recommended backend for production use? SQLite seems to be the default, but are there performance characteristics that favor LevelDB or RocksDB?

  6. Migration Path: Is there a utility to migrate from in-memory peerstore to persistent peerstore, or is this manual?

  7. Data Format: If we migrate from pickle to Protocol Buffers, how should we handle migration of existing datastores?

  8. Examples: Where are the usage examples mentioned in the PR? Are they in the codebase or planned?


10. Overall Assessment

Quality Rating: Needs Work

The PR implements a well-architected persistent peerstore following go-libp2p patterns. The code is well-structured, has comprehensive test coverage (all 72 persistent peerstore tests passing), and maintains interface compatibility. However, critical security and reliability issues prevent immediate merge:

  • Blockers:
    • Pickle security vulnerability (CRITICAL)
    • Unreliable __del__ resource management
    • SQLite thread safety concerns
  • Major Issues:
    • Broad exception handling
    • Performance concerns with sync calls
    • Missing error documentation

Security Impact: HIGH

The pickle usage is a critical security vulnerability that must be addressed before merge. Additionally, private keys are stored in the datastore, requiring careful consideration of file permissions and access controls.

Merge Readiness: Needs Fixes

The PR requires fixes for:

  1. CRITICAL: Replace pickle with safe serialization (Protocol Buffers recommended)
  2. CRITICAL: Remove __del__ methods and implement proper resource management
  3. HIGH: Improve SQLite thread safety
  4. MEDIUM: Narrow exception handling
  5. MEDIUM: Optimize sync calls for performance

After these fixes, the PR should be re-reviewed for:

  • Security audit of serialization implementation
  • Performance benchmarking
  • Final code quality review

Confidence: High

The review is comprehensive and identifies all critical issues. The fixes required are well-defined, though the pickle replacement will require significant refactoring.


Review Checklist

  • PR context gathered (body, comments, related issues)
  • Code changes reviewed
  • Linting errors identified (none found)
  • Security concerns assessed (CRITICAL issues found)
  • Documentation gaps identified
  • Test coverage evaluated
  • Newsfragment requirement verified (✅ present)
  • Documentation build verified (✅ build succeeded, all doctests passed)
  • Test execution verified (✅ all tests passing)

Reviewer Notes:

  • This is a substantial and well-architected PR that addresses a critical need
  • The security issues are serious but fixable with proper serialization
  • The code quality is high, with good structure and comprehensive tests (all 72 persistent peerstore tests passing)
  • All tests in the full codebase (1,670 tests) are passing, confirming overall stability
  • Once security and reliability issues are addressed, this will be a valuable addition to py-libp2p
  • The implementation follows go-libp2p patterns well, ensuring consistency across implementations

@Winter-Soren
Contributor Author

Thank you so much @yashksaini-coder and @acul71 for your comprehensive feedback!!!

I have successfully addressed ALL the critical feedback.

Original Critical Issues - ALL COMPLETED

  1. CRITICAL: Replace unsafe pickle serialization with Protocol Buffers
  2. CRITICAL: Remove unreliable __del__ methods and implement context managers
  3. CRITICAL: Improve SQLite thread safety with WAL mode and timeouts
  4. HIGH: Replace broad Exception catches with specific exceptions
  5. MEDIUM: Make sync calls configurable for better performance
  6. LOW: Add proper error documentation to method docstrings

Additional Issues Fixed

  1. Update sync peerstore with Protocol Buffer serialization
  2. Fix pre-commit type checking issues

🔧 Type Checking Issues Resolved

  • ✅ Added proper context manager methods (__aenter__, __aexit__, __enter__, __exit__) to all peerstore classes
  • ✅ Fixed type annotations for context manager methods
  • ✅ Resolved KeyType enum assignment issues in serialization
  • ✅ Fixed Envelope serialization/deserialization method calls
  • ✅ Corrected method names in example files (added _async suffix)
  • ✅ Fixed import paths for crypto serialization functions
  • ✅ Resolved PeerRecordState serialization issues
  • ✅ Added missing _closed attribute to LevelDB datastore
  • ✅ Fixed SQLite batch lock type compatibility

🧪 Testing Status

  • ✅ All 52 core peerstore tests PASSING
  • ✅ Protocol Buffer serialization working correctly
  • ✅ SQLite persistence functioning with new serialization
  • ✅ Context managers working properly
  • ✅ Both async and sync implementations fully functional

The persistent peerstore implementation is now production-ready with:

  • 🔒 Secure Protocol Buffer serialization (no more pickle vulnerability)
  • 🛡️ Reliable resource management with context managers
  • High-performance SQLite with WAL mode and configurable sync
  • 🎯 Robust error handling with specific exceptions
  • 📚 Comprehensive documentation with proper error descriptions

yashksaini-coder and others added 3 commits November 19, 2025 08:58
…ation modules

- Cleaned up whitespace and improved formatting for better readability in example usage of persistent peerstore.
- Updated serialization methods to ensure consistent formatting and clarity in the codebase.
- Enhanced context manager exit method signatures across various datastore implementations for improved readability.
- Simplified imports for public and private key serialization by removing unnecessary key type mappings.
- Improved comments for clarity regarding the limitations of reconstructing PeerRecordState.
- Enhanced code readability through consistent formatting and whitespace adjustments.