

@yashksaini-coder
Contributor

Fixes: #553 (Implement Advanced Connection Management Features)

What was wrong?

Python libp2p was missing advanced connection management features present in JavaScript libp2p, including:

  • Priority-based dial queue management
  • Automatic reconnection for KEEP_ALIVE peers
  • Global connection limits and intelligent pruning
  • Rate limiting for incoming connections
  • IP allow/deny list filtering
  • Address sorting and DNS resolution
  • Comprehensive connection metrics
  • Enhanced connection state tracking

How was it fixed?

Implemented a complete connection management system matching JavaScript libp2p's capabilities:

Core Components Added:

  • DialQueue: Priority-based scheduling with concurrency limits and queue size management
  • ReconnectQueue: Automatic reconnection with exponential backoff for KEEP_ALIVE tagged peers
  • ConnectionPruner: Intelligent connection pruning based on peer tags, stream count, direction, and age
  • ConnectionRateLimiter: Per-host rate limiting using sliding window algorithm
  • ConnectionGate: IP allow/deny list filtering with CIDR support
  • AddressManager: Address sorting and filtering (transport priority, public/private, circuit relay)
  • DNSResolver: Async DNS resolution for /dns4/, /dns6/, and /dnsaddr/ multiaddrs with caching
  • ConnectionState: Enhanced connection status tracking (PENDING, OPEN, CLOSING, CLOSED) with timeline
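The DialQueue bullet combines priority ordering with a bound on pending dials. Below is a minimal sketch of just the ordering and the queue-size cap, built on the standard-library heapq. The class name, the exception, the default priority of 50, and max_queue_size are hypothetical and not taken from the PR. The concurrency limit (max_parallel_dials) is left out. In a trio implementation it would plausibly be a trio.CapacityLimiter held around each dial.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


class DialQueueFullError(Exception):
    """Raised when no more dials can be queued (hypothetical name)."""


@dataclass(order=True)
class _PendingDial:
    # heapq is a min-heap: store the negated priority so the highest priority
    # pops first, plus a sequence number so equal priorities stay FIFO.
    sort_key: tuple[int, int]
    peer_id: str = field(compare=False)


class PriorityDialQueue:
    """Hypothetical sketch: highest-priority dial first, FIFO within a priority,
    with a hard cap on the number of pending dials."""

    def __init__(self, max_queue_size: int = 100) -> None:
        self._heap: list[_PendingDial] = []
        self._sequence = itertools.count()
        self._max_queue_size = max_queue_size

    def add(self, peer_id: str, priority: int = 50) -> None:
        if len(self._heap) >= self._max_queue_size:
            raise DialQueueFullError(f"dial queue is full ({self._max_queue_size} pending)")
        entry = _PendingDial((-priority, next(self._sequence)), peer_id)
        heapq.heappush(self._heap, entry)

    def next_dial(self) -> str | None:
        """Pop the peer that should be dialed next, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).peer_id

    def __len__(self) -> int:
        return len(self._heap)
```

Peers added with the same priority are dialed in insertion order. The monotonically increasing sequence number breaks ties before peer IDs would ever be compared.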

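The ConnectionState bullet (PENDING, OPEN, CLOSING, CLOSED with a timeline) can be illustrated with a small state machine that records when each status was entered. This sketch rests on two assumptions: the forward-only transition table (a closed connection never reopens, and a pending dial may fail straight to CLOSED) and all of the names, neither of which comes from the PR's API. An injectable clock keeps it testable.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class ConnectionStatus(Enum):
    PENDING = "pending"
    OPEN = "open"
    CLOSING = "closing"
    CLOSED = "closed"


# Assumed forward-only transitions: a connection never reopens, and a pending
# dial that fails goes straight to CLOSED.
_ALLOWED = {
    ConnectionStatus.PENDING: {ConnectionStatus.OPEN, ConnectionStatus.CLOSED},
    ConnectionStatus.OPEN: {ConnectionStatus.CLOSING, ConnectionStatus.CLOSED},
    ConnectionStatus.CLOSING: {ConnectionStatus.CLOSED},
    ConnectionStatus.CLOSED: set(),
}


@dataclass
class ConnectionStateSketch:
    """Current status plus the time at which each status was entered (hypothetical)."""

    clock: Callable[[], float] = time.monotonic
    status: ConnectionStatus = ConnectionStatus.PENDING
    timeline: dict[ConnectionStatus, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Record the initial status at construction time.
        self.timeline[self.status] = self.clock()

    def transition(self, new_status: ConnectionStatus) -> None:
        if new_status not in _ALLOWED[self.status]:
            raise ValueError(f"invalid transition {self.status.name} -> {new_status.name}")
        self.status = new_status
        self.timeline[new_status] = self.clock()
```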
Configuration Enhancements:

Extended ConnectionConfig with 17+ new options matching JS libp2p defaults:

  • Global connection limits (max_connections, max_parallel_dials, etc.)
  • Comprehensive timeout configuration (dial_timeout, inbound_upgrade_timeout, etc.)
  • Reconnection parameters (reconnect_retries, reconnect_backoff_factor, etc.)
  • Security lists (allow_list, deny_list with CIDR support)
  • Custom address sorting
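The reconnection parameters listed above (reconnect_retries, reconnect_backoff_factor) imply an exponential backoff schedule. The sketch below covers only the delay computation. retry_interval and max_delay are hypothetical parameters added for illustration, and the defaults are illustrative rather than the PR's values.

```python
from __future__ import annotations


def reconnect_delays(
    reconnect_retries: int = 5,
    retry_interval: float = 1.0,
    reconnect_backoff_factor: float = 2.0,
    max_delay: float | None = None,
) -> list[float]:
    """Seconds to wait before each reconnection attempt.

    Attempt n (0-based) waits retry_interval * reconnect_backoff_factor ** n,
    optionally capped at max_delay.
    """
    delays: list[float] = []
    for attempt in range(reconnect_retries):
        delay = retry_interval * reconnect_backoff_factor**attempt
        if max_delay is not None:
            delay = min(delay, max_delay)
        delays.append(delay)
    return delays
```

With the defaults this yields waits of 1, 2, 4, 8 and 16 seconds. In a trio-based ReconnectQueue, each attempt would presumably be preceded by await trio.sleep(delay), and the loop would stop at the first successful dial. Keeping the schedule a pure function makes it easy to unit-test.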

Integration:

  • Fully integrated all components into Swarm class
  • Maintains backward compatibility with existing APIs
  • All components follow trio async patterns
  • Comprehensive error handling and logging

Testing:

  • 24 unit tests covering all new components
  • All tests passing ✅
  • Type checking passing ✅
  • Linting passing ✅

Summary of approach:

  1. Analyzed JS libp2p connection manager implementation
  2. Implemented components in phases following dependency order
  3. Integrated components into Swarm with proper lifecycle management
  4. Added comprehensive tests and fixed all type/lint issues

@seetadev @sumanjeet0012 I've raised a base PR to help add more advanced connection management features.

yashksaini-coder and others added 30 commits November 4, 2025 19:56
… pruning, dialing, metrics tracking, rate limiting, and reconnection queue. These modules provide functionality for IP filtering, connection limits, priority-based dialing, metrics collection, rate limiting for incoming connections, and automatic reconnections for peers tagged with KEEP_ALIVE, aligning with JavaScript libp2p behavior.
… and improving connection handling in the Swarm class. Added support for connection limits, timeouts, rate limiting, and security features such as allow/deny lists. Introduced connection queues for dialing and reconnection, along with metrics tracking for active connections.
Introduced a new network layer for libp2p, including connection management features such as address sorting, filtering, and selection logic. The address manager aligns with JavaScript libp2p behavior and provides functionality for handling IP addresses, including checks for private and loopback addresses, as well as transport priority sorting. Updated connection pruner to enhance sorting logic for connection management.
Introduced a new module for managing connection states in libp2p, aligning with JavaScript libp2p behavior. This includes tracking connection statuses and timelines. Additionally, added a DNS resolver module for handling multiaddrs, supporting /dns4/, /dns6/, and /dnsaddr/ resolutions with caching capabilities. Both modules enhance the network layer's functionality and maintain consistency with existing libp2p features.
Updated the DialQueue class to integrate an AddressManager and DNSResolver for improved address handling. This allows for sorting, filtering, and resolving DNS addresses before dialing peers. The changes enhance the dialing process by preparing addresses and handling potential resolution failures, aligning with the overall network layer improvements.
- Introduced `is_closed` property in `INetConn` interface to check connection status.
- Renamed `connection_gater` to `connection_gate` in `AddressManager` for consistency.
- Enhanced error handling and logging in `ConnectionPruner` and `Swarm` classes.
- Implemented connection management components including `ConnectionGate`, `ConnectionRateLimiter`, `AddressManager`, `DNSResolver`, `DialQueue`, `ReconnectQueue`, and `ConnectionPruner` in the `Swarm` class.
- Added checks for connection limits and improved connection dialing logic using a dial queue.
- Triggered connection pruning and reconnection logic based on connection status and limits.
- Improved dial queue management by checking if it is started before attempting to dial.
- Added detailed error handling for various connection scenarios, including recoverable errors from the dial queue.
- Ensured proper cleanup of connections and resources during failures in dialing and upgrading connections.
- Refactored connection upgrade logic to handle exceptions and resource management more robustly.
- Updated resource scope release to handle both coroutine and synchronous close methods.
…ng and logging

- Updated the `allow_list` parameter in `ConnectionPruner` to accept both string and Multiaddr types for better flexibility.
- Enhanced logging messages in the `Swarm` class for clarity and consistency, particularly in error handling and connection management.
- Improved code readability by breaking long lines and adding whitespace for better structure.
- Added `prometheus-client` version 0.23.0 to the development dependencies in `pyproject.toml` for enhanced monitoring capabilities.
…nnection limits, parallel dials, and timeouts. Implement exponential backoff for connection retries and improve peerstore update verification. Adjust concurrency settings to optimize performance with a higher number of peers.
… code readability. Adjust comments for better understanding of connection management and increase sleep duration for connection establishment.
…hance logging. Update comments for clarity and adjust sleep duration for improved connection stability.
…es to improve code cleanliness and maintainability.
@acul71
Contributor

acul71 commented Nov 21, 2025

Hi @yashksaini-coder, thanks for this PR.
I don't understand why the CI/CD tests pass, yet on my Linux box these tests fail:

FAILED tests/core/network/test_connection_management.py::TestDNSResolver::test_dns_resolver_initialization - Failed: async def functions are not natively supported.
FAILED tests/core/network/test_connection_management.py::TestDNSResolver::test_dns_resolver_non_dns_address - Failed: async def functions are not natively supported.
FAILED tests/core/network/test_connection_management.py::TestDNSResolver::test_dns_resolver_cache - Failed: async def functions are not natively supported.
FAILED tests/core/network/test_connection_management.py::TestDNSResolver::test_dns_resolver_clear_cache - Failed: async def functions are not natively supported.

Why are you using

@pytest.mark.asyncio
async def test_dns_resolver_initialization(self):
    ...

instead of:

@pytest.mark.trio
async def test_dns_resolver_initialization(self):
    ...

Full review here:

AI Pull Request Review: #1018 - Feat/advance connection management

Review Date: 2025-11-20
PR Number: #1018
Author: yashksaini-coder
Branch: Feat/Advance-Connection-Management
Base: main


1. Summary of Changes

This PR implements comprehensive advanced connection management features for py-libp2p, addressing issue #553. The changes bring Python libp2p's connection management capabilities in line with JavaScript libp2p's implementation.

Issues Addressed

Issue #553 describes missing features in Python's connection management compared to JavaScript libp2p, including:

  • Priority-based dial queue management
  • Automatic reconnection for KEEP_ALIVE peers
  • Global connection limits and intelligent pruning
  • Rate limiting for incoming connections
  • IP allow/deny list filtering
  • Address sorting and DNS resolution
  • Comprehensive connection metrics
  • Enhanced connection state tracking

Modules Affected

New Modules Added:

  • libp2p/network/dial_queue.py - Priority-based dial scheduling
  • libp2p/network/reconnect_queue.py - Automatic reconnection with exponential backoff
  • libp2p/network/connection_pruner.py - Intelligent connection pruning
  • libp2p/network/rate_limiter.py - Per-host rate limiting (sliding window)
  • libp2p/network/connection_gate.py - IP allow/deny list filtering with CIDR support
  • libp2p/network/address_manager.py - Address sorting and filtering
  • libp2p/network/dns_resolver.py - Async DNS resolution for /dns4/, /dns6/, /dnsaddr/
  • libp2p/network/connection_state.py - Enhanced connection status tracking
  • libp2p/network/metrics.py - Connection metrics tracking
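rate_limiter.py is described as per-host rate limiting with a sliding window. The sketch below shows that technique as a sliding log of timestamps per host. The names max_attempts and window_seconds are hypothetical, and the PR's ConnectionRateLimiter API may differ. The clock is injectable so the window can be tested without sleeping.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most max_attempts inbound connections per host within window_seconds."""

    def __init__(
        self,
        max_attempts: int = 5,
        window_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, host: str) -> bool:
        now = self._clock()
        attempts = self._attempts[host]
        # Drop timestamps that have slid out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            return False  # over the limit; the rejected attempt is not recorded
        attempts.append(now)
        return True
```

Two design choices here are assumptions. First, rejected attempts are not recorded, so a host that keeps retrying is admitted again as soon as its old attempts age out. Second, per-host deques are never evicted, which a production version would need to bound.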

Modified Modules:

  • libp2p/network/swarm.py - Integrated all connection management components
  • libp2p/network/config.py - Extended ConnectionConfig with 17+ new options
  • examples/connection_management/ - New example scripts demonstrating features
  • tests/core/network/test_connection_management.py - 24 unit tests for new components

Statistics

  • Additions: +5,727 lines
  • Deletions: -67 lines
  • Net Change: +5,660 lines
  • Files Changed: 33 files

Breaking Changes

None identified. The PR maintains backward compatibility with existing APIs.


2. Branch Sync Status and Merge Conflicts

Branch Sync Status

  • Status: Ahead of origin/main
  • Details: Branch is 0 commits behind, 32 commits ahead of origin/main
  • Assessment: The branch contains all commits from main plus 32 new commits implementing the feature.

Merge Conflict Analysis

  • Conflicts Detected: No conflicts; the PR can be merged cleanly
  • Details: Test merge completed successfully with no conflicts detected.

3. Strengths

  1. Comprehensive Feature Implementation

    • All major connection management features from JS libp2p are implemented
    • Well-structured modular design with clear separation of concerns
    • Each component has a single, well-defined responsibility
  2. Excellent Test Coverage

    • 24 unit tests covering all new components
    • Tests are well-organized and cover both normal and edge cases
    • Good use of fixtures and test utilities
  3. Strong Documentation

    • Comprehensive example scripts in examples/connection_management/
    • Detailed README with usage instructions
    • Good inline documentation and docstrings
    • Documentation build passes successfully
  4. Proper Integration

    • All components properly integrated into Swarm class
    • Maintains backward compatibility
    • Follows existing code patterns and conventions
  5. Configuration Design

    • Extensive configuration options matching JS libp2p defaults
    • Sensible defaults for all new parameters
    • Clear parameter names and types
  6. Error Handling

    • Comprehensive error handling throughout
    • Proper logging at appropriate levels
    • Graceful degradation where appropriate
  7. Type Safety

    • Type checking passes (mypy and pyrefly)
    • Good use of type hints throughout
    • Proper handling of optional values

4. Issues Found

Critical

4.1 Missing Newsfragment (BLOCKER)

  • File: newsfragments/
  • Issue: No newsfragment file exists for issue #553 ("Implement Advanced Connection Management Features from JS libp2p")
  • Severity: CRITICAL / BLOCKER
  • Impact: PR cannot be approved without a newsfragment. This is a mandatory requirement.
  • Suggestion:
    • Create newsfragments/553.feature.rst with content describing the user-facing aspects of this feature
    • The file must:
      • Follow the format <ISSUE_NUMBER>.<TYPE>.rst (e.g., 553.feature.rst)
      • Contain ReST-formatted user-facing description
      • End with a newline character
      • Focus on user impact, not implementation details
    • Example content:
      Implemented advanced connection management features including priority-based dial queues, automatic reconnection, connection limits and pruning, rate limiting, IP allow/deny lists, DNS resolution, and comprehensive connection metrics. These features bring py-libp2p's connection management capabilities in line with JavaScript libp2p.
      

4.2 Test Failures: DNS Resolver Tests Using Wrong Pytest Marker

  • File: tests/core/network/test_connection_management.py
  • Lines: 223, 229, 240, 253
  • Issue: DNS resolver tests use @pytest.mark.asyncio instead of @pytest.mark.trio
  • Severity: CRITICAL
  • Impact: 4 tests fail because the project uses the pytest-trio plugin, not pytest-asyncio
  • Error Message:
    async def functions are not natively supported.
    You need to install a suitable plugin for your async framework, for example:
      - anyio
      - pytest-asyncio
      - pytest-tornasync
      - pytest-trio
      - pytest-twisted
    
  • Suggestion: Replace all instances of @pytest.mark.asyncio with @pytest.mark.trio in the DNS resolver test class:
    # Change from:
    @pytest.mark.asyncio
    async def test_dns_resolver_initialization(self):
        ...

    # To:
    @pytest.mark.trio
    async def test_dns_resolver_initialization(self):
        ...
  • Affected Tests:
    • test_dns_resolver_initialization
    • test_dns_resolver_non_dns_address
    • test_dns_resolver_cache
    • test_dns_resolver_clear_cache

Major

4.3 Pytest Unknown Mark Warnings

  • File: tests/core/network/test_connection_management.py
  • Lines: 223, 229, 240, 253
  • Issue: Pytest warnings about unknown pytest.mark.asyncio marks
  • Severity: Major
  • Impact: 24 warnings generated during test execution (6 warnings per test × 4 tests)
  • Suggestion: Fix by replacing @pytest.mark.asyncio with @pytest.mark.trio as described in issue 4.2

Minor

4.4 DNS Resolver Uses asyncio Instead of trio

  • File: libp2p/network/dns_resolver.py
  • Line: 10
  • Issue: Module imports asyncio but the project uses trio for async operations
  • Severity: Minor
  • Impact: Potential inconsistency, though the code may be using asyncio for DNS resolution specifically
  • Suggestion: Verify if asyncio is necessary for DNS resolution or if trio equivalents can be used. If asyncio is required (e.g., for asyncio.get_event_loop()), document why in a comment.
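Whichever async framework ends up in dns_resolver.py, the caching part can be kept independent of it by injecting the resolver function. The sketch below is synchronous for brevity and uses hypothetical names (TTLDNSCache, ttl_seconds). In a trio version, lookup would be async and would await the resolver. For /dns4/ and /dns6/, the resolver could be trio.socket.getaddrinfo, or the blocking socket.getaddrinfo run via trio.to_thread.run_sync. /dnsaddr/ needs TXT-record lookups, which getaddrinfo does not provide.

```python
from __future__ import annotations

import time
from typing import Callable


class TTLDNSCache:
    """Cache resolved addresses per hostname for ttl_seconds (hypothetical sketch)."""

    def __init__(
        self,
        resolve: Callable[[str], list[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # hostname -> (expiry time, resolved addresses)
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def lookup(self, hostname: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[0]:
            return list(cached[1])  # copy so callers cannot mutate the cache
        addresses = self._resolve(hostname)
        self._entries[hostname] = (now + self._ttl, list(addresses))
        return addresses

    def clear(self) -> None:
        self._entries.clear()
```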

5. Security Review

Security Considerations

  1. IP Allow/Deny Lists

    • ✅ Properly implemented with CIDR support
    • ✅ Deny list takes precedence over allow list (correct security behavior)
    • ⚠️ Recommendation: Verify that CIDR blocks in the allow/deny lists are validated on input
  2. Rate Limiting

    • ✅ Sliding window algorithm implemented
    • ✅ Per-host rate limiting prevents DoS attacks
    • ✅ Configurable thresholds allow tuning for different environments
  3. Connection Limits

    • ✅ Prevents resource exhaustion attacks
    • ✅ Intelligent pruning prevents connection starvation
    • ✅ Allow list bypasses limits (documented behavior)
  4. DNS Resolution

    • ✅ Maximum recursion depth prevents infinite loops
    • ✅ Caching reduces DNS query load
    • ⚠️ Recommendation: Verify DNS resolution doesn't expose internal network information in error messages
  5. No Security Vulnerabilities Identified

    • No unvalidated external input found
    • No unsafe subprocess usage
    • No improper key handling
    • No sensitive data exposure in logs (standard logging levels used)
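The first and third security points (the deny list takes precedence, and allow-listed peers bypass limits) fit in a few lines using the standard-library ipaddress module. This sketch assumes the allow list marks addresses as exempt from limits rather than rejecting everything else. The class and enum names are hypothetical. It also addresses the CIDR validation point: ipaddress.ip_network raises ValueError on malformed entries, so a bad list fails at construction time.

```python
from __future__ import annotations

import ipaddress
from enum import Enum


class Verdict(Enum):
    DENY = "deny"                  # reject the connection
    ALLOW_LISTED = "allow_listed"  # accept; exempt from limits and pruning
    DEFAULT = "default"            # accept; subject to normal limits


class IPGateSketch:
    def __init__(
        self, allow_list: list[str] | None = None, deny_list: list[str] | None = None
    ) -> None:
        # ip_network raises ValueError on malformed entries.
        # strict=False tolerates host bits, e.g. "10.0.0.1/8".
        self._allow = [ipaddress.ip_network(c, strict=False) for c in (allow_list or [])]
        self._deny = [ipaddress.ip_network(c, strict=False) for c in (deny_list or [])]

    def check(self, ip: str) -> Verdict:
        addr = ipaddress.ip_address(ip)
        # Deny is evaluated first, so an address in both lists is rejected.
        # An IPv4 address is never "in" an IPv6 network (and vice versa).
        if any(addr in net for net in self._deny):
            return Verdict.DENY
        if any(addr in net for net in self._allow):
            return Verdict.ALLOW_LISTED
        return Verdict.DEFAULT
```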

6. Documentation and Examples

Strengths

  1. Comprehensive Examples

    • 5 example scripts covering all major features
    • Well-documented with clear explanations
    • Production-ready configuration examples
  2. Documentation Structure

    • README with overview and usage instructions
    • EXAMPLES_SUMMARY.md with feature checklist
    • Sphinx documentation builds successfully
    • New documentation page: examples.connection_management.rst
  3. Code Documentation

    • Good docstrings on classes and methods
    • Type hints throughout
    • Clear parameter descriptions

Areas for Improvement

  1. API Documentation

    • Consider adding more detailed API reference documentation
    • Document default values and their rationale
    • Add examples to docstrings for complex methods
  2. Migration Guide

    • Consider adding a migration guide for users upgrading to this version
    • Document any configuration changes needed

7. Newsfragment Requirement

⚠️ CRITICAL: Newsfragment is MISSING - This is a BLOCKER

Current Status

Required Action

  1. Create newsfragment file: newsfragments/553.feature.rst
  2. Content requirements:
    • Must be ReST-formatted
    • Must describe user-facing aspects (not implementation details)
    • Must end with a newline character
    • Must follow the format: <ISSUE_NUMBER>.<TYPE>.rst

Suggested Content

Implemented advanced connection management features including priority-based dial queues, automatic reconnection for KEEP_ALIVE peers, global connection limits with intelligent pruning, rate limiting for incoming connections, IP allow/deny list filtering with CIDR support, DNS resolution for multiaddrs, and comprehensive connection metrics. These features bring py-libp2p's connection management capabilities in line with JavaScript libp2p, improving scalability, security, and resource management for production deployments.

Verification


8. Tests and Validation

Test Execution Summary

  • Total Tests: 1,659
  • Passed: 1,651 ✅
  • Failed: 4 ❌
  • Skipped: 4
  • Warnings: 97
  • Exit Code: 2 (failure)

Test Failures

8.1 DNS Resolver Tests (4 failures)

  • Test Class: TestDNSResolver

  • Failed Tests:

    1. test_dns_resolver_initialization
    2. test_dns_resolver_non_dns_address
    3. test_dns_resolver_cache
    4. test_dns_resolver_clear_cache
  • Root Cause: Tests use @pytest.mark.asyncio instead of @pytest.mark.trio

  • Error: async def functions are not natively supported. You need to install a suitable plugin...

  • Fix: Replace @pytest.mark.asyncio with @pytest.mark.trio in all 4 test methods

Test Warnings

  1. Pytest Unknown Mark Warnings (24 warnings)

    • Location: tests/core/network/test_connection_management.py:223, 229, 240, 253
    • Message: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio
    • Fix: Same as test failures - use @pytest.mark.trio instead
  2. RuntimeWarning (1 warning)

    • Location: tests/core/stream_muxer/test_muxer_multistream.py:68
    • Message: RuntimeWarning: coroutine 'AsyncMockMixin._execute_mock_call' was never awaited
    • Note: This warning is unrelated to this PR (pre-existing)

Test Coverage

  • New Tests Added: 24 unit tests for connection management components
  • Coverage: All new components have test coverage
  • Test Quality: Tests are well-structured and cover normal and edge cases

Linting Results

  • Status: PASSED
  • All Checks Passed:
    • check yaml
    • check toml
    • fix end of files
    • trim trailing whitespace
    • pyupgrade
    • ruff (legacy alias)
    • ruff format
    • mdformat
    • run mypy with all dev dependencies present
    • run pyrefly typecheck locally
    • Check for .rst files in the top-level directory

Type Checking Results

  • Status: PASSED
  • mypy: Passed
  • pyrefly: Passed
  • No type errors found

Documentation Build Results

  • Status: PASSED
  • HTML Build: Successful
  • Doctest: 4 tests, all passed
  • No build errors or warnings

9. Recommendations for Improvement

High Priority

  1. Fix Test Failures (CRITICAL)

    • Replace @pytest.mark.asyncio with @pytest.mark.trio in DNS resolver tests
    • This will fix all 4 failing tests and 24 warnings
  2. Add Newsfragment (BLOCKER)

    • Create newsfragments/553.feature.rst as described in section 7
    • This is required for PR approval

Medium Priority

  1. Verify DNS Resolver Async Implementation

    • Review libp2p/network/dns_resolver.py to ensure asyncio usage is necessary
    • If trio alternatives exist, consider using them for consistency
    • Document why asyncio is used if it's required
  2. Add Integration Tests

    • Consider adding integration tests that exercise multiple components together
    • Test real-world scenarios with actual network connections
  3. Performance Testing

    • Add performance benchmarks for connection management operations
    • Test behavior under high connection loads
    • Verify pruning and rate limiting work correctly under stress
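Verifying that pruning works correctly under stress requires a pinned-down expected eviction order. The PR description says ConnectionPruner ranks connections by peer tags, stream count, direction, and age. The precedence and tie-breaking below are assumptions for illustration, as are all the names: lowest tag value first, then fewest streams, then inbound before outbound, then newest before oldest. A pure function like this could serve as the oracle for a stress test.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConnInfo:
    peer_id: str
    tag_value: int      # summed peer tag values; KEEP_ALIVE-style tags score high
    stream_count: int
    inbound: bool
    opened_at: float    # monotonic timestamp when the connection opened


def pruning_order(
    conns: list[ConnInfo], allow_listed: frozenset[str] = frozenset()
) -> list[ConnInfo]:
    """Most prunable first (assumed order); allow-listed peers are never candidates."""
    candidates = [c for c in conns if c.peer_id not in allow_listed]
    return sorted(
        candidates,
        # False sorts before True, so `not c.inbound` puts inbound connections first;
        # negating opened_at puts the newest connections first.
        key=lambda c: (c.tag_value, c.stream_count, not c.inbound, -c.opened_at),
    )


def select_for_pruning(
    conns: list[ConnInfo], max_connections: int, allow_listed: frozenset[str] = frozenset()
) -> list[ConnInfo]:
    """Return just enough connections to bring the total back to max_connections."""
    excess = len(conns) - max_connections
    if excess <= 0:
        return []
    return pruning_order(conns, allow_listed)[:excess]
```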

Low Priority

  1. Documentation Enhancements

    • Add more detailed API reference documentation
    • Include migration guide for existing users
    • Add troubleshooting section for common configuration issues
  2. Code Organization

    • Consider grouping related connection management modules in a subdirectory
    • This would improve code organization as the codebase grows

10. Questions for the Author

  1. DNS Resolver Implementation:

    • Why does dns_resolver.py use asyncio instead of trio? Is this intentional for DNS resolution compatibility, or should it be migrated to trio?
  2. Test Marker Choice:

    • The DNS resolver tests use @pytest.mark.asyncio while the rest of the codebase uses @pytest.mark.trio. Was this intentional, or should they be changed to @pytest.mark.trio?
  3. Connection Limits:

    • How were the default connection limits (e.g., max_connections=300) chosen? Are they based on JS libp2p defaults or performance testing?
  4. Backward Compatibility:

    • Are there any edge cases where existing code might break due to the new connection management features? Should we add a migration guide?
  5. Performance Impact:

    • What is the performance impact of the new connection management features? Have you done any benchmarking?
  6. Configuration Validation:

    • Should ConnectionConfig validate that configuration values are reasonable (e.g., max_connections > 0, dial_timeout > 0)? Currently, invalid values might cause runtime errors.
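For question 6, one lightweight option is fail-fast validation in the __post_init__ of a frozen dataclass. The class below is a hypothetical subset, not the PR's ConnectionConfig. Only max_connections=300 comes from the review; the other defaults are illustrative. Requiring reconnect_backoff_factor >= 1 is a design choice that keeps delays from shrinking between attempts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedConnectionLimits:
    """Hypothetical subset of connection options with fail-fast validation."""

    max_connections: int = 300             # default quoted in the review
    max_parallel_dials: int = 100          # illustrative value
    dial_timeout: float = 10.0             # seconds; illustrative value
    reconnect_retries: int = 5             # illustrative value
    reconnect_backoff_factor: float = 2.0  # illustrative value

    def __post_init__(self) -> None:
        errors: list[str] = []
        if self.max_connections <= 0:
            errors.append("max_connections must be > 0")
        if self.max_parallel_dials <= 0:
            errors.append("max_parallel_dials must be > 0")
        if self.dial_timeout <= 0:
            errors.append("dial_timeout must be > 0")
        if self.reconnect_retries < 0:
            errors.append("reconnect_retries must be >= 0")
        if self.reconnect_backoff_factor < 1:
            errors.append("reconnect_backoff_factor must be >= 1")
        if errors:
            # Report every problem at once instead of failing on the first one.
            raise ValueError("invalid connection config: " + "; ".join(errors))
```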

11. Overall Assessment

Quality Rating: Good (Needs fixes before merge)

The PR implements a comprehensive set of connection management features with good code quality, documentation, and test coverage. However, there are critical blockers that must be addressed before the PR can be approved.

Security Impact: Low

No security vulnerabilities identified. The new features (rate limiting, allow/deny lists, connection limits) actually improve security by preventing DoS attacks and providing access control.

Merge Readiness: Needs fixes

Blockers:

  1. ❌ Missing newsfragment for issue #553, "Implement Advanced Connection Management Features from JS libp2p" (CRITICAL)
  2. ❌ 4 test failures due to incorrect pytest marker (CRITICAL)

Required Actions Before Merge:

  1. Create newsfragments/553.feature.rst with appropriate content
  2. Fix DNS resolver tests by replacing @pytest.mark.asyncio with @pytest.mark.trio
  3. Re-run test suite to verify all tests pass

Confidence: High

The implementation is well-structured and follows existing code patterns. The issues identified are straightforward to fix (newsfragment creation and test marker correction). Once these blockers are resolved, the PR should be ready for merge.

Summary

This is a substantial and well-implemented feature addition that brings py-libp2p's connection management capabilities in line with JavaScript libp2p. The code quality is high, documentation is comprehensive, and the integration is clean. The only blockers are administrative (newsfragment) and a simple test fix (pytest marker). Once these are addressed, this PR should be approved.


Review Completed: 2025-11-20
Reviewer: AI Assistant (following py-libp2p PR review prompt)

…based dial queues, automatic reconnection, connection limits, rate limiting, and comprehensive metrics to align py-libp2p with JavaScript libp2p.
…uts for outbound and inbound stream negotiations, and adding connection direction tracking in the Swarm and ConnectionState classes.
…r for outbound and inbound stream negotiation timeouts in BasicHost class

Development

Successfully merging this pull request may close these issues.

Implement Advanced Connection Management Features from JS libp2p

2 participants