Add true end-to-end VPN connectivity tests using network namespaces #14912

@dguido

Summary

The current integration tests verify that VPN services start, but not that they work. The legacy LXD tests in tests/legacy-lxd/ did test VPN connectivity (client connects, handshake succeeds, traffic flows), but they are no longer run.

This issue proposes modernizing E2E testing using Linux network namespaces, which work natively on GitHub Actions runners.

Current State

What integration tests do now:

  • Deploy Algo to localhost
  • Verify services are running (systemctl is-active)
  • Verify config files exist
  • Test DNS resolution from server

What they DON'T test:

  • VPN client can actually connect
  • Handshake/tunnel establishes successfully
  • Traffic flows through VPN
  • DNS works through VPN tunnel
  • CA certificate constraints are valid

Legacy LXD tests (tests/legacy-lxd/) did all of this but are dead code - not run in CI.

Proposed Solution: Network Namespaces

Use Linux network namespaces to simulate a client connecting to the server, all on the same GitHub Actions runner.

Architecture

┌─────────────────────────────────────────────────┐
│              GitHub Actions Runner              │
│                                                 │
│  ┌─────────────────┐    ┌──────────────────┐    │
│  │ Main Namespace  │    │ Client Namespace │    │
│  │   (VPN Server)  │────│   (VPN Client)   │    │
│  │                 │veth│                  │    │
│  │  wg0: 10.19.x.1 │    │  wg0: 10.19.x.2  │    │
│  │  strongswan     │    │  ipsec client    │    │
│  │  dns: 172.16.0.1│    │                  │    │
│  └─────────────────┘    └──────────────────┘    │
└─────────────────────────────────────────────────┘
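
Because server and client share one runner, a vpn-client namespace left over from an earlier failed run would make the next setup fail at creation time. A minimal sketch of an idempotency check, assuming a hypothetical helper `netns_listed` (not part of the proposed script) that parses `ip netns list` output fed on stdin:

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the namespace name given as $1 appears
# in `ip netns list` output read from stdin. Each line of that output looks
# like "vpn-client" or "vpn-client (id: 0)", so only the first field is compared.
netns_listed() {
    local wanted="$1"
    awk -v want="$wanted" '$1 == want { found = 1 } END { exit !found }'
}

# Intended usage on the runner (requires root):
#   if ip netns list | netns_listed vpn-client; then
#       ip netns delete vpn-client
#   fi
```

Reading the list from stdin keeps the parsing logic checkable without root; only the commented usage touches real namespaces.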

Implementation Plan

Phase 1: Create E2E Test Infrastructure

New file: tests/e2e/test-vpn-connectivity.sh

#!/bin/bash
set -euo pipefail

VPN_TYPE="${1:-wireguard}"
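# Address the client namespace uses to reach the server. 127.0.0.1 would be
# the namespace's own loopback, so default to the host end of the veth pair.
SERVER_IP="${2:-10.200.200.1}"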
CONFIG_DIR="${3:-configs/localhost}"

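cleanup() {
    # Tear down the tunnel with the same temp config that brought it up
    ip netns exec vpn-client wg-quick down /tmp/alice-test.conf 2>/dev/null || true
    ip netns delete vpn-client 2>/dev/null || true
    ip link delete veth-host 2>/dev/null || true
    # Remove the NAT rule added below so repeated runs don't stack duplicates
    iptables -t nat -D POSTROUTING -s 10.200.200.0/24 -j MASQUERADE 2>/dev/null || true
    rm -f /tmp/alice-test.conf
}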
trap cleanup EXIT

echo "=== Setting up client network namespace ==="

# Create client namespace
ip netns add vpn-client

# Create veth pair to connect namespaces
ip link add veth-host type veth peer name veth-client
ip link set veth-client netns vpn-client

# Configure host side
ip addr add 10.200.200.1/24 dev veth-host
ip link set veth-host up

# Configure client side
ip netns exec vpn-client ip addr add 10.200.200.2/24 dev veth-client
ip netns exec vpn-client ip link set veth-client up
ip netns exec vpn-client ip link set lo up
ip netns exec vpn-client ip route add default via 10.200.200.1

# Enable forwarding and NAT for client namespace
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 10.200.200.0/24 -j MASQUERADE

if [[ "$VPN_TYPE" == "wireguard" ]]; then
    echo "=== Testing WireGuard connectivity ==="

    # Validate config files
    echo "Validating mobileconfig XML..."
    xmllint --noout "$CONFIG_DIR/wireguard/apple/"*/*.mobileconfig

    # Bring up WireGuard in client namespace
    # Work on a copy of the config so the Endpoint can be rewritten
    cp "$CONFIG_DIR/wireguard/alice.conf" /tmp/alice-test.conf
    sed -i "s/Endpoint = .*/Endpoint = $SERVER_IP:51820/" /tmp/alice-test.conf

    ip netns exec vpn-client wg-quick up /tmp/alice-test.conf

    # Verify interface exists (wg-quick names it after the config file, so
    # the client-side interface is alice-test, not wg0)
    echo "Checking WireGuard interface..."
    ip netns exec vpn-client ip addr show alice-test

    # Wait for handshake
    echo "Waiting for handshake..."
    for i in {1..10}; do
        if ip netns exec vpn-client wg show | grep -q "latest handshake"; then
            echo "✓ Handshake successful"
            break
        fi
        sleep 1
    done

    # Verify handshake occurred
    ip netns exec vpn-client wg show
    ip netns exec vpn-client wg show | grep -q "latest handshake" || {
        echo "✗ No WireGuard handshake"
        exit 1
    }

    # Test connectivity through VPN
    echo "Testing ping through VPN tunnel..."
    ip netns exec vpn-client ping -c 3 -W 5 172.16.0.1 || {
        echo "✗ Cannot ping VPN DNS IP"
        exit 1
    }
    echo "✓ Ping successful"

    # Test DNS through VPN
    echo "Testing DNS resolution through VPN..."
    ip netns exec vpn-client dig @172.16.0.1 google.com +short +timeout=5 | grep -q '\.' || {
        echo "✗ DNS resolution failed"
        exit 1
    }
    echo "✓ DNS resolution successful"

    echo "=== WireGuard E2E tests PASSED ==="

elif [[ "$VPN_TYPE" == "ipsec" ]]; then
    echo "=== Testing IPsec connectivity ==="

    # Validate mobileconfig
    echo "Validating mobileconfig XML..."
    xmllint --noout "$CONFIG_DIR/ipsec/apple/alice.mobileconfig"

    # Verify CA name constraints (security check from legacy tests)
    echo "Checking CA name constraints..."
    CA_CHECK=$(openssl verify -verbose \
        -CAfile "$CONFIG_DIR/ipsec/.pki/cacert.pem" \
        "$CONFIG_DIR/ipsec/.pki/certs/"*.crt 2>&1) || true

    if echo "$CA_CHECK" | grep -q "permitted subtree violation"; then
        echo "✓ CA name constraints working correctly"
    else
        echo "⚠ CA name constraints test inconclusive"
    fi

    # Deploy IPsec client config
    # ... (IPsec client setup in namespace - more complex, needs swanctl)

    echo "=== IPsec config checks PASSED (tunnel test not yet implemented) ==="
fi

echo ""
echo "All E2E connectivity tests passed!"
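
The handshake check above greps the human-readable `wg show` output. A possible alternative is the machine-readable `wg show <interface> latest-handshakes` subcommand, which prints one public key and Unix timestamp per peer, with 0 meaning no handshake yet. A minimal sketch, assuming a hypothetical helper `handshake_seen` and a variable `IFACE` holding whatever interface name wg-quick created:

```shell
#!/bin/bash
# Hypothetical helper: reads `wg show <interface> latest-handshakes` output
# from stdin (one "<public-key><TAB><unix-timestamp>" line per peer) and
# succeeds if at least one peer has a non-zero handshake timestamp.
handshake_seen() {
    awk '$2 > 0 { seen = 1 } END { exit !seen }'
}

# Intended usage inside the polling loop (requires root; IFACE is an assumption):
#   ip netns exec vpn-client wg show "$IFACE" latest-handshakes | handshake_seen
```

Comparing the timestamp rather than matching a label avoids depending on the exact wording of the human-readable output.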

Phase 2: Integrate into CI

Update: .github/workflows/integration-tests.yml

Add after the "Verify services are running" step:

- name: Run E2E VPN connectivity tests
  run: |
    chmod +x tests/e2e/test-vpn-connectivity.sh

    # 10.200.200.1 is the host end of the veth pair. From inside the client
    # namespace, 127.0.0.1 is the namespace's own loopback, not the server.
    if [[ "${{ matrix.vpn_type }}" == "wireguard" || "${{ matrix.vpn_type }}" == "both" ]]; then
      sudo ./tests/e2e/test-vpn-connectivity.sh wireguard 10.200.200.1 configs/localhost
    fi

    if [[ "${{ matrix.vpn_type }}" == "ipsec" || "${{ matrix.vpn_type }}" == "both" ]]; then
      sudo ./tests/e2e/test-vpn-connectivity.sh ipsec 10.200.200.1 configs/localhost
    fi
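
The two near-identical conditionals in the step above could be collapsed into a loop. A minimal sketch, assuming a hypothetical helper `expand_vpn_types` and that the matrix value and server address are passed in as the environment variables `VPN_TYPE` and `SERVER_IP` (neither name comes from the existing workflow):

```shell
#!/bin/bash
# Hypothetical helper: expands a matrix vpn_type value into the list of
# protocols to test, one per line. Unknown values are rejected with status 1.
expand_vpn_types() {
    case "$1" in
        wireguard|ipsec) printf '%s\n' "$1" ;;
        both)            printf '%s\n' wireguard ipsec ;;
        *)               echo "unknown vpn_type: $1" >&2; return 1 ;;
    esac
}

# Intended usage in the workflow step (VPN_TYPE and SERVER_IP are assumptions):
#   for t in $(expand_vpn_types "$VPN_TYPE"); do
#       sudo ./tests/e2e/test-vpn-connectivity.sh "$t" "$SERVER_IP" configs/localhost
#   done
```

Rejecting unknown values means a typo in the matrix fails the step instead of silently skipping every test.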

Phase 3: Clean Up Legacy Tests

Once E2E tests are working:

  1. Delete tests/legacy-lxd/ directory
  2. Remove from .ansible-lint exclude list
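
Before deleting the directory, it may help to confirm that nothing else still points at it. A minimal sketch, assuming a hypothetical helper `find_legacy_refs` that searches a checkout for the string "legacy-lxd":

```shell
#!/bin/bash
# Hypothetical helper: lists files under directory $1 that still mention
# "legacy-lxd", skipping the .git directory. Prints nothing when clean.
find_legacy_refs() {
    grep -rl --exclude-dir=.git -- 'legacy-lxd' "$1" 2>/dev/null || true
}

# Intended usage from the repository root:
#   find_legacy_refs .
```

Any file it prints, other than those inside tests/legacy-lxd/ itself, is a reference to clean up in the same change.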

Test Coverage Matrix

Test                    Current  With E2E
Services start          ✓        ✓
Config files generated  ✓        ✓
XML config valid        -        ✓
Client connects         -        ✓
Handshake succeeds      -        ✓
Traffic through VPN     -        ✓
DNS through VPN         -        ✓
CA constraints valid    -        ✓

Dependencies

No new dependencies - uses standard Linux networking tools available on Ubuntu runners:

  • ip netns (iproute2)
  • wg-quick (already installed for integration tests)
  • xmllint (libxml2-utils)
  • iptables
  • dig (dnsutils), used for the DNS check
  • openssl, used for the CA constraint check
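
A preflight check at the top of the E2E script could fail fast with a clear message if a runner image lacks one of these tools. A minimal sketch, assuming a hypothetical helper `missing_tools`:

```shell
#!/bin/bash
# Hypothetical helper: prints each command name from its arguments that is
# not found on PATH, and returns non-zero if any are missing.
missing_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Intended usage before any namespace setup:
#   missing_tools ip wg wg-quick xmllint iptables dig openssl || {
#       echo "missing required tools (listed above)" >&2
#       exit 1
#   }
```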

Estimated Effort

  • E2E script development: ~3 hours
  • CI integration: ~1 hour
  • Testing/debugging: ~3 hours
  • Documentation: ~1 hour

Total: ~8 hours
