Skip to content

feat: Fleet authentication middleware, config, and server authorization#3

Open
Copilot wants to merge 8 commits intomainfrom
copilot/add-fleet-auth-dashboard
Open

feat: Fleet authentication middleware, config, and server authorization#3
Copilot wants to merge 8 commits intomainfrom
copilot/add-fleet-auth-dashboard

Conversation

Copy link
Copy Markdown

Copilot AI commented Feb 23, 2026

Implements the nevr-fleetmanager tasks from the fleet auth dashboard spec: API key auth middleware for WebSocket connections, validation caching, rate limiting, config expansion, and server registration authorization.

Config (internal/config/)

  • Add FleetManagerConfig with CacheTTL, RateLimitPerMinute, RegistrationTimeout, PingInterval, IdleTimeout and corresponding FM_* env var overrides

Auth package (server/auth/)

  • cache.goValidationCache with TTL expiration, context-stoppable cleanup. Success-only caching (failures never cached).
  • ratelimit.go — Sliding window RateLimiter per IP, configurable max/min.
  • middleware.goAuthMiddleware.Authenticate(r *http.Request) extracts Authorization: Bearer fm_<16hex>_<32hex>, rate-limits, checks cache, falls back to DB + bcrypt, caches on success. Returns typed errors (ErrKeyRevoked, ErrRateLimited, ErrInvalidKey).

WebSocket auth gate (server/evr_fleet_manager.go)

  • HandleWebSocket now authenticates before upgrader.Upgrade() — rejects with 401/403/429 at HTTP level
  • ServerIdentity stored on GameServerInstance.Identity after auth
  • Registration timeout disconnects servers that connect but don't send registration within configured duration

Server registration authorization

  • authorizeServerRegistration replaced from stub → checks identity is present (when middleware active) and enforces API key uniqueness across connected servers
// Auth happens before upgrade — no unauthenticated WebSocket connections
func (fm *FleetManager) HandleWebSocket(w http.ResponseWriter, r *http.Request, queryParams map[string][]string) {
    if fm.authMiddleware != nil {
        identity, err := fm.authMiddleware.Authenticate(r)
        if err != nil {
            // 401/403/429 returned, connection rejected before upgrade
            return
        }
    }
    conn, err := fm.upgrader.Upgrade(w, r, nil)
    // ...
}

cmd/app/main.go

  • Removed commented-out block referencing nonexistent symbols (NewEVRFleetManager, Start, Stop)

42 tests pass with -race. CodeQL clean. golang.org/x/crypto/bcrypt added (v0.48.0, no advisories).


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 7 commits February 23, 2026 08:45
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
…n cache and rate limiter

Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
… authorization

Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
…ecycle

Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Copilot AI changed the title [WIP] Add fleet authentication dashboard execution feat: Fleet authentication middleware, config, and server authorization Feb 23, 2026
Copilot AI requested a review from thesprockee February 23, 2026 08:58
@thesprockee thesprockee marked this pull request as ready for review March 6, 2026 10:32
Copilot AI review requested due to automatic review settings March 6, 2026 10:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds FleetManager API-key authentication components (middleware, caching, rate limiting) and extends FleetManager connection handling to support pre-upgrade auth gating and registration-timeout enforcement, along with config/schema updates and tests.

Changes:

  • Add server/auth package with API key parsing/validation, TTL cache, and per-IP sliding-window rate limiting.
  • Gate WebSocket upgrades behind authentication (when enabled), persist server identity, and enforce API-key uniqueness at registration time.
  • Expand config surface (fleet_manager YAML + FM_* env vars) and add/update unit tests.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
server/evr_fleet_manager.go Adds auth middleware hook, stores server identity, registration timeout loop, and registration authorization checks.
server/evr_fleet_manager_handlers.go Updates registration handler to call new authorization signature and signal registration completion.
server/evr_fleet_manager_test.go Adds tests for server registration authorization and WebSocket behavior without middleware.
server/auth/middleware.go Implements API key extraction/parsing, rate limiting, cache lookup, DB + bcrypt validation, and HTTP JSON error responses.
server/auth/cache.go Adds success-only TTL validation cache with cleanup goroutine.
server/auth/ratelimit.go Adds per-IP sliding-window rate limiter with cleanup goroutine.
server/auth/auth_test.go Adds unit tests for cache, rate limiter, and parsing helpers.
internal/config/config.go Adds FleetManagerConfig and FM_* env var overrides for auth/connection settings.
internal/config/config_test.go Adds coverage for FleetManager config defaults, YAML loading, and env overrides.
configs/config.yaml Documents and provides default fleet_manager config values.
cmd/app/main.go Removes stale commented example block; clarifies this is a dev harness.
go.mod / go.sum Adds golang.org/x/crypto dependency for bcrypt.
.gitignore Ignores app binary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +197 to +214
// getClientIP extracts the client IP from the request.
func getClientIP(r *http.Request) string {
// Check X-Forwarded-For header first
if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
parts := strings.SplitN(xff, ",", 2)
return strings.TrimSpace(parts[0])
}

// Check X-Real-IP header
if xri := r.Header.Get("X-Real-IP"); xri != "" {
return xri
}

// Fall back to RemoteAddr
host, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
return r.RemoteAddr
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getClientIP trusts X-Forwarded-For / X-Real-IP unconditionally. If this handler is reachable without a trusted proxy stripping/setting these headers, clients can spoof their IP to bypass per-IP rate limiting. Consider only honoring these headers when the immediate peer (RemoteAddr) is a trusted proxy (or make this behavior configurable), otherwise fall back to RemoteAddr.

Suggested change
// getClientIP extracts the client IP from the request.
func getClientIP(r *http.Request) string {
// Check X-Forwarded-For header first
if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
parts := strings.SplitN(xff, ",", 2)
return strings.TrimSpace(parts[0])
}
// Check X-Real-IP header
if xri := r.Header.Get("X-Real-IP"); xri != "" {
return xri
}
// Fall back to RemoteAddr
host, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
return r.RemoteAddr
}
// isTrustedProxy reports whether the given IP address belongs to a trusted proxy.
// By default, loopback and private address ranges are considered trusted.
func isTrustedProxy(ip string) bool {
parsed := net.ParseIP(ip)
if parsed == nil {
return false
}
if parsed.IsLoopback() || parsed.IsPrivate() {
return true
}
return false
}
// getClientIP extracts the client IP from the request.
func getClientIP(r *http.Request) string {
// Derive the immediate peer IP from RemoteAddr.
host, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
host = r.RemoteAddr
}
// Only trust forwarding headers if the immediate peer is a trusted proxy.
if isTrustedProxy(host) {
// Check X-Forwarded-For header first.
if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
parts := strings.SplitN(xff, ",", 2)
return strings.TrimSpace(parts[0])
}
// Check X-Real-IP header.
if xri := r.Header.Get("X-Real-IP"); xri != "" {
return xri
}
}
// Fall back to the immediate peer IP.

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +133
// Async update last_used_at
go func() {
_, _ = am.db.Exec(
`UPDATE server_api_keys SET last_used_at = now() WHERE key_id = $1`,
keyID,
)
}()
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateKeyFromDB spawns a goroutine per successful authentication to update last_used_at and ignores any errors. Under load this can create unnecessary goroutine churn and mask DB issues. Consider performing the update inline with a context/timeout, or funneling updates through a bounded worker/queue and logging failures via am.logger.

Copilot uses AI. Check for mistakes.
}

// cleanupLoop periodically removes expired entries from the cache.
func (vc *ValidationCache) cleanupLoop(ctx context.Context) {
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewValidationCache starts a ticker with cleanupInterval without validating it. time.NewTicker(<=0) panics, so misconfiguration (e.g. env var set to 0/negative) can crash the process. Consider guarding against non-positive intervals (disable cleanup or clamp to a minimum).

Suggested change
func (vc *ValidationCache) cleanupLoop(ctx context.Context) {
func (vc *ValidationCache) cleanupLoop(ctx context.Context) {
if vc.cleanupInterval <= 0 {
// Disable cleanup when interval is non-positive to avoid time.NewTicker panic.
return
}

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +68
// Set stores a successful validation result in the cache with the configured TTL.
func (vc *ValidationCache) Set(keyID string, identity *ServerIdentity) {
vc.mu.Lock()
defer vc.mu.Unlock()

vc.entries[keyID] = &cacheEntry{
Identity: identity,
ExpiresAt: time.Now().Add(vc.ttl),
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidationCache.Set will happily store a nil identity, creating an entry that will never produce a cache hit (callers check != nil) and will linger until cleanup. Consider rejecting nil (no-op or panic) so cache usage stays consistent with the “success-only caching” contract.

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +62
// cleanupLoop periodically removes expired entries from the rate limiter.
func (rl *RateLimiter) cleanupLoop(ctx context.Context) {
ticker := time.NewTicker(rl.cleanupInterval)
defer ticker.Stop()

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewRateLimiter starts a ticker with cleanupInterval without validating it. time.NewTicker(<=0) panics, so a bad config/env value can crash the process. Consider guarding against non-positive intervals (disable cleanup or clamp to a safe minimum).

Copilot uses AI. Check for mistakes.
Comment on lines +511 to +525
// registrationTimeoutLoop disconnects servers that don't register within the timeout.
func (fm *FleetManager) registrationTimeoutLoop(server *GameServerInstance) {
select {
case <-server.registered:
// Server registered successfully
fm.logger.Debug("Server registered within timeout", zap.String("server_id", server.ID))
return
case <-server.stopPing:
// Server disconnected
return
case <-fm.ctx.Done():
// FleetManager shutting down
return
case <-time.After(fm.registrationTimeout):
fm.logger.Warn("Server registration timeout, disconnecting",
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registrationTimeoutLoop uses time.After directly in a select. If the server registers/disconnects before the timeout, the underlying timer still fires later and can accumulate for many connections (GC only after it fires). Prefer time.NewTimer and defer timer.Stop() so the timer can be stopped when another case wins.

Copilot uses AI. Check for mistakes.
if existing.Identity != nil && existing.Identity.KeyID == server.Identity.KeyID {
existing.mu.RUnlock()
fm.serversMu.RUnlock()
return fmt.Errorf("API key already in use by another server")
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

authorizeServerRegistration uses fmt.Errorf without any formatting directives. Prefer errors.New here to avoid implying formatted output and to keep error creation consistent.

Suggested change
return fmt.Errorf("API key already in use by another server")
return errors.New("API key already in use by another server")

Copilot uses AI. Check for mistakes.
Comment on lines +71 to +90
// 3. Rate limit check on client IP
clientIP := getClientIP(r)
if !am.rateLimiter.Allow(clientIP) {
return nil, ErrRateLimited
}

// 4. Cache check for key_id
if identity := am.cache.Get(keyID); identity != nil {
return identity, nil
}

// 5. Direct DB query
identity, err := am.validateKeyFromDB(keyID, secret)
if err != nil {
return nil, err
}

// 6. Cache successful validation
am.cache.Set(keyID, identity)

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Authenticate dereferences am.rateLimiter, am.cache, and am.db without any nil checks. Because AuthMiddleware is an exported struct (and is instantiated as &auth.AuthMiddleware{} in tests), this can panic at runtime if the middleware is not fully initialized. Consider validating required dependencies in NewAuthMiddleware (or making fields unexported and forcing construction via the constructor) and returning a clear error from Authenticate when required components are missing.

Copilot uses AI. Check for mistakes.
Comment on lines 445 to +499
@@ -446,16 +486,54 @@ func (fm *FleetManager) HandleWebSocket(w http.ResponseWriter, r *http.Request,
server := &GameServerInstance{
ID: serverUUID.String(),
State: GameServerStateIdle,
Identity: identity,
Metadata: make(map[string]any),
Conn: conn,
ConnectedAt: time.Now(),
stopPing: make(chan struct{}),
registered: make(chan struct{}),
}

// Start registration timeout if configured
if fm.registrationTimeout > 0 {
go fm.registrationTimeoutLoop(server)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces authMiddleware and registrationTimeout, but there is no wiring in this repo that constructs/sets them (no call sites for NewAuthMiddleware, NewValidationCache, NewRateLimiter, SetAuthMiddleware, or SetRegistrationTimeout). As a result, WebSocket auth gating and registration timeouts will never be active in production unless external code manually sets these fields. Consider initializing these features in InitModule (using config + DB handle) or documenting clearly how they are enabled.

Copilot uses AI. Check for mistakes.
Comment on lines +93 to +102
// Verify that only explicitly Set entries are retrievable.
// A nil identity should not be cached.
cache.Set("key1", nil)

got := cache.Get("key1")
// Even if nil was set, get should return it — but the caller
// should never set nil (only successful validations).
// The key point: there's no SetFailure method.
// Cache consumers must only call Set with valid identities.
_ = got
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestValidationCache_NoFailureCaching doesn't assert any behavior (it assigns got to _), so it won't fail even if the cache behavior regresses. Consider either removing it or converting it into a real assertion (e.g., make Set reject nil identities and assert that the key is not present afterwards).

Suggested change
// Verify that only explicitly Set entries are retrievable.
// A nil identity should not be cached.
cache.Set("key1", nil)
got := cache.Get("key1")
// Even if nil was set, get should return it — but the caller
// should never set nil (only successful validations).
// The key point: there's no SetFailure method.
// Cache consumers must only call Set with valid identities.
_ = got
// Verify that only non-nil identities are cached.
// A nil identity should not be cached.
cache.Set("key1", nil)
if got := cache.Get("key1"); got != nil {
t.Errorf("expected no cached entry for key1 when setting nil identity, got %#v", got)
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants