feat: Fleet authentication middleware, config, and server authorization#3
feat: Fleet authentication middleware, config, and server authorization#3
Conversation
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
…n cache and rate limiter Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
… authorization Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
…ecycle Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
Co-authored-by: thesprockee <962164+thesprockee@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds FleetManager API-key authentication components (middleware, caching, rate limiting) and extends FleetManager connection handling to support pre-upgrade auth gating and registration-timeout enforcement, along with config/schema updates and tests.
Changes:
- Add
server/authpackage with API key parsing/validation, TTL cache, and per-IP sliding-window rate limiting. - Gate WebSocket upgrades behind authentication (when enabled), persist server identity, and enforce API-key uniqueness at registration time.
- Expand config surface (
fleet_managerYAML +FM_*env vars) and add/update unit tests.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
server/evr_fleet_manager.go |
Adds auth middleware hook, stores server identity, registration timeout loop, and registration authorization checks. |
server/evr_fleet_manager_handlers.go |
Updates registration handler to call new authorization signature and signal registration completion. |
server/evr_fleet_manager_test.go |
Adds tests for server registration authorization and WebSocket behavior without middleware. |
server/auth/middleware.go |
Implements API key extraction/parsing, rate limiting, cache lookup, DB + bcrypt validation, and HTTP JSON error responses. |
server/auth/cache.go |
Adds success-only TTL validation cache with cleanup goroutine. |
server/auth/ratelimit.go |
Adds per-IP sliding-window rate limiter with cleanup goroutine. |
server/auth/auth_test.go |
Adds unit tests for cache, rate limiter, and parsing helpers. |
internal/config/config.go |
Adds FleetManagerConfig and FM_* env var overrides for auth/connection settings. |
internal/config/config_test.go |
Adds coverage for FleetManager config defaults, YAML loading, and env overrides. |
configs/config.yaml |
Documents and provides default fleet_manager config values. |
cmd/app/main.go |
Removes stale commented example block; clarifies this is a dev harness. |
go.mod / go.sum |
Adds golang.org/x/crypto dependency for bcrypt. |
.gitignore |
Ignores app binary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // getClientIP extracts the client IP from the request. | ||
| func getClientIP(r *http.Request) string { | ||
| // Check X-Forwarded-For header first | ||
| if xff := r.Header.Get("X-Forwarded-For"); xff != "" { | ||
| parts := strings.SplitN(xff, ",", 2) | ||
| return strings.TrimSpace(parts[0]) | ||
| } | ||
|
|
||
| // Check X-Real-IP header | ||
| if xri := r.Header.Get("X-Real-IP"); xri != "" { | ||
| return xri | ||
| } | ||
|
|
||
| // Fall back to RemoteAddr | ||
| host, _, err := net.SplitHostPort(r.RemoteAddr) | ||
| if err != nil { | ||
| return r.RemoteAddr | ||
| } |
There was a problem hiding this comment.
getClientIP trusts X-Forwarded-For / X-Real-IP unconditionally. If this handler is reachable without a trusted proxy stripping/setting these headers, clients can spoof their IP to bypass per-IP rate limiting. Consider only honoring these headers when the immediate peer (RemoteAddr) is a trusted proxy (or make this behavior configurable), otherwise fall back to RemoteAddr.
| // getClientIP extracts the client IP from the request. | |
| func getClientIP(r *http.Request) string { | |
| // Check X-Forwarded-For header first | |
| if xff := r.Header.Get("X-Forwarded-For"); xff != "" { | |
| parts := strings.SplitN(xff, ",", 2) | |
| return strings.TrimSpace(parts[0]) | |
| } | |
| // Check X-Real-IP header | |
| if xri := r.Header.Get("X-Real-IP"); xri != "" { | |
| return xri | |
| } | |
| // Fall back to RemoteAddr | |
| host, _, err := net.SplitHostPort(r.RemoteAddr) | |
| if err != nil { | |
| return r.RemoteAddr | |
| } | |
| // isTrustedProxy reports whether the given IP address belongs to a trusted proxy. | |
| // By default, loopback and private address ranges are considered trusted. | |
| func isTrustedProxy(ip string) bool { | |
| parsed := net.ParseIP(ip) | |
| if parsed == nil { | |
| return false | |
| } | |
| if parsed.IsLoopback() || parsed.IsPrivate() { | |
| return true | |
| } | |
| return false | |
| } | |
| // getClientIP extracts the client IP from the request. | |
| func getClientIP(r *http.Request) string { | |
| // Derive the immediate peer IP from RemoteAddr. | |
| host, _, err := net.SplitHostPort(r.RemoteAddr) | |
| if err != nil { | |
| host = r.RemoteAddr | |
| } | |
| // Only trust forwarding headers if the immediate peer is a trusted proxy. | |
| if isTrustedProxy(host) { | |
| // Check X-Forwarded-For header first. | |
| if xff := r.Header.Get("X-Forwarded-For"); xff != "" { | |
| parts := strings.SplitN(xff, ",", 2) | |
| return strings.TrimSpace(parts[0]) | |
| } | |
| // Check X-Real-IP header. | |
| if xri := r.Header.Get("X-Real-IP"); xri != "" { | |
| return xri | |
| } | |
| } | |
| // Fall back to the immediate peer IP. |
| // Async update last_used_at | ||
| go func() { | ||
| _, _ = am.db.Exec( | ||
| `UPDATE server_api_keys SET last_used_at = now() WHERE key_id = $1`, | ||
| keyID, | ||
| ) | ||
| }() |
There was a problem hiding this comment.
validateKeyFromDB spawns a goroutine per successful authentication to update last_used_at and ignores any errors. Under load this can create unnecessary goroutine churn and mask DB issues. Consider performing the update inline with a context/timeout, or funneling updates through a bounded worker/queue and logging failures via am.logger.
| } | ||
|
|
||
| // cleanupLoop periodically removes expired entries from the cache. | ||
| func (vc *ValidationCache) cleanupLoop(ctx context.Context) { |
There was a problem hiding this comment.
NewValidationCache starts a ticker with cleanupInterval without validating it. time.NewTicker(<=0) panics, so misconfiguration (e.g. env var set to 0/negative) can crash the process. Consider guarding against non-positive intervals (disable cleanup or clamp to a minimum).
| func (vc *ValidationCache) cleanupLoop(ctx context.Context) { | |
| func (vc *ValidationCache) cleanupLoop(ctx context.Context) { | |
| if vc.cleanupInterval <= 0 { | |
| // Disable cleanup when interval is non-positive to avoid time.NewTicker panic. | |
| return | |
| } |
| // Set stores a successful validation result in the cache with the configured TTL. | ||
| func (vc *ValidationCache) Set(keyID string, identity *ServerIdentity) { | ||
| vc.mu.Lock() | ||
| defer vc.mu.Unlock() | ||
|
|
||
| vc.entries[keyID] = &cacheEntry{ | ||
| Identity: identity, | ||
| ExpiresAt: time.Now().Add(vc.ttl), | ||
| } |
There was a problem hiding this comment.
ValidationCache.Set will happily store a nil identity, creating an entry that will never produce a cache hit (callers check != nil) and will linger until cleanup. Consider rejecting nil (no-op or panic) so cache usage stays consistent with the “success-only caching” contract.
| // cleanupLoop periodically removes expired entries from the rate limiter. | ||
| func (rl *RateLimiter) cleanupLoop(ctx context.Context) { | ||
| ticker := time.NewTicker(rl.cleanupInterval) | ||
| defer ticker.Stop() | ||
|
|
There was a problem hiding this comment.
NewRateLimiter starts a ticker with cleanupInterval without validating it. time.NewTicker(<=0) panics, so a bad config/env value can crash the process. Consider guarding against non-positive intervals (disable cleanup or clamp to a safe minimum).
| // registrationTimeoutLoop disconnects servers that don't register within the timeout. | ||
| func (fm *FleetManager) registrationTimeoutLoop(server *GameServerInstance) { | ||
| select { | ||
| case <-server.registered: | ||
| // Server registered successfully | ||
| fm.logger.Debug("Server registered within timeout", zap.String("server_id", server.ID)) | ||
| return | ||
| case <-server.stopPing: | ||
| // Server disconnected | ||
| return | ||
| case <-fm.ctx.Done(): | ||
| // FleetManager shutting down | ||
| return | ||
| case <-time.After(fm.registrationTimeout): | ||
| fm.logger.Warn("Server registration timeout, disconnecting", |
There was a problem hiding this comment.
registrationTimeoutLoop uses time.After directly in a select. If the server registers/disconnects before the timeout, the underlying timer still fires later and can accumulate for many connections (GC only after it fires). Prefer time.NewTimer and defer timer.Stop() so the timer can be stopped when another case wins.
| if existing.Identity != nil && existing.Identity.KeyID == server.Identity.KeyID { | ||
| existing.mu.RUnlock() | ||
| fm.serversMu.RUnlock() | ||
| return fmt.Errorf("API key already in use by another server") |
There was a problem hiding this comment.
authorizeServerRegistration uses fmt.Errorf without any formatting directives. Prefer errors.New here to avoid implying formatted output and to keep error creation consistent.
| return fmt.Errorf("API key already in use by another server") | |
| return errors.New("API key already in use by another server") |
| // 3. Rate limit check on client IP | ||
| clientIP := getClientIP(r) | ||
| if !am.rateLimiter.Allow(clientIP) { | ||
| return nil, ErrRateLimited | ||
| } | ||
|
|
||
| // 4. Cache check for key_id | ||
| if identity := am.cache.Get(keyID); identity != nil { | ||
| return identity, nil | ||
| } | ||
|
|
||
| // 5. Direct DB query | ||
| identity, err := am.validateKeyFromDB(keyID, secret) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| // 6. Cache successful validation | ||
| am.cache.Set(keyID, identity) | ||
|
|
There was a problem hiding this comment.
Authenticate dereferences am.rateLimiter, am.cache, and am.db without any nil checks. Because AuthMiddleware is an exported struct (and is instantiated as &auth.AuthMiddleware{} in tests), this can panic at runtime if the middleware is not fully initialized. Consider validating required dependencies in NewAuthMiddleware (or making fields unexported and forcing construction via the constructor) and returning a clear error from Authenticate when required components are missing.
| @@ -446,16 +486,54 @@ func (fm *FleetManager) HandleWebSocket(w http.ResponseWriter, r *http.Request, | |||
| server := &GameServerInstance{ | |||
| ID: serverUUID.String(), | |||
| State: GameServerStateIdle, | |||
| Identity: identity, | |||
| Metadata: make(map[string]any), | |||
| Conn: conn, | |||
| ConnectedAt: time.Now(), | |||
| stopPing: make(chan struct{}), | |||
| registered: make(chan struct{}), | |||
| } | |||
|
|
|||
| // Start registration timeout if configured | |||
| if fm.registrationTimeout > 0 { | |||
| go fm.registrationTimeoutLoop(server) | |||
There was a problem hiding this comment.
This PR introduces authMiddleware and registrationTimeout, but there is no wiring in this repo that constructs/sets them (no call sites for NewAuthMiddleware, NewValidationCache, NewRateLimiter, SetAuthMiddleware, or SetRegistrationTimeout). As a result, WebSocket auth gating and registration timeouts will never be active in production unless external code manually sets these fields. Consider initializing these features in InitModule (using config + DB handle) or documenting clearly how they are enabled.
| // Verify that only explicitly Set entries are retrievable. | ||
| // A nil identity should not be cached. | ||
| cache.Set("key1", nil) | ||
|
|
||
| got := cache.Get("key1") | ||
| // Even if nil was set, get should return it — but the caller | ||
| // should never set nil (only successful validations). | ||
| // The key point: there's no SetFailure method. | ||
| // Cache consumers must only call Set with valid identities. | ||
| _ = got |
There was a problem hiding this comment.
TestValidationCache_NoFailureCaching doesn't assert any behavior (it assigns got to _), so it won't fail even if the cache behavior regresses. Consider either removing it or converting it into a real assertion (e.g., make Set reject nil identities and assert that the key is not present afterwards).
| // Verify that only explicitly Set entries are retrievable. | |
| // A nil identity should not be cached. | |
| cache.Set("key1", nil) | |
| got := cache.Get("key1") | |
| // Even if nil was set, get should return it — but the caller | |
| // should never set nil (only successful validations). | |
| // The key point: there's no SetFailure method. | |
| // Cache consumers must only call Set with valid identities. | |
| _ = got | |
| // Verify that only non-nil identities are cached. | |
| // A nil identity should not be cached. | |
| cache.Set("key1", nil) | |
| if got := cache.Get("key1"); got != nil { | |
| t.Errorf("expected no cached entry for key1 when setting nil identity, got %#v", got) | |
| } |
Implements the
nevr-fleetmanagertasks from the fleet auth dashboard spec: API key auth middleware for WebSocket connections, validation caching, rate limiting, config expansion, and server registration authorization.Config (
internal/config/)FleetManagerConfigwithCacheTTL,RateLimitPerMinute,RegistrationTimeout,PingInterval,IdleTimeoutand correspondingFM_*env var overridesAuth package (
server/auth/)cache.go—ValidationCachewith TTL expiration, context-stoppable cleanup. Success-only caching (failures never cached).ratelimit.go— Sliding windowRateLimiterper IP, configurable max/min.middleware.go—AuthMiddleware.Authenticate(r *http.Request)extractsAuthorization: Bearer fm_<16hex>_<32hex>, rate-limits, checks cache, falls back to DB + bcrypt, caches on success. Returns typed errors (ErrKeyRevoked,ErrRateLimited,ErrInvalidKey).WebSocket auth gate (
server/evr_fleet_manager.go)HandleWebSocketnow authenticates beforeupgrader.Upgrade()— rejects with 401/403/429 at HTTP levelServerIdentitystored onGameServerInstance.Identityafter authServer registration authorization
authorizeServerRegistrationreplaced from stub → checks identity is present (when middleware active) and enforces API key uniqueness across connected serverscmd/app/main.goNewEVRFleetManager,Start,Stop)42 tests pass with
-race. CodeQL clean.golang.org/x/crypto/bcryptadded (v0.48.0, no advisories).✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.