fix: fail-fast on missing or invalid DCR_ENCRYPTION_KEY#60
fix: fail-fast on missing or invalid DCR_ENCRYPTION_KEY#60luis5tb merged 3 commits intoRHEcosystemAppEng:mainfrom
Conversation
Code ReviewThis PR addresses a critical security vulnerability where client secrets could be encrypted with ephemeral Fernet keys. The validation logic is sound, but there's a critical gap in production safety. 🔴 Critical Issue: Lazy Initialization BypassProblem: Current Flow: Root Cause:
Impact: Service appears healthy but fails on first DCR use. Missing/invalid ✅ Required Fix: Eager Validation at StartupAdd DCR initialization to @asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
"""Application lifespan manager for startup/shutdown events."""
settings = get_settings()
# Startup: Initialize database
try:
from lightspeed_agent.db import init_database
logger.info("Initializing database: %s", settings.database_url.split("@")[-1])
await init_database()
logger.info("Database initialized successfully")
except Exception as e:
logger.error("Failed to initialize database: %s", e)
raise
# Startup: Validate DCR configuration (fail-fast on Cloud Run)
# This ensures DCR_ENCRYPTION_KEY is valid BEFORE the service becomes ready,
# preventing silent failures when the first DCR request arrives.
try:
from lightspeed_agent.dcr import get_dcr_service
logger.info("Validating DCR service configuration")
get_dcr_service() # Triggers DCRService.__init__() validation
logger.info("DCR service initialized successfully")
except Exception as e:
logger.error("Failed to initialize DCR service: %s", e)
raise
yield
# Shutdown: Close database connection
try:
from lightspeed_agent.db import close_database
logger.info("Closing database connection")
await close_database()
except Exception as e:
logger.error("Failed to close database: %s", e)Why this works:
🟡 Recommended: Add Test CoverageAdd tests for the new error conditions in @pytest.mark.asyncio
async def test_dcr_service_missing_key_on_cloud_run(monkeypatch, db_session):
"""Test that DCRService raises ValueError when DCR_ENCRYPTION_KEY is missing on Cloud Run."""
monkeypatch.setenv("K_SERVICE", "test-marketplace-handler")
monkeypatch.delenv("DCR_ENCRYPTION_KEY", raising=False)
with pytest.raises(ValueError, match="DCR_ENCRYPTION_KEY is required in production"):
DCRService()
@pytest.mark.asyncio
async def test_dcr_service_invalid_encryption_key(monkeypatch, db_session):
"""Test that DCRService raises ValueError for invalid DCR_ENCRYPTION_KEY."""
monkeypatch.setenv("DCR_ENCRYPTION_KEY", "not-a-valid-fernet-key")
with pytest.raises(ValueError, match="Invalid DCR_ENCRYPTION_KEY"):
DCRService()
@pytest.mark.asyncio
async def test_encrypt_secret_without_key_raises(service):
"""Test that _encrypt_secret raises RuntimeError when _fernet is None."""
service._fernet = None
with pytest.raises(RuntimeError, match="Cannot encrypt client secret"):
service._encrypt_secret("test-secret")VerificationCloud Run deployment config: ✅ Verified
RecommendationApprove with required fix:
Priority: CRITICAL - Without eager initialization, Cloud Run can start with invalid config. With the eager initialization fix, this PR will provide complete air-tight protection against missing/invalid encryption keys in Cloud Run production deployments. |
IlonaShishov
left a comment
There was a problem hiding this comment.
LGTM,
please consider if worth upgrading to eager fail-fast strategy suggested by Claude.
Previously, _encrypt_secret() silently generated an ephemeral Fernet key when DCR_ENCRYPTION_KEY was not configured. Secrets encrypted with an ephemeral key become permanently unrecoverable after a container restart. - Raise RuntimeError in _encrypt_secret() when no Fernet cipher is available instead of generating an ephemeral key - Raise ValueError in DCRService.__init__() when DCR_ENCRYPTION_KEY is missing on Cloud Run (K_SERVICE is set) - Raise ValueError on invalid DCR_ENCRYPTION_KEY at construction time instead of logging and continuing with _fernet=None - Add a stable Fernet key to test conftest so DCR tests have a valid encryption key available Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move DCR service initialization from lazy (first-request) to eager (lifespan startup) so Cloud Run fails to start with missing/invalid DCR_ENCRYPTION_KEY instead of appearing healthy and failing on first DCR request. Add test coverage for encryption key error paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
added! thanks! |
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
/retest |
Previously, _encrypt_secret() silently generated an ephemeral Fernet key when DCR_ENCRYPTION_KEY was not configured. Secrets encrypted with an ephemeral key become permanently unrecoverable after a container restart.