Skip to content

Conversation

@devsergiy
Copy link
Member

@devsergiy devsergiy commented Nov 5, 2025

Use cache for variables normalization and variables remapping.

Closes ENG-8477

Summary by CodeRabbit

  • Performance

    • Added dedicated caching for variable normalization and variable remapping to improve request processing and surface cache-hit visibility.
    • New debug headers report variable-normalization and variable-remapping HIT/MISS state for requests.
  • Tests

    • Added tests covering normalization and remapping cache scenarios, including variable extraction, operation isolation, and list/value coercion; tests assert cache-hit headers and responses.

@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds two new caches (variables normalization and remap variables), wires them into graph server initialization and shutdown, passes them into the operation processor, extends variable normalization/remapping flows to read/write caches and surface cache-hit flags/headers/OTel attributes, updates related call sites and tests, and removes operationID from NormalizationCacheEntry.

Changes

Cohort / File(s) Summary
Tests
router-tests/normalization_cache_test.go
Adds cacheHit type, assertCacheHeaders helper, and TestAdditionalNormalizationCaches with multiple parallel subtests validating normalization, variables normalization, remapping cache hit/miss behavior and response payloads.
Graph server / cache lifecycle
router/core/graph_server.go
Adds variablesNormalizationCache and remapVariablesCache fields to graphMux; initializes both caches in buildOperationCaches, passes them into operation processor options, and closes them in Shutdown.
Operation processor & cache types
router/core/operation_processor.go
Adds VariablesNormalizationCacheEntry and RemapVariablesCacheEntry types; extends OperationProcessorOptions and OperationCache with new cache fields; implements cache-key helpers and read/write flows for variables normalization and remapping; removes operationID from NormalizationCacheEntry.
Warmup / call-site updates
router/core/cache_warmup.go
Updates callers to the new return arities: NormalizeVariables now returns three values and RemapVariables now returns two; call sites adapted to capture/cache booleans and other returned values.
Operation/context state
router/core/context.go, router/core/graphql_prehandler.go, router/core/websocket.go
Adds variablesNormalizationCacheHit and variablesRemappingCacheHit fields to operation context; surfaces cache-hit booleans from NormalizeVariables/RemapVariables, records them on operation context and trace spans.
HTTP headers & telemetry
router/core/graphql_handler.go, router/pkg/otel/attributes.go
Adds VariablesNormalizationCacheHeader and VariablesRemappingCacheHeader; refactors debug cache header emission to centralize HIT/MISS formatting; adds OTel attribute keys WgVariablesNormalizationCacheHit and WgVariablesRemappingCacheHit.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • All updated function signatures for NormalizeVariables and RemapVariables and every call site.
    • Removal of operationID from NormalizationCacheEntry and any dependent logic.
    • Cache key generation functions (normalizeVariablesCacheKey, remapVariablesCacheKey) for determinism and collision risk.
    • Concurrency, initialization, and shutdown of the new ristretto caches in graphMux.
    • Consistency between cache-hit flags, emitted HTTP headers, and OTel attributes, including tests that assert those headers.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: cache vars and remapping normalization' directly aligns with the main objective of the PR: implementing caching for variables normalization and variables remapping across the router core.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-77a722863b33ae4fc973e0b03934c1d0ec0fe2f6-nonroot

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/operation_processor.go (1)

944-979: Copy cached variables before storing entries

VariablesNormalizationCacheEntry.variables currently aliases o.parsedOperation.Request.Variables, which is backed by the pooled parseKit buffer. After the entry is cached, the kit gets reused (OperationProcessor.freeKit), so later requests mutate that shared backing array, corrupting the cached payload and any future hits.

Please copy the slice before caching so the entry owns its data. Apply the same fix in both places where we construct VariablesNormalizationCacheEntry.

@@
-		entry := VariablesNormalizationCacheEntry{
+		cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...)
+		entry := VariablesNormalizationCacheEntry{
 			uploadsMapping:           uploadsMapping,
 			id:                       o.parsedOperation.ID,
 			normalizedRepresentation: o.parsedOperation.NormalizedRepresentation,
-			variables:                o.parsedOperation.Request.Variables,
+			variables:                cachedVariables,
 			reparse:                  false,
 		}
@@
-		entry := VariablesNormalizationCacheEntry{
+		cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...)
+		entry := VariablesNormalizationCacheEntry{
 			uploadsMapping:           uploadsMapping,
 			id:                       o.parsedOperation.ID,
 			normalizedRepresentation: o.parsedOperation.NormalizedRepresentation,
-			variables:                o.parsedOperation.Request.Variables,
+			variables:                cachedVariables,
 			reparse:                  true,
 		}
🧹 Nitpick comments (1)
router/core/graph_server.go (1)

707-731: Expose cache metrics for the new caches

The new variables/remap caches are now part of the mux, but we never register them with metricInfos, so their hit/miss stats stay invisible in OTLP/Prometheus. Please append both caches to metricInfos alongside the existing ones so operators can monitor them.

@@
 	if s.normalizationCache != nil {
 		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("query_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.normalizationCache.Metrics))
 	}
 
+	if s.variablesNormalizationCache != nil {
+		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.variablesNormalizationCache.Metrics))
+	}
+
+	if s.remapVariablesCache != nil {
+		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_remap", srv.engineExecutionConfiguration.NormalizationCacheSize, s.remapVariablesCache.Metrics))
+	}
+
 	if s.persistedOperationCache != nil {
 		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("persisted_query_normalization", 1024, s.persistedOperationCache.Metrics))
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f865487 and 5096314.

📒 Files selected for processing (3)
  • router-tests/normalization_cache_test.go (1 hunks)
  • router/core/graph_server.go (4 hunks)
  • router/core/operation_processor.go (10 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/normalization_cache_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/graph_server.go
🧬 Code graph analysis (2)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (284-341)
  • Environment (1731-1767)
  • GraphQLRequest (1907-1915)
router/core/graphql_handler.go (1)
  • NormalizationCacheHeader (39-39)
router/core/graph_server.go (1)
router/core/operation_processor.go (3)
  • NormalizationCacheEntry (755-759)
  • VariablesNormalizationCacheEntry (761-767)
  • RemapVariablesCacheEntry (769-773)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)

Comment on lines 741 to 763
if s.planCache != nil {
s.planCache.Close()
}

if s.persistedOperationCache != nil {
s.persistedOperationCache.Close()
}

if s.normalizationCache != nil {
s.normalizationCache.Close()
}

if s.variablesNormalizationCache != nil {
s.variablesNormalizationCache.Close()
}

if s.remapVariablesCache != nil {
s.remapVariablesCache.Close()
}

if s.complexityCalculationCache != nil {
s.complexityCalculationCache.Close()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to move this to a function to reduce the size a little bit. Something like

// Or define a unexported interface with a close function
func closeResources[T io.Closer](caches ...T) {
  for _, cache := range caches {
    if cache != nil {
      cache.Close()
    }
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache does not satisfy the "io.Closer" interface. I will simplify it by removing redundant checks for nils. Method "Close" has that check.

Comment on lines 863 to 864
// fmt.Println("####### NormalizeVariables: variables len:", len(o.kit.doc.Input.Variables))
// fmt.Println("####### NormalizeVariables: variables:", string(o.kit.doc.Input.Variables))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed?

There are multiple comments with fmt statements in this file. Should they be removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment on lines 761 to 774
type VariablesNormalizationCacheEntry struct {
normalizedRepresentation string
uploadsMapping []uploads.UploadPathMapping
id uint64
variables json.RawMessage
reparse bool
}

type RemapVariablesCacheEntry struct {
internalID uint64
normalizedRepresentation string
remapVariables map[string]string
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a few comments and descriptions here to understand the purpose and the fields?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a couple of comments there

@ysmolski ysmolski self-assigned this Nov 11, 2025
@ysmolski ysmolski changed the title fix: normalization cache fix: cache vars and remapping normalization Nov 13, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router/core/operation_processor.go (2)

813-817: Consider handling or documenting jsonparser.Delete errors.

The jsonparser.Delete calls on Line 816 can return errors, but they're being silently ignored. While delete operations on valid JSON should rarely fail, consider either:

  1. Handling the error (log or return it)
  2. Adding a comment explaining why it's safe to ignore

Example:

 for _, varName := range skipIncludeVariableNames {
-	o.parsedOperation.Request.Variables = jsonparser.Delete(o.parsedOperation.Request.Variables, varName)
+	var err error
+	o.parsedOperation.Request.Variables, err = jsonparser.Delete(o.parsedOperation.Request.Variables, varName)
+	if err != nil {
+		// Log or handle error
+	}
 }

943-950: Address or remove the TODO-style comment.

The comment on Line 943 suggests uncertainty about keyGen reset discipline: "should not be needed if we properly reset after use - check do we have any remaining places where we do not reset keygen - maybe wrap into a type which will reset once we got key".

After reviewing the code, the resets appear properly placed (lines 885, 950, 995, 1045, 1053, 1193, 1202). Consider either:

  1. Removing the comment if the concern has been addressed
  2. Creating a wrapper type as suggested for better encapsulation
  3. Converting this to a proper TODO with a tracking ticket
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2bf4a4 and dcbf441.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
🔇 Additional comments (7)
router/core/operation_processor.go (7)

48-78: LGTM! Documentation improvements.

The enhanced comments provide clearer explanations of the field purposes and when they're available.


106-128: LGTM! Cache fields properly added.

The new cache fields follow the established naming conventions and type patterns used for other caches in the struct.


163-176: LGTM! Cache fields properly integrated.

The cache fields are correctly added to the OperationCache struct following existing patterns.


767-794: LGTM! Well-documented cache entry types.

The cache entry structures are well-designed with clear documentation. The reparse flag in VariablesNormalizationCacheEntry is a smart optimization to avoid unnecessary document reparsing when the normalized form hasn't changed.


889-990: LGTM! Well-implemented caching logic with smart optimization.

The caching implementation is solid:

  • Cache key generation appropriately uses both variables and normalized representation
  • The reparse flag optimization avoids unnecessary document reparsing when the normalized form is unchanged (lines 954-963)
  • Proper error handling and cache miss fallback
  • Upload mappings are correctly preserved in cache entries

992-1073: LGTM! RemapVariables caching correctly implemented.

The caching logic is well-implemented:

  • Cache key appropriately uses only the normalized representation (line 993), since variable remapping depends on operation structure, not input variable values
  • Correct handling of cache hits with document reparsing (lines 1008-1010)
  • Proper error handling and fallback
  • keyGen properly reset after each use

1432-1441: LGTM! Cache initialization properly wired.

The new caches are correctly integrated into the NewOperationProcessor initialization:

  • Condition check includes the new cache types (lines 1432-1433)
  • Fields properly assigned from options (lines 1439-1440)
  • Follows the established pattern for other caches

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants