fix: cache vars and remapping normalization #2318
base: main
Conversation
…en, cleanup skip/include variables when we get normalized op from cache
Walkthrough

Adds two new caches (variables normalization and remap variables), wires them into graph server initialization and shutdown, passes them into the operation processor, extends the variable normalization/remapping flows to read/write the caches and surface cache-hit flags/headers/OTel attributes, updates related call sites and tests, and removes …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Router-nonroot image scan passed ✅ No security vulnerabilities found in image.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/core/operation_processor.go (1)
944-979: Copy cached variables before storing entries
`VariablesNormalizationCacheEntry.variables` currently aliases `o.parsedOperation.Request.Variables`, which is backed by the pooled `parseKit` buffer. After the entry is cached, the kit gets reused (`OperationProcessor.freeKit`), so later requests mutate that shared backing array, corrupting the cached payload and any future hits. Please copy the slice before caching so the entry owns its data. Apply the same fix in both places where we construct `VariablesNormalizationCacheEntry`.

```diff
@@
-	entry := VariablesNormalizationCacheEntry{
+	cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...)
+	entry := VariablesNormalizationCacheEntry{
 		uploadsMapping:           uploadsMapping,
 		id:                       o.parsedOperation.ID,
 		normalizedRepresentation: o.parsedOperation.NormalizedRepresentation,
-		variables:                o.parsedOperation.Request.Variables,
+		variables:                cachedVariables,
 		reparse:                  false,
 	}
@@
-	entry := VariablesNormalizationCacheEntry{
+	cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...)
+	entry := VariablesNormalizationCacheEntry{
 		uploadsMapping:           uploadsMapping,
 		id:                       o.parsedOperation.ID,
 		normalizedRepresentation: o.parsedOperation.NormalizedRepresentation,
-		variables:                o.parsedOperation.Request.Variables,
+		variables:                cachedVariables,
 		reparse:                  true,
 	}
```
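To make the aliasing hazard concrete, here is a small, self-contained Go sketch (the buffer contents and names are invented for illustration, not taken from the router) showing how a cache entry that aliases a pooled buffer is corrupted when the buffer is reused, while a copied slice keeps the original payload:

```go
package main

import "fmt"

func main() {
	// Stands in for the pooled parse-kit buffer backing Request.Variables.
	pooled := []byte(`{"id":1}`)

	cachedAliased := pooled                        // cache entry aliases the pooled buffer
	cachedCopied := append([]byte(nil), pooled...) // cache entry owns its data

	// The pool hands the buffer to the next request, which overwrites it in place.
	copy(pooled, []byte(`{"id":2}`))

	fmt.Println(string(cachedAliased)) // prints {"id":2}: the cached payload was corrupted
	fmt.Println(string(cachedCopied))  // prints {"id":1}: still the original payload
}
```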
🧹 Nitpick comments (1)
router/core/graph_server.go (1)
707-731: Expose cache metrics for the new caches

The new variables/remap caches are now part of the mux, but we never register them with `metricInfos`, so their hit/miss stats stay invisible in OTLP/Prometheus. Please append both caches to `metricInfos` alongside the existing ones so operators can monitor them.

```diff
@@
 	if s.normalizationCache != nil {
 		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("query_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.normalizationCache.Metrics))
 	}
 
+	if s.variablesNormalizationCache != nil {
+		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.variablesNormalizationCache.Metrics))
+	}
+
+	if s.remapVariablesCache != nil {
+		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_remap", srv.engineExecutionConfiguration.NormalizationCacheSize, s.remapVariablesCache.Metrics))
+	}
+
 	if s.persistedOperationCache != nil {
 		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("persisted_query_normalization", 1024, s.persistedOperationCache.Metrics))
 	}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- router-tests/normalization_cache_test.go (1 hunks)
- router/core/graph_server.go (4 hunks)
- router/core/operation_processor.go (10 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router-tests/normalization_cache_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Applied to files:
router/core/graph_server.go
🧬 Code graph analysis (2)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (4)
Run (105-122), Config (284-341), Environment (1731-1767), GraphQLRequest (1907-1915)
router/core/graphql_handler.go (1)
NormalizationCacheHeader(39-39)
router/core/graph_server.go (1)
router/core/operation_processor.go (3)
NormalizationCacheEntry (755-759), VariablesNormalizationCacheEntry (761-767), RemapVariablesCacheEntry (769-773)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_test
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: image_scan
- GitHub Check: integration_test (./telemetry)
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
router/core/graph_server.go
Outdated
```go
if s.planCache != nil {
	s.planCache.Close()
}

if s.persistedOperationCache != nil {
	s.persistedOperationCache.Close()
}

if s.normalizationCache != nil {
	s.normalizationCache.Close()
}

if s.variablesNormalizationCache != nil {
	s.variablesNormalizationCache.Close()
}

if s.remapVariablesCache != nil {
	s.remapVariablesCache.Close()
}

if s.complexityCalculationCache != nil {
	s.complexityCalculationCache.Close()
}
```
I suggest moving this to a function to reduce the size a little bit. Something like:

```go
// Or define an unexported interface with a Close function
func closeResources[T io.Closer](caches ...T) {
	for _, cache := range caches {
		if cache != nil {
			cache.Close()
		}
	}
}
```
`Cache` does not satisfy the `io.Closer` interface. I will simplify it by removing the redundant nil checks; the `Close` method already has that check.
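A minimal sketch of that simplification, assuming (as stated above) that every cache's Close method is safe to call on a nil receiver; the closer interface, closeAll helper, and fakeCache type below are invented for illustration and are not the actual router code:

```go
package main

// closer is a tiny local interface so the helper does not depend on io.Closer
// (the caches' Close methods return nothing).
type closer interface{ Close() }

// closeAll closes every provided resource; nil handling is left to each Close
// implementation, per the assumption stated above.
func closeAll(resources ...closer) {
	for _, r := range resources {
		r.Close()
	}
}

// fakeCache stands in for a router cache whose Close is nil-safe.
type fakeCache struct{}

func (c *fakeCache) Close() {
	if c == nil {
		return // mirrors the nil check mentioned in the reply above
	}
}

func main() {
	var missing *fakeCache // a cache that was never configured
	closeAll(&fakeCache{}, missing)
}
```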
router/core/operation_processor.go
Outdated
```go
// fmt.Println("####### NormalizeVariables: variables len:", len(o.kit.doc.Input.Variables))
// fmt.Println("####### NormalizeVariables: variables:", string(o.kit.doc.Input.Variables))
```
Should be removed?
There are multiple comments with fmt statements in this file. Should they be removed?
Removed
```go
type VariablesNormalizationCacheEntry struct {
	normalizedRepresentation string
	uploadsMapping           []uploads.UploadPathMapping
	id                       uint64
	variables                json.RawMessage
	reparse                  bool
}

type RemapVariablesCacheEntry struct {
	internalID               uint64
	normalizedRepresentation string
	remapVariables           map[string]string
}
```
Can we get a few comments and descriptions here to understand the purpose and the fields?
I have added a couple of comments there
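For readers following along, here is a hypothetical illustration of the kind of field documentation being requested, inferred from how the fields are discussed elsewhere in this review; the actual comments added in the PR may read differently, and UploadPathMapping here is a local stand-in for the router's uploads.UploadPathMapping type:

```go
package example

import "encoding/json"

// UploadPathMapping is a stand-in for the router's uploads.UploadPathMapping.
type UploadPathMapping struct{}

// VariablesNormalizationCacheEntry caches the result of normalizing an
// operation's variables so repeated requests can skip that work.
type VariablesNormalizationCacheEntry struct {
	// normalizedRepresentation is the operation text after variables normalization.
	normalizedRepresentation string
	// uploadsMapping records where file-upload variables appear in the operation.
	uploadsMapping []UploadPathMapping
	// id is the operation ID computed for the normalized operation.
	id uint64
	// variables holds the normalized variables JSON owned by this entry.
	variables json.RawMessage
	// reparse signals that the cached normalized form differs from the parsed
	// document, so the document must be reparsed on a cache hit.
	reparse bool
}

// RemapVariablesCacheEntry caches the variable remapping derived from the
// normalized operation, which depends on operation structure only.
type RemapVariablesCacheEntry struct {
	// internalID is the operation's internal ID after remapping.
	internalID uint64
	// normalizedRepresentation is the operation text after variable remapping.
	normalizedRepresentation string
	// remapVariables maps original variable names to their remapped names.
	remapVariables map[string]string
}
```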
Actionable comments posted: 0
🧹 Nitpick comments (2)
router/core/operation_processor.go (2)
813-817: Consider handling or documenting jsonparser.Delete errors.

The `jsonparser.Delete` calls on Line 816 can return errors, but they're being silently ignored. While delete operations on valid JSON should rarely fail, consider either:
- Handling the error (log or return it)
- Adding a comment explaining why it's safe to ignore
Example:
```diff
 for _, varName := range skipIncludeVariableNames {
-	o.parsedOperation.Request.Variables = jsonparser.Delete(o.parsedOperation.Request.Variables, varName)
+	var err error
+	o.parsedOperation.Request.Variables, err = jsonparser.Delete(o.parsedOperation.Request.Variables, varName)
+	if err != nil {
+		// Log or handle error
+	}
 }
```
943-950: Address or remove the TODO-style comment.

The comment on Line 943 suggests uncertainty about `keyGen` reset discipline: "should not be needed if we properly reset after use - check do we have any remaining places where we do not reset keygen - maybe wrap into a type which will reset once we got key".

After reviewing the code, the resets appear properly placed (lines 885, 950, 995, 1045, 1053, 1193, 1202). Consider either:
- Removing the comment if the concern has been addressed
- Creating a wrapper type as suggested for better encapsulation (see the sketch after this list)
- Converting this to a proper TODO with a tracking ticket
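As an illustration of the wrapper idea in the second bullet, here is a minimal sketch; the type, its methods, and the use of an fnv hasher are assumptions for the example, not the router's actual keyGen implementation:

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// resettingKeyGen wraps a 64-bit hasher and resets it whenever a key is read,
// so call sites cannot forget to reset it between operations.
type resettingKeyGen struct {
	h hash.Hash64
}

func newResettingKeyGen() *resettingKeyGen {
	return &resettingKeyGen{h: fnv.New64a()}
}

func (k *resettingKeyGen) Write(p []byte) (int, error) {
	return k.h.Write(p)
}

// Key returns the accumulated hash and resets the hasher for the next use.
func (k *resettingKeyGen) Key() uint64 {
	sum := k.h.Sum64()
	k.h.Reset()
	return sum
}

func main() {
	kg := newResettingKeyGen()
	kg.Write([]byte("normalized representation"))
	fmt.Println(kg.Key()) // the hasher is reset here automatically

	kg.Write([]byte("next operation"))
	fmt.Println(kg.Key())
}
```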
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
router/core/operation_processor.go (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
GraphQLRequest(1909-1917)
🔇 Additional comments (7)
router/core/operation_processor.go (7)
48-78: LGTM! Documentation improvements. The enhanced comments provide clearer explanations of the field purposes and when they're available.
106-128: LGTM! Cache fields properly added. The new cache fields follow the established naming conventions and type patterns used for other caches in the struct.
163-176: LGTM! Cache fields properly integrated. The cache fields are correctly added to the OperationCache struct following existing patterns.
767-794: LGTM! Well-documented cache entry types. The cache entry structures are well-designed with clear documentation. The `reparse` flag in `VariablesNormalizationCacheEntry` is a smart optimization to avoid unnecessary document reparsing when the normalized form hasn't changed.
889-990: LGTM! Well-implemented caching logic with smart optimization. The caching implementation is solid:
- Cache key generation appropriately uses both variables and normalized representation
- The `reparse` flag optimization avoids unnecessary document reparsing when the normalized form is unchanged (lines 954-963; see the sketch after this list)
- Proper error handling and cache miss fallback
- Upload mappings are correctly preserved in cache entries
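To illustrate the reparse optimization called out above, here is a simplified sketch with stand-in types; the field and function names are assumptions for the example, not the actual operation_processor code:

```go
package main

import "fmt"

// Simplified stand-ins for the router's parsed operation and cache entry.
type variablesNormalizationCacheEntry struct {
	id                       uint64
	normalizedRepresentation string
	variables                []byte
	reparse                  bool
}

type parsedOperation struct {
	id                       uint64
	normalizedRepresentation string
	variables                []byte
}

// applyCacheHit restores a cached normalization result onto the operation and
// reports whether the document still needs to be reparsed, which is only the
// case when the cached normalized form differs from the freshly parsed document.
func applyCacheHit(op *parsedOperation, entry variablesNormalizationCacheEntry) (needsReparse bool) {
	op.id = entry.id
	op.normalizedRepresentation = entry.normalizedRepresentation
	// Copy so the operation never aliases the cache's backing array.
	op.variables = append([]byte(nil), entry.variables...)
	return entry.reparse
}

func main() {
	op := &parsedOperation{}
	hit := variablesNormalizationCacheEntry{
		id:                       42,
		normalizedRepresentation: "query ($a: Int) { field(arg: $a) }",
		variables:                []byte(`{"a":1}`),
		reparse:                  false,
	}
	if applyCacheHit(op, hit) {
		fmt.Println("reparse the document before continuing")
	} else {
		fmt.Println("skip reparsing; normalized form unchanged")
	}
}
```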
992-1073: LGTM! RemapVariables caching correctly implemented. The caching logic is well-implemented:
- Cache key appropriately uses only the normalized representation (line 993), since variable remapping depends on operation structure, not input variable values
- Correct handling of cache hits with document reparsing (lines 1008-1010)
- Proper error handling and fallback
- keyGen properly reset after each use
1432-1441: LGTM! Cache initialization properly wired. The new caches are correctly integrated into the `NewOperationProcessor` initialization:
- Condition check includes the new cache types (lines 1432-1433)
- Fields properly assigned from options (lines 1439-1440)
- Follows the established pattern for other caches
Use cache for variables normalization and variables remapping.
Closes ENG-8477
Summary by CodeRabbit
Performance
Tests