@kkk777-7
Member

@kkk777-7 kkk777-7 commented Nov 16, 2025

What this PR does / why we need it:

Introduce Translator Context to improve GatewayAPI translator performance.

Currently, the Gateway API translator methods build many local maps to speed up lookups.
These can all be moved to a common preprocessing step, stored in the Translator context, and reused across methods.

In addition, each resource is currently retrieved with a linear search over resource.Resources.
For frequently accessed resources, maintaining a map should therefore reduce CPU time.
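
To make the difference concrete, here is a minimal sketch of the two lookup styles described above. The `Service` and `key` types and the `buildIndex` helper are hypothetical stand-ins for illustration, not the types or functions used in this PR.

```go
package main

import "fmt"

// Service is a stand-in for corev1.Service with only the fields needed here.
type Service struct {
	Namespace string
	Name      string
}

// key is a hypothetical map key; the key type used in the PR may differ.
type key struct {
	Namespace string
	Name      string
}

// linearLookup scans the whole slice on every call: O(n) per lookup.
func linearLookup(svcs []*Service, ns, name string) *Service {
	for _, s := range svcs {
		if s.Namespace == ns && s.Name == name {
			return s
		}
	}
	return nil
}

// buildIndex runs once as a preprocessing step. Later lookups are O(1) on
// average. The first entry wins on duplicates, matching linearLookup.
func buildIndex(svcs []*Service) map[key]*Service {
	idx := make(map[key]*Service, len(svcs))
	for _, s := range svcs {
		k := key{s.Namespace, s.Name}
		if _, seen := idx[k]; !seen {
			idx[k] = s
		}
	}
	return idx
}

func main() {
	svcs := []*Service{{"default", "a"}, {"default", "b"}, {"infra", "a"}}
	idx := buildIndex(svcs)

	// Both styles return the same pointer for an existing Service.
	fmt.Println(linearLookup(svcs, "infra", "a") == idx[key{"infra", "a"}])
	// A missing key yields a nil pointer from the map.
	fmt.Println(idx[key{"infra", "missing"}] == nil)
}
```

The map stores the same pointers that already live in the slice, so the extra memory is roughly the map's own buckets rather than copies of the resources.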

This PR's scope

  • ServiceMap
  • NamespaceMap
  • ServiceImportMap
  • BackendMap
  • SecretMap
  • ConfigMapMap
  • ClusterTrustBundleMap
  • EndpointSliceMap

The following items were initially considered, but the gains were minimal and they would add complexity, so they were excluded from the scope. Ref: #7535 (comment)

  • Policy Target Gateway Map
  • Policy Target Route Map

Which issue(s) this PR fixes:

Fixes #6711

Release Notes: No

Benchmark Detail

gobench result

gatewayapi translator benchmark (vs latest main branch)

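  • Execution time (sec/op) improves most on the large workload: about 45% less time per op (116.67 ms to 64.12 ms). Small and medium show no statistically significant change.
    • large workload : HTTPRoute(2k), GRPCRoute(250), UDPRoute(100), ConfigMap(500), SP(500), BTP(500), EEP(500), Service(2k), EndpointSlice(2k)
  • No meaningful regression in memory usage or allocation counts in any workload: B/op is flat or slightly lower, and allocs/op moves by at most +0.06%.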
                               │    main.txt    │               fix.txt                │
                               │     sec/op     │    sec/op      vs base               │
GatewayAPITranslator/small-10     2.012m ± 164%   2.188m ± 144%        ~ (p=0.132 n=6)
GatewayAPITranslator/medium-10    5.752m ±   6%   5.497m ±  35%        ~ (p=0.132 n=6)
GatewayAPITranslator/large-10    116.67m ±   5%   64.12m ±   9%  -45.04% (p=0.002 n=6)
geomean                           11.05m          9.171m         -17.02%

                               │    main.txt    │               fix.txt                │
                               │      B/op      │      B/op       vs base              │
GatewayAPITranslator/small-10    803.7Ki ± 277%   801.9Ki ± 278%       ~ (p=0.900 n=6)
GatewayAPITranslator/medium-10   3.018Mi ±   1%   3.006Mi ±   3%       ~ (p=0.065 n=6)
GatewayAPITranslator/large-10    23.35Mi ±   0%   23.32Mi ±   0%  -0.10% (p=0.002 n=6)
geomean                          3.810Mi          3.801Mi         -0.24%

                               │   main.txt    │               fix.txt               │
                               │   allocs/op   │   allocs/op    vs base              │
GatewayAPITranslator/small-10    12.89k ± 127%   12.90k ± 127%  +0.06% (p=0.035 n=6)
GatewayAPITranslator/medium-10   52.63k ±   0%   52.64k ±   0%  +0.03% (p=0.002 n=6)
GatewayAPITranslator/large-10    412.3k ±   0%   412.3k ±   0%       ~ (p=0.065 n=6)
geomean                          65.41k          65.43k         +0.03%

@codecov

codecov bot commented Nov 16, 2025

Codecov Report

❌ Patch coverage is 95.20548% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.21%. Comparing base (f609278) to head (25a048e).
⚠️ Report is 20 commits behind head on main.

Files with missing lines                      Patch %   Lines
internal/gatewayapi/securitypolicy.go         78.57%    4 Missing and 2 partials ⚠️
internal/gatewayapi/contexts.go               96.62%    2 Missing and 1 partial ⚠️
internal/gatewayapi/backendtlspolicy.go       90.00%    0 Missing and 2 partials ⚠️
internal/gatewayapi/backendtrafficpolicy.go   94.44%    0 Missing and 1 partial ⚠️
internal/gatewayapi/listener.go               94.11%    0 Missing and 1 partial ⚠️
internal/gatewayapi/route.go                  97.22%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7535      +/-   ##
==========================================
- Coverage   71.57%   71.21%   -0.37%     
==========================================
  Files         231      274      +43     
  Lines       42625    34898    -7727     
==========================================
- Hits        30507    24851    -5656     
+ Misses      10344     8256    -2088     
- Partials     1774     1791      +17     


@kkk777-7
Member Author

wait #7534

@kkk777-7 kkk777-7 marked this pull request as ready for review November 16, 2025 12:39
@kkk777-7 kkk777-7 requested a review from a team as a code owner November 16, 2025 12:39
@kkk777-7
Member Author

/retest

@shreealt
Contributor

shreealt commented Nov 16, 2025

@kkk777-7 you might wanna merge the main branch to this branch to include the disk space cleaner.

@zirain zirain force-pushed the perf-translator-context branch from 9527aa1 to ce0cfcc Compare November 16, 2025 13:01
@zhaohuabing
Member

The GetService in the resource.go can be deleted - it's no longer used.

func (r *Resources) GetService(namespace, name string) *corev1.Service {
	for _, svc := range r.Services {
		if svc.Namespace == namespace && svc.Name == name {
			return svc
		}
	}
	return nil
}

@kkk777-7 kkk777-7 force-pushed the perf-translator-context branch from f9b5a6a to 3d4c714 Compare November 18, 2025 17:05
@zhaohuabing
Member

zhaohuabing commented Nov 19, 2025

Thinking out loud: could we also add other resources (Namespace, Secret, ConfigMap, etc.) to the translator context to avoid linear lookups and improve CPU performance?

This could increase memory usage. The increase shouldn't be large, since we only mirror the slices that are already in resource using pointers, but we should benchmark both CPU and memory to ensure we don’t introduce significant overhead.
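
One way to check the CPU versus memory trade-off in isolation is a standalone micro-benchmark. The sketch below uses `testing.Benchmark` from the standard library; the `Service` and `key` types, `makeServices`, and `buildIndex` are hypothetical stand-ins and are not taken from the PR or from ./test/gobench.

```go
package main

import (
	"fmt"
	"testing"
)

type Service struct{ Namespace, Name string }

type key struct{ Namespace, Name string }

// sink keeps the compiler from discarding the lookups as dead code.
var sink *Service

// makeServices builds n synthetic Services in one namespace.
func makeServices(n int) []*Service {
	svcs := make([]*Service, n)
	for i := range svcs {
		svcs[i] = &Service{Namespace: "default", Name: fmt.Sprintf("svc-%d", i)}
	}
	return svcs
}

func linearLookup(svcs []*Service, ns, name string) *Service {
	for _, s := range svcs {
		if s.Namespace == ns && s.Name == name {
			return s
		}
	}
	return nil
}

// buildIndex stores pointers only, so the Services themselves are not copied.
func buildIndex(svcs []*Service) map[key]*Service {
	idx := make(map[key]*Service, len(svcs))
	for _, s := range svcs {
		idx[key{s.Namespace, s.Name}] = s
	}
	return idx
}

func main() {
	svcs := makeServices(2000)
	last := svcs[len(svcs)-1] // worst case for the linear scan

	linear := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = linearLookup(svcs, last.Namespace, last.Name)
		}
	})
	// Rebuilding the index on every iteration deliberately charges the
	// build cost and its allocations to each operation.
	indexed := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			idx := buildIndex(svcs)
			sink = idx[key{last.Namespace, last.Name}]
		}
	})

	fmt.Println("linear lookup:       ", linear.String(), linear.MemString())
	fmt.Println("build index + lookup:", indexed.String(), indexed.MemString())
}
```

In a real translation the index would be built once and then queried many times, so the build cost is amortized across all lookups in that run.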

@zirain
Member

zirain commented Nov 19, 2025

> Thinking out loud: could we also add other resources (Namespace, Secret, ConfigMap, etc.) to the translator context to avoid linear lookups and improve CPU performance?

we should do that.

@kkk777-7
Member Author

> Thinking out loud: could we also add other resources (Namespace, Secret, ConfigMap, etc.) to the translator context to avoid linear lookups and improve CPU performance?

+1
Even with the addition of a 2k service map, the memory increase was minimal, so I think the CPU benefit will actually be the bigger win!
I’ll add these to the context and share the benchmark results :)

@zirain
Member

zirain commented Nov 19, 2025

GetEndpointSlicesForBackend is another bottleneck.
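
A hedged sketch of how such a lookup could be served from a pre-grouped map instead of scanning every EndpointSlice per backend. The `EndpointSlice` struct, `backendKey`, and `groupSlices` are hypothetical stand-ins; the two label keys are the standard Kubernetes and MCS service-name labels, and giving the MCS label precedence is an assumption made for this example.

```go
package main

import "fmt"

// EndpointSlice is a stand-in for discoveryv1.EndpointSlice.
type EndpointSlice struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// backendKey is a hypothetical key identifying the owning backend.
type backendKey struct {
	Kind      string
	Namespace string
	Name      string
}

const (
	serviceNameLabel       = "kubernetes.io/service-name"
	serviceImportNameLabel = "multicluster.kubernetes.io/service-name"
)

// groupSlices buckets EndpointSlices by owning backend in a single pass.
// Slices without either label are skipped.
func groupSlices(slices []*EndpointSlice) map[backendKey][]*EndpointSlice {
	out := make(map[backendKey][]*EndpointSlice)
	for _, s := range slices {
		var kind, owner string
		if n, ok := s.Labels[serviceImportNameLabel]; ok {
			kind, owner = "ServiceImport", n // assumption: MCS label wins
		} else if n, ok := s.Labels[serviceNameLabel]; ok {
			kind, owner = "Service", n
		} else {
			continue
		}
		k := backendKey{Kind: kind, Namespace: s.Namespace, Name: owner}
		out[k] = append(out[k], s)
	}
	return out
}

func main() {
	slices := []*EndpointSlice{
		{"default", "web-abc", map[string]string{serviceNameLabel: "web"}},
		{"default", "web-def", map[string]string{serviceNameLabel: "web"}},
		{"default", "imp-xyz", map[string]string{serviceImportNameLabel: "imp"}},
		{"default", "orphan", nil},
	}
	groups := groupSlices(slices)
	fmt.Println(len(groups[backendKey{"Service", "default", "web"}]))
	fmt.Println(len(groups[backendKey{"ServiceImport", "default", "imp"}]))
}
```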

zirain previously approved these changes Nov 20, 2025
@zirain zirain mentioned this pull request Nov 20, 2025
GatewayControllerName: string(rs.GatewayClass.Spec.ControllerName),
GatewayClassName: gwapiv1.ObjectName(rs.GatewayClass.Name),
GlobalRateLimitEnabled: true,
EndpointRoutingDisabled: false,
Member Author

TranslateGatewayAPIToXds sets EndpointRoutingDisabled: true,
so I added a new bench test.

Contributor

I think that was done for convenience. Can we unset it and collapse the tests into one?

- fqdn:
hostname: backend-v3.gateway-conformance-infra.svc.cluster.local
port: 8080
- apiVersion: gateway.envoyproxy.io/v1alpha1
Member Author

@kkk777-7
Member Author

Hi @zirain, @zhaohuabing, got good results 🎉
In the large-scale environment, memory usage barely changed, and execution time dropped by about 45%.

Member

@zhaohuabing zhaohuabing left a comment

LGTM. Thanks!

go.test.benchmark: ## Run benchmark tests for translation performance
@$(LOG_TARGET)
go test -timeout=15m -run='^$$' -bench=. -benchmem -benchtime=1x -count=6 ./test/gobench
go test -timeout=15m -run='^$$' -bench='$(GO_BENCH_TESTNAME)' -benchmem -benchtime=1x -count=6 ./test/gobench
Contributor

Is this change needed?

}
medium := baseYAML + backendYAML + tlsSecretYAML + clientTrafficPolicyYAML +
genHTTPRoutes(50) +
genHTTPRoutes(200) +
Contributor

@arkodg arkodg Nov 24, 2025

Curious why this is bumped to 200. Can we stick to the previous value of 50 so it's easier to compare with old values and easier to run locally?

genService(200)
large := baseYAML + backendYAML + tlsSecretYAML + clientTrafficPolicyYAML +
genHTTPRoutes(500) +
genHTTPRoutes(2000) +
Contributor

func (t *Translator) Translate(resources *resource.Resources) (*TranslateResult, error) {
	var errs error

	translatorContext := &TranslatorContext{}
Contributor

Can you add a comment along the lines of // Preprocessing to improve Get operations?

Also, this state should be saved inside t *Translator so you don't need to pass it around via function arguments.
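
A minimal sketch of what this suggestion could look like, with the context stored on the translator and rebuilt at the start of each run. All names here (`Service`, `newTranslatorContext`, `hasService`, the `ctx` field, and the simplified `Translate` signature) are hypothetical and only illustrate the shape, not the PR's code.

```go
package main

import "fmt"

type Service struct{ Namespace, Name string }

// TranslatorContext holds lookup tables built once per translation.
type TranslatorContext struct {
	services map[string]*Service
}

func newTranslatorContext(svcs []*Service) *TranslatorContext {
	c := &TranslatorContext{services: make(map[string]*Service, len(svcs))}
	for _, s := range svcs {
		c.services[s.Namespace+"/"+s.Name] = s
	}
	return c
}

// Translator keeps the context as a field, so helper methods can reach it
// through the receiver instead of taking it as an extra argument.
type Translator struct {
	ctx *TranslatorContext
}

// Translate returns the refs ("namespace/name") that resolve to a Service.
func (t *Translator) Translate(svcs []*Service, refs []string) []string {
	// Preprocessing to improve Get operations.
	t.ctx = newTranslatorContext(svcs)

	var resolved []string
	for _, ref := range refs {
		if t.hasService(ref) {
			resolved = append(resolved, ref)
		}
	}
	return resolved
}

func (t *Translator) hasService(ref string) bool {
	_, ok := t.ctx.services[ref]
	return ok
}

func main() {
	t := &Translator{}
	svcs := []*Service{{"default", "web"}, {"infra", "db"}}
	fmt.Println(t.Translate(svcs, []string{"default/web", "default/missing", "infra/db"}))
}
```

One trade-off of this shape: per-run state on the receiver means a single Translator value should not run Translate concurrently from multiple goroutines unless the field is protected or each run uses its own Translator.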

- host: 7.7.7.7
port: 8080
metadata:
kind: Service
Contributor

Hey, why is this removed?

Member Author

Here is the previous discussion: #7535 (comment)

Contributor

Thanks. Can we rename service-1 in the YAML to another name and update the input file to address this?
We are relying on the translator setting kind to Service, so we need to make sure the tests account for it.

t.EndpointSliceMap = make(map[backendServiceKey][]*discoveryv1.EndpointSlice)

for _, slice := range slices {
	var kind string
Contributor

Can we move these outside the for loop?

Development

Successfully merging this pull request may close these issues.

Preprocess context in Gateway API Translator

5 participants