Increase benchmark sampling 5x for CI stability #187
Merged
NullVoxPopuli merged 10 commits into ember-tooling:main on Mar 20, 2026
Uses a Node.js module loader hook to intercept mitata's `lib.mjs` at load time and increase `k_min_cpu_time` and `k_min_samples` by 5x, without modifying `node_modules` on disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
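A minimal sketch of what a `load` hook like the one in `tests/bench-sampler-hooks.mjs` could look like, using Node's module customization hooks (registered via `module.register()`, Node 20.6+). The regex assumes mitata assigns the constants as `k_min_cpu_time = <expr>` / `k_min_samples = <expr>`; that shape, and the helper name `scaleSamplingConstants`, are assumptions, not mitata's or this PR's actual code.

```javascript
// Sketch of a `load` customization hook (hypothetical contents).
// It would be registered from a preload module, e.g.:
//   import { register } from 'node:module';
//   register('./bench-sampler-hooks.mjs', import.meta.url);

const FACTOR = 5;

// Assumption: mitata assigns the constants as `k_min_cpu_time = <expr>` and
// `k_min_samples = <expr>`; the real lib.mjs may be shaped differently.
// `(?!=)` keeps comparisons such as `k_min_samples === n` untouched.
export function scaleSamplingConstants(source, factor = FACTOR) {
  return source.replace(
    /\b(k_min_cpu_time|k_min_samples)(\s*=(?!=)\s*)([^;,\n]+)/g,
    (_match, name, eq, expr) => `${name}${eq}(${expr.trim()}) * ${factor}`,
  );
}

export async function load(url, context, nextLoad) {
  const result = await nextLoad(url, context);
  // Only rewrite mitata's lib.mjs; every other module passes through as-is.
  if (!url.includes('/mitata/') || !url.endsWith('/lib.mjs')) return result;

  const text =
    typeof result.source === 'string'
      ? result.source
      : new TextDecoder().decode(result.source);

  // Only the in-memory source changes; node_modules on disk is never written.
  return { ...result, source: scaleSamplingConstants(text) };
}
```

Rewriting the source text in memory is what lets the test plan's "lib.mjs is unchanged after running" check hold.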
Member
🏎️ Benchmark Comparison
Full mitata output (box plots and histograms not recoverable; "…" marks rows lost in extraction)

clk: ~2.57 GHz, cpu: AMD EPYC 7763 64-Core Processor, runtime: node 24.14.0 (x64-linux)

| benchmark | avg | min … max | p75 | p99 | heap avg (min … max) |
|---|---|---|---|---|---|
| gts small (control) | 1.41 ms/iter | 822.43 µs … 30.25 ms | 1.08 ms | 8.16 ms | 531.44 kb (4.08 kb … 7.36 mb) |
| gts small (experiment) | 1.32 ms/iter | 784.25 µs … 30.18 ms | 1.03 ms | 6.40 ms | 526.11 kb (28.20 kb … 7.09 mb) |
| gts medium (control) | 5.36 ms/iter | 3.12 ms … 34.83 ms | 5.43 ms | 28.86 ms | 2.31 mb (29.81 kb … 14.75 mb) |
| gts medium (experiment) | 5.25 ms/iter | 3.05 ms … 34.24 ms | 5.10 ms | 29.02 ms | 2.30 mb (229.98 kb … 13.21 mb) |
| … | | | | | |
| hbs medium (control) | … | 1.21 ms … 22.96 ms | … | 17.77 ms | 1.79 mb (41.61 kb … 4.15 mb) |
| hbs medium (experiment) | 1.99 ms/iter | 1.24 ms … 23.94 ms | 1.37 ms | 17.81 ms | 1.79 mb (560.34 kb … 3.40 mb) |
| hbs large (control) | 10.33 ms/iter | 7.15 ms … 33.60 ms | 9.51 ms | 25.69 ms | 8.04 mb (6.51 mb … 9.94 mb) |
| hbs large (experiment) | 10.29 ms/iter | 7.22 ms … 33.76 ms | 9.65 ms | 24.40 ms | 8.04 mb (6.81 mb … 10.48 mb) |

- summary: gts small (experiment) 1.07x faster than gts small (control)
- summary: hbs medium (control) 1x faster than hbs medium (experiment)
- summary: hbs large (experiment) 1x faster than hbs large (control)
- Enable `.gc('inner')` on all bench calls: forces GC between iterations and subtracts GC time from measurements, eliminating GC-spike noise
- Switch the PR comment summary from avg to p50 (median), matching the CLI formatter; the median is far more robust to outliers from CPU scaling and system noise on shared CI runners
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
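To illustrate why p50 is the steadier summary statistic, here is a small sketch with made-up sample times. The helpers `mean` and `percentile` are hypothetical and use linear interpolation between ranks; mitata's own percentile computation may differ.

```javascript
// Hypothetical helpers, not mitata's implementation: mean, and percentile by
// linear interpolation between the two nearest ranks of the sorted samples.
function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Nine ordinary iterations (ms) plus one 30 ms outlier; illustrative numbers.
const samples = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 30.0];
console.log(`avg ${mean(samples).toFixed(2)} ms, p50 ${percentile(samples, 50).toFixed(2)} ms`);
// prints "avg 3.90 ms, p50 1.00 ms"
```

A single slow iteration nearly quadruples the mean, while the median does not move.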
Member
grr

🏎️ Benchmark Comparison

Full mitata output (box plots and histograms not recoverable; "…" marks rows lost in extraction)

clk: ~3.37 GHz, cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz, runtime: node 24.14.0 (x64-linux)

| benchmark | avg | min … max | p75 | p99 | gc (min … max) | heap avg (min … max) |
|---|---|---|---|---|---|---|
| gts small (control) | 9.86 ms/iter | 2.64 ms … 18.31 ms | 12.97 ms | 17.80 ms | 44.94 ms … 70.10 ms | 875.01 kb (475.12 kb … 1.60 mb) |
| gts small (experiment) | 9.06 ms/iter | 2.66 ms … 16.79 ms | 12.61 ms | 16.76 ms | 44.69 ms … 71.25 ms | 837.25 kb (218.52 kb … 1.38 mb) |
| gts medium (control) | 22.23 ms/iter | 14.08 ms … 30.00 ms | 25.12 ms | 29.67 ms | 45.45 ms … 57.61 ms | 2.84 mb (1.99 mb … 4.52 mb) |
| gts medium (experiment) | 20.25 ms/iter | 13.00 ms … 26.97 ms | 22.63 ms | 26.17 ms | 45.68 ms … 59.56 ms | 2.65 mb (2.19 mb … 4.65 mb) |
| … | | | | | | |
| hbs medium (control) | … | 4.91 ms … 14.53 ms | … | 14.46 ms | 41.13 ms … 56.76 ms | 2.08 mb (1.65 mb … 3.54 mb) |
| hbs medium (experiment) | 11.19 ms/iter | 5.01 ms … 16.51 ms | 13.20 ms | 15.35 ms | 41.54 ms … 56.76 ms | 2.03 mb (1.98 mb … 2.66 mb) |
| hbs large (control) | 28.04 ms/iter | 19.87 ms … 36.19 ms | 30.06 ms | 35.29 ms | 44.98 ms … 59.76 ms | 9.03 mb (8.23 mb … 9.93 mb) |
| hbs large (experiment) | 28.30 ms/iter | 21.52 ms … 36.01 ms | 30.23 ms | 35.96 ms | 43.63 ms … 59.30 ms | 9.02 mb (8.62 mb … 10.06 mb) |

- summary: gts small (experiment) 1.09x faster than gts small (control)
- summary: hbs medium (experiment) 1x faster than hbs medium (control)
- summary: hbs large (control) 1.01x faster than hbs large (experiment)
Instead of patching mitata internals via ESM loader hooks, run each parse call in a loop (1000/200/100 iterations for small/medium/large) so that individual GC spikes get diluted within each sample. This is simpler and more effective than increasing mitata's sampling constants.

Also reverts `gc('inner')`, which was counterproductive: the 44–71 ms GC between iterations trashed CPU caches and increased variance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
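The dilution argument can be checked with a little arithmetic. This sketch assumes 1 ms parse calls and a single 45 ms pause per sample (roughly the GC times measured with `gc('inner')`); `repeat` and `perCallWithSpike` are hypothetical names, not helpers from this PR.

```javascript
// Hypothetical wrapper: call `fn` `n` times inside one timed sample, so a single
// GC pause is spread over many calls. Returns the last result so it stays live.
function repeat(fn, n) {
  let last;
  for (let i = 0; i < n; i++) last = fn();
  return last;
}

// Per-call time that would be reported when one pause of `spikeMs` lands in a
// sample of `n` calls that each take `callMs`.
function perCallWithSpike(callMs, n, spikeMs) {
  return (callMs * n + spikeMs) / n;
}

// Iteration counts from the commit message (small/medium/large).
const ITERATIONS = { small: 1000, medium: 200, large: 100 };

for (const n of [1, ITERATIONS.large, ITERATIONS.small]) {
  // 1 ms calls and one 45 ms pause, an illustrative assumption
  console.log(`n=${n}: ${perCallWithSpike(1, n, 45).toFixed(3)} ms per call`);
}
```

With one call per sample the pause inflates the reading 46x; with 1000 calls per sample the same pause adds only about 4.5%. It also explains why later results report seconds per iteration: each mitata iteration now contains hundreds or thousands of parse calls.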
Member
Much closer:

🏎️ Benchmark Comparison

Full mitata output
Small: 1000→2000, medium: 200→500. The previous run showed gts small still drifting at -3.4% while medium and large were within 1%. More iterations per sample further dilute GC and scheduling noise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
sus

🏎️ Benchmark Comparison

Full mitata output (box plots and histograms not recoverable; "…" marks rows lost in extraction)

clk: ~3.10 GHz, cpu: AMD EPYC 7763 64-Core Processor, runtime: node 24.14.0 (x64-linux)

| benchmark | avg | min … max | p75 | p99 | heap avg (min … max) |
|---|---|---|---|---|---|
| gts small (control) | 2.22 s/iter | 1.95 s … 2.73 s | 2.29 s | 2.40 s | 309.96 mb (286.61 mb … 326.65 mb) |
| gts small (experiment) | 2.15 s/iter | 1.88 s … 2.68 s | 2.23 s | 2.31 s | 312.31 mb (295.43 mb … 323.14 mb) |
| gts medium (control) | 2.28 s/iter | 2.07 s … 2.80 s | 2.33 s | 2.39 s | 354.16 mb (344.27 mb … 364.06 mb) |
| gts medium (experiment) | 2.48 s/iter | 2.22 s … 2.99 s | 2.40 s | 2.91 s | 320.67 mb (244.74 mb … 359.36 mb) |
| … | | | | | |
| hbs medium (control) | … | 935.11 ms … 1.23 s | … | 1.06 s | 193.72 mb (181.07 mb … 234.72 mb) |
| hbs medium (experiment) | 1.01 s/iter | 936.12 ms … 1.18 s | 1.04 s | 1.05 s | 205.45 mb (187.74 mb … 232.52 mb) |
| hbs large (control) | 1.07 s/iter | 991.87 ms … 1.25 s | 1.09 s | 1.15 s | 173.51 mb (166.53 mb … 214.17 mb) |
| hbs large (experiment) | 1.06 s/iter | 982.06 ms … 1.24 s | 1.09 s | 1.15 s | 172.79 mb (165.13 mb … 215.05 mb) |

- summary: gts small (experiment) 1.03x faster than gts small (control)
- summary: hbs medium (control) 1x faster than hbs medium (experiment)
- summary: hbs large (experiment) 1.01x faster than hbs large (control)
Author
Locally I get this:
Author
Locally (with
- New `run-bench.sh` wrapper: applies `taskset -c 0` (CPU pinning) and `nice -n -20` (high priority, when running as root) for `pnpm bench`
- `bench-compare.mjs`: also uses `nice -n -20` when running as root
- `local-bench-summary.sh`: offers to apply CPU governor and boost fixes interactively instead of just warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
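A sketch of how a Node script such as `bench-compare.mjs` might assemble the pinned, prioritized command. The function name `buildBenchCommand` and its options are hypothetical; `taskset -c 0` and `nice -n -20` are the invocations named in the commit, and a negative niceness requires root on Linux.

```javascript
// Hypothetical helper: build the argv for running a benchmark pinned to CPU 0
// with `taskset -c 0`, adding `nice -n -20` only when running as root.
function buildBenchCommand(cmd, args, { isRoot = false } = {}) {
  const argv = ['taskset', '-c', '0'];
  if (isRoot) argv.push('nice', '-n', '-20');
  argv.push(cmd, ...args);
  return argv;
}

// process.getuid() only exists on POSIX platforms.
const isRoot = typeof process.getuid === 'function' && process.getuid() === 0;
const argv = buildBenchCommand('pnpm', ['bench'], { isRoot });
console.log(argv.join(' '));
// To execute (Linux only; taskset ships with util-linux):
//   spawnSync(argv[0], argv.slice(1), { stdio: 'inherit' }) from 'node:child_process'
```

Because `taskset` execs `nice`, which execs the benchmark, the CPU affinity and priority are both inherited by the final process.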
Author
CI results (fork PR, so the workflow logged them instead of commenting):
8/9 within 2% on a shared CI runner. gts small is the only outlier at -7.3%, likely from residual GC noise on the smallest, fastest fixture even with 2000 iterations.
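One way to compute a tally like "8/9 within 2%" is sketched below. The helper names and the sign convention (negative means the experiment took less time) are assumptions, and the sample values are illustrative rather than taken from the CI run. Since this PR presumably leaves the parser itself unchanged, control and experiment run the same code, so any nonzero delta reflects measurement noise.

```javascript
// Hypothetical helpers: percent change of experiment vs. control (e.g. on p50
// values). Assumed sign convention: negative means the experiment took less time.
function deltaPercent(control, experiment) {
  return ((experiment - control) / control) * 100;
}

// Count control/experiment pairs whose |delta| is within thresholdPct.
function countWithin(pairs, thresholdPct = 2) {
  return pairs.filter(
    ({ control, experiment }) => Math.abs(deltaPercent(control, experiment)) <= thresholdPct,
  ).length;
}

// Illustrative p50 values in ms, not measured data.
const pairs = [
  { control: 10.0, experiment: 10.1 },
  { control: 1.0, experiment: 0.927 },
];
console.log(`${countWithin(pairs)}/${pairs.length} within 2%`);
```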
Wraps each parse call with mitata's `do_not_optimize()` to ensure V8 doesn't dead-code eliminate any part of the return value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
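For readers unfamiliar with the problem `do_not_optimize()` addresses, here is a self-contained sketch of the general "sink" pattern: keep every result observable so the engine cannot prove the work is unused. `keep` and `fakeParse` are hypothetical stand-ins, and a module-level sink is a simpler, weaker trick than mitata's helper, which is what the commit actually uses.

```javascript
// Hypothetical stand-in for mitata's do_not_optimize(): write each result to a
// module-level variable that is read afterwards, so the engine cannot treat
// the parse result as unused and skip the work.
// With mitata itself the pattern is: bench(name, () => do_not_optimize(parse(src)));
let sink;
function keep(value) {
  sink = value;
  return value;
}

// Stand-in for a parser call whose return value would otherwise be ignored.
function fakeParse(source) {
  return { length: source.length, tokens: source.split(/\s+/) };
}

for (let i = 0; i < 1000; i++) keep(fakeParse('<div>{{this.name}}</div>'));
console.log(sink.tokens.length);
```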
Author
Local results: 7/9 within 2%, even in a VM without root priority.
Instead of requiring the user to run the whole command as root (which loses `PATH` and can't find node/pnpm), use `sudo nice -n -20 node ...` so only the `nice` call is elevated. Activates automatically when passwordless sudo is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
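A sketch of how "passwordless sudo is available" could be detected from Node: `sudo -n` (non-interactive) fails instead of prompting when a password would be needed. The names `hasPasswordlessSudo` and `nodeCommand` are hypothetical; this is not the PR's code.

```javascript
import { spawnSync } from 'node:child_process';

// Hypothetical check: `sudo -n true` exits 0 only if no password is required.
// If sudo is not installed, spawnSync sets result.error and status is null.
function hasPasswordlessSudo() {
  const result = spawnSync('sudo', ['-n', 'true'], { stdio: 'ignore' });
  return result.status === 0;
}

// Prefix with `sudo nice -n -20` only when elevation is possible; otherwise
// fall back to a plain `node` invocation.
function nodeCommand(script, { elevate = hasPasswordlessSudo() } = {}) {
  return elevate
    ? ['sudo', 'nice', '-n', '-20', 'node', script]
    : ['node', script];
}

console.log(nodeCommand('bench.mjs').join(' '));
```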
- Extract shared `formatTime`, `deltaEmoji`, `parsePairs` into `bench-utils.mjs`
- Simplify `run-bench.sh` to just CPU pinning (no sudo/nice complexity)
- Simplify `bench-compare.mjs` to just CPU pinning
- Replace the interactive sudo prompt in `local-bench-summary.sh` with a "tips for more reliable results" section printed after results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
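Hypothetical versions of two of the shared helpers, to show the kind of logic that moved into `bench-utils.mjs`. Only the names come from the commit; the nanosecond input unit, the 2% noise threshold, the sign convention, and the emoji are assumptions.

```javascript
// Hypothetical formatTime: nanoseconds in, mitata-style unit string out.
function formatTime(ns) {
  if (ns < 1e3) return `${ns.toFixed(2)} ns`;
  if (ns < 1e6) return `${(ns / 1e3).toFixed(2)} µs`;
  if (ns < 1e9) return `${(ns / 1e6).toFixed(2)} ms`;
  return `${(ns / 1e9).toFixed(2)} s`;
}

// Hypothetical deltaEmoji: treat anything within ±thresholdPct as noise;
// assumed convention: negative delta means the experiment was faster.
function deltaEmoji(deltaPct, thresholdPct = 2) {
  if (Math.abs(deltaPct) <= thresholdPct) return '⚪';
  return deltaPct < 0 ? '🟢' : '🔴';
}

console.log(formatTime(1_410_000), deltaEmoji(-7.3));
```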
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NullVoxPopuli approved these changes on Mar 20, 2026
Member
it's at least better locally
Summary
- … (`tests/bench-sampler.mjs` + `tests/bench-sampler-hooks.mjs`) that intercepts mitata's `lib.mjs` at load time and multiplies `k_min_cpu_time` and `k_min_samples` by 5x
- `pnpm bench` and `pnpm bench:compare` (used by CI) now use the higher sampling

Test plan

- `pnpm bench` runs successfully with the loader hook
- `node_modules/mitata/src/lib.mjs` is unchanged after running

🤖 Generated with Claude Code
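The second test-plan item can be checked by hashing the file before and after a run. A minimal sketch using Node's built-in `crypto` and `fs` modules; the helper name `sha256` is hypothetical, and the path is the one from the test plan.

```javascript
import { createHash } from 'node:crypto';
import { readFileSync } from 'node:fs';

// Hypothetical test-plan helper: SHA-256 of a file's bytes, as hex.
function sha256(path) {
  return createHash('sha256').update(readFileSync(path)).digest('hex');
}

// Usage sketch (run the benchmark between the two reads):
//   const before = sha256('node_modules/mitata/src/lib.mjs');
//   ... run `pnpm bench` ...
//   const after = sha256('node_modules/mitata/src/lib.mjs');
//   if (before !== after) throw new Error('lib.mjs was modified on disk');
```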