feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support #2256

CPerezz · 2025-10-03T09:20:36Z

🗒️ Description

Add test_sload_empty_erc20_balanceof to benchmark SLOAD operations on non-existing storage slots using ERC20 balanceOf() queries.

The idea of this benchmark is to issue, within a single contract or a series of N contracts, balanceOf() calls that query non-existing addresses. This way, we force clients to resolve as many tree branches as possible.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

marioevz

Did a quick pass and it looks good to me overall.

I left a couple of questions as comments. Thanks!

tests/benchmark/stateful/bloatnet/test_single_opcode.py

LouisTsai-Csie

Some small suggestions and questions. Feel free to ignore these comments if they do not make sense!

tests/benchmark/stateful/bloatnet/test_single_opcode.py

Add test_sload_empty_erc20_balanceof to benchmark SLOAD operations on non-existing storage slots using ERC20 balanceOf() queries. The idea of this benchmark is to issue, within a single contract or a series of N contracts, balanceOf() calls that query non-existing addresses. This way, we force clients to resolve as many tree branches as possible.

Add test_sstore_erc20_approve that benchmarks SSTORE operations by calling approve(spender, amount) on pre-deployed ERC20 contracts. Follows the same pattern as the SLOAD benchmark:
- Auto-discovers ERC20 contracts from stubs
- Splits gas budget evenly across all discovered contracts
- Uses counter as both spender address and amount
- Forces SSTOREs to allowance mapping storage slots

The test measures client performance when writing to many storage slots across multiple contracts, stressing state-handling write operations.

Fixed gas calculation for test_sstore_erc20_approve to ensure accurate gas usage prediction and prevent transaction reverts.

Key fixes:
- Added memory expansion cost (15 gas per contract)
- Corrected G_LOW gas values in comments (5 gas, not 3)
- Separated per-contract overhead from per-iteration costs
- Improved cost calculation clarity with detailed opcode breakdown

Gas calculation (10M gas, 3 contracts):
- Intrinsic: 21,000
- Overhead per contract: 38
- Cost per iteration: 20,226
- Calls per contract: 164
- Expected gas used: 9,972,306 (99.72% utilization)
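The figures in this commit message can be reproduced with a few lines of arithmetic. The sketch below is not the test's code: the function name and the even integer split of the budget across contracts are assumptions, and only the three constants come from the message above.

```python
# Constants quoted in the commit message above.
INTRINSIC_GAS = 21_000
OVERHEAD_PER_CONTRACT = 38
COST_PER_ITERATION = 20_226


def plan_sstore_benchmark(gas_limit: int, num_contracts: int) -> tuple[int, int]:
    """Return (calls_per_contract, expected_gas_used).

    Hypothetical helper: assumes the budget left after the intrinsic cost
    is divided evenly (integer division) across the contracts.
    """
    gas_per_contract = (gas_limit - INTRINSIC_GAS) // num_contracts
    calls = (gas_per_contract - OVERHEAD_PER_CONTRACT) // COST_PER_ITERATION
    used = INTRINSIC_GAS + num_contracts * (
        OVERHEAD_PER_CONTRACT + calls * COST_PER_ITERATION
    )
    return calls, used


calls, used = plan_sstore_benchmark(10_000_000, 3)
print(calls, used, f"{used / 10_000_000:.2%}")  # 164 9972306 99.72%
```

Rounding the call count down keeps the predicted usage below the gas limit, which is what prevents the out-of-gas reverts the commit mentions.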

…atios

Add test_mixed_sload_sstore to test_multi_opcode.py that combines SLOAD and SSTORE operations with parameterized gas distribution ratios (50-50, 70-30, 90-10). The test stresses clients with mixed read/write workloads by:
- Dividing gas budget evenly across all discovered ERC20 contract stubs
- Splitting each contract's allocation by the specified percentage ratio
- Executing balanceOf (cold SLOAD on empty slots) for the SLOAD portion
- Executing approve (SSTORE to new allowance slots) for the SSTORE portion

Verified gas calculations for 10M gas budget with 3 contracts (50-50 ratio):
- SLOAD operations: ~2,312 gas/iteration → 719 calls per contract
- SSTORE operations: ~20,226 gas/iteration → 82 calls per contract
- Total operations: 2,403 state operations (2,157 SLOADs + 246 SSTOREs)
- Gas usage: 9.98M / 10M (16K buffer, no out-of-gas errors)

This benchmark enables testing different read/write ratios to identify client performance characteristics under varying state operation mixes.
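As a rough check of the 50-50 numbers, the split can be sketched as follows. The helper name is hypothetical, the per-iteration costs are the approximate values quoted above, and per-contract setup overhead is ignored for simplicity, so this is an approximation of the test's calculation rather than a copy of it.

```python
INTRINSIC_GAS = 21_000
SLOAD_ITERATION_GAS = 2_312    # approximate, from the commit message
SSTORE_ITERATION_GAS = 20_226  # approximate, from the commit message


def split_calls(gas_limit: int, num_contracts: int, sload_percent: int) -> tuple[int, int]:
    """Return (sload_calls, sstore_calls) per contract for a given ratio.

    Hypothetical helper: divides the budget evenly across contracts, then
    splits each contract's share by percentage.
    """
    gas_per_contract = (gas_limit - INTRINSIC_GAS) // num_contracts
    sload_gas = gas_per_contract * sload_percent // 100
    sstore_gas = gas_per_contract - sload_gas
    return sload_gas // SLOAD_ITERATION_GAS, sstore_gas // SSTORE_ITERATION_GAS


sloads, sstores = split_calls(10_000_000, 3, 50)
total_ops = 3 * (sloads + sstores)
gas_used = INTRINSIC_GAS + 3 * (
    sloads * SLOAD_ITERATION_GAS + sstores * SSTORE_ITERATION_GAS
)
print(sloads, sstores, total_ops, gas_used)  # 719 82 2403 9983580
```

The remaining gas (about 16K) matches the buffer reported in the commit message.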

…back

Address review comments by optimizing loop efficiency:

1. Move function selector MSTORE outside loops (Comment ethereum#2)
   - BALANCEOF_SELECTOR and APPROVE_SELECTOR now stored once per contract
   - Saves 3 gas (G_VERY_LOW) per iteration
   - Total savings: ~6,471 gas for 50-50 ratio with 10M budget and 3 contracts
2. Remove unused return data from CALL operations (Comment ethereum#1)
   - Changed ret_offset=96/128, ret_size=32 to ret_offset=0, ret_size=0
   - Eliminates unnecessary memory expansion
   - Minor gas savings, cleaner implementation

Skipped Comment ethereum#3 (use Op.GAS for addresses):
- Would lose determinism (GAS varies per iteration)
- Adds complexity for minimal benefit
- Counter still needed for loop control

Changes applied to:
- test_sload_empty_erc20_balanceof
- test_sstore_erc20_approve
- test_mixed_sload_sstore (both SLOAD and SSTORE loops)

LouisTsai-Csie

@CPerezz, I’ve left some suggestions. Please take a look and let me know if they’re unclear or not practical. These changes might not reduce gas usage much, but I wonder if they could help simplify the layout a bit.

I've not yet reviewed test_multi_opcode.py, but I believe it will be quick once we have consensus on the other test cases!

tests/benchmark/stateful/bloatnet/test_single_opcode.py

…alldata encoding

- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0] and MEM[64]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata
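Why args_offset=28 yields correctly encoded calldata can be seen by emulating the memory layout. The sketch below models EVM memory as a Python bytearray; the mstore helper and the args_size of 36 bytes (4-byte selector plus one 32-byte argument) are assumptions made for illustration, while 0x70a08231 is the standard selector for balanceOf(address).

```python
BALANCEOF_SELECTOR = 0x70A08231  # balanceOf(address)


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Emulate MSTORE: write value as a 32-byte big-endian word at offset."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[offset:end] = value.to_bytes(32, "big")


memory = bytearray()
# A 4-byte selector stored as a full word is right-aligned: bytes 28..31.
mstore(memory, 0, BALANCEOF_SELECTOR)
# The loop counter doubles as the queried address and sits at MEM[32].
mstore(memory, 32, 7)

# CALL with args_offset=28 and an assumed args_size=36 reads exactly
# selector || 32-byte argument, which is the ABI layout for balanceOf.
calldata = bytes(memory[28:28 + 36])
assert calldata[:4] == bytes.fromhex("70a08231")
assert int.from_bytes(calldata[4:], "big") == 7
```

With args_offset=0 the calldata would begin with 28 zero bytes, so the callee would not see the selector in the first four bytes.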

…calldata encoding

- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata

…x calldata encoding

- Move selectors MSTORE outside for-loop (saves gas per contract)
- Use separate memory regions for balanceOf and approve to avoid conflicts
- Fix calldata encoding by using correct args_offset for proper ABI format
- Selectors now properly positioned at start of calldata

…stently

- Reuse MEM[0] for both selectors (sequential operations, no conflict)
- Reuse MEM[32] for both counters (balanceOf then approve)
- Reuse MEM[64] and MEM[96] for parameters
- Consistent args_offset=28 for both operations (was 28 and 128)
- Matches single-opcode test pattern for easier understanding
- Reduces memory footprint from 196 bytes to 96 bytes

LouisTsai-Csie

The code structure looks good, but I did not dive deeply into the per-iteration cost calculation yet, as the implementation may still change a bit.

I left a comment on the gas cost calculation for external calls to ERC20 tokens.

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

spencer-tb · 2025-10-16T08:57:58Z

Hey, just an FYI. We are about to finalize "The Weld" - moving EEST to EELS. It's best to keep this PR up here in EEST for now. We will then ask you to reopen the PR in EELS at some point next week. Hope that's okay!

More context: https://steel.ethereum.foundation/blog/blog_posts/2025-09-11_weld-announcement/

…o single-opcode tests

- Add parametrization for num_contracts [1, 5, 10, 20, 100]
- Implement stub prefix filtering based on test function name
- Add validation to error if insufficient matching stubs
- Add SSTORE benchmark architecture documentation
- Create README with setup instructions and stubs.json format
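The prefix filtering and validation described above might look roughly like the sketch below. Everything in it is hypothetical: the function name, the stub names, the addresses, and the assumption that stubs map a name to an address are illustrative only and are not taken from the PR or from the actual stubs.json format.

```python
def select_stubs(stubs: dict[str, str], prefix: str, num_contracts: int) -> dict[str, str]:
    """Pick the first num_contracts stubs whose name starts with prefix.

    Hypothetical helper: raises if fewer matching stubs exist than the
    parametrized contract count requires.
    """
    matching = {
        name: address
        for name, address in sorted(stubs.items())
        if name.startswith(prefix)
    }
    if len(matching) < num_contracts:
        raise ValueError(
            f"need {num_contracts} stubs with prefix {prefix!r}, "
            f"found {len(matching)}"
        )
    return dict(list(matching.items())[:num_contracts])


# Hypothetical stub names and addresses, for illustration only.
example_stubs = {
    "sload_erc20_0": "0x" + "11" * 20,
    "sload_erc20_1": "0x" + "22" * 20,
    "sstore_erc20_0": "0x" + "33" * 20,
}
selected = select_stubs(example_stubs, "sload_", 2)
print(list(selected))
```

Failing early with an explicit error is friendlier than letting a benchmark with num_contracts=100 silently run against fewer contracts than requested.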

…ameter

… correct gas calculations

- Add num_contracts parametrization [1, 5, 10, 20, 100] to multi-opcode test
- Implement stub prefix filtering for all benchmarks
- Fix gas cost calculations to account for COLD/WARM account access
  - CALL operations: first call to each contract is COLD (2600), subsequent are WARM (100)
  - SSTORE operations: add cold storage access cost (2100) for zero-to-non-zero writes
- Update gas calculation formulas to solve for calls per contract correctly
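Because only the first CALL to each contract pays the cold access cost, the number of calls per contract has to be solved from a slightly different formula than a flat per-iteration division. The sketch below shows one way to do that; the function names and the 100-gas loop overhead in the example are hypothetical, while the 2600, 100, 2100 and 20000 values are the EIP-2929 and SSTORE-set costs.

```python
COLD_ACCOUNT_ACCESS = 2_600  # first CALL to a contract
WARM_ACCOUNT_ACCESS = 100    # every later CALL to the same contract
SSTORE_SET = 20_000          # zero-to-non-zero write
COLD_STORAGE_ACCESS = 2_100  # surcharge for touching a cold slot


def total_gas(calls: int, iteration_gas_excluding_call: int) -> int:
    """Gas for `calls` iterations: one cold CALL, the rest warm."""
    if calls == 0:
        return 0
    call_access = COLD_ACCOUNT_ACCESS + (calls - 1) * WARM_ACCOUNT_ACCESS
    return call_access + calls * iteration_gas_excluding_call


def calls_for_budget(gas_per_contract: int, iteration_gas_excluding_call: int) -> int:
    """Largest call count whose total gas fits in gas_per_contract."""
    warm_iteration = iteration_gas_excluding_call + WARM_ACCOUNT_ACCESS
    cold_surcharge = COLD_ACCOUNT_ACCESS - WARM_ACCOUNT_ACCESS
    if gas_per_contract < warm_iteration + cold_surcharge:
        return 0
    return (gas_per_contract - cold_surcharge) // warm_iteration


# Example: a cold zero-to-non-zero SSTORE plus a hypothetical 100 gas of
# loop overhead per iteration, with a 1M gas share for this contract.
iteration = SSTORE_SET + COLD_STORAGE_ACCESS + 100
n = calls_for_budget(1_000_000, iteration)
assert total_gas(n, iteration) <= 1_000_000 < total_gas(n + 1, iteration)
print(n)  # 44
```

Treating the cold access as a one-time surcharge of 2,500 gas on top of a uniform warm iteration keeps the formula a single integer division.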

LouisTsai-Csie

@CPerezz Only one comment from my side! But I would leave the final review to @marioevz, as you had a discussion about the stub restructuring.

Here I assume the gas usage fits your scenario, so I did not strictly calculate the gas cost; we could do that later using this feature: ethereum/execution-specs#1479

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

tests/benchmark/stateful/bloatnet/test_single_opcode.py

LouisTsai-Csie

Thanks for the PR, please remember to rebase

marioevz reviewed Oct 3, 2025

CPerezz commented Oct 6, 2025

tests/benchmark/stateful/bloatnet/test_single_opcode.py (outdated)

LouisTsai-Csie reviewed Oct 6, 2025

tests/benchmark/stateful/bloatnet/test_single_opcode.py (outdated)

tests/benchmark/stateful/bloatnet/test_single_opcode.py (outdated)

tests/benchmark/stateful/bloatnet/test_single_opcode.py (outdated)

CPerezz added 5 commits October 6, 2025 17:42

CPerezz force-pushed the feat/bloatnet-sload-sstore-benchmarks branch from e0ae1ee to 552638e Compare October 6, 2025 15:48

CPerezz requested review from LouisTsai-Csie and marioevz October 6, 2025 15:54

LouisTsai-Csie requested changes Oct 7, 2025

LouisTsai-Csie mentioned this pull request Oct 9, 2025

Add automatic gas cost calculation for opcode sequences ethereum/execution-specs#1479

Open

CPerezz changed the title ~~feat(benchmark): add SLOAD benchmark test with multi-contract support~~ feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support Oct 7, 2025

CPerezz added 4 commits October 8, 2025 14:25

LouisTsai-Csie requested changes Oct 13, 2025

tests/benchmark/stateful/bloatnet/test_multi_opcode.py (outdated)

CPerezz marked this pull request as ready for review October 16, 2025 13:35

CPerezz added 4 commits October 16, 2025 15:38

fix(benchmark): add type annotations to test functions

511aaaf

fix(benchmark): add AddressStubs type annotation to address_stubs par…

f25b55f

…ameter

LouisTsai-Csie requested changes Oct 17, 2025

tests/benchmark/stateful/bloatnet/test_multi_opcode.py (outdated)

tests/benchmark/stateful/bloatnet/test_single_opcode.py (outdated)

LouisTsai-Csie approved these changes Oct 23, 2025

LouisTsai-Csie merged commit 54b46ea into ethereum:main Oct 23, 2025
16 of 17 checks passed
