Skip to content

Conversation

@marioevz
Copy link
Member

@marioevz marioevz commented Nov 21, 2025

🗒️ Description

This PR optimizes the fill process to remove virtually all python bottlenecks and leave the t8n execution as the only time consumed.

It does so via several optimizations, but first the profiling results so we can breakdown the differences.

Profiling Results

This is the profiling results for the transition tool call during the filling of a benchmark engine-x blockchain test with the following properties:

  • Pre-allocation groupings generated for all benchmark tests in a single group.
  • All accounts from all tests are contained in a ~250MB allocation passed to the t8n in each block.

Profiling before this PR:
Screenshot 2025-11-25 at 18 01 59
Profiling file (use SnakeViz to visualize): profile.before.out.tar.gz

Main time consumers:

  1. The model_validate pydantic method, 1st orange box, mainly used to load the result from the t8n tool, which includes the full modified pre-alloc (30.4s, 58.8% of the total).
  • A lot of the time spent in this box is spent in our to_bytes conversion method.
  1. The write_json_files method, 2nd orange box, used to dump the pre-allocation, environment and transactions for the t8n to read them during the state transition execution, which included dumping json objects (already dumped by pydantic) into files (8.27s, 15.81% of the total).
  2. Evmone-t8n process run, light blue box, is the actual call to spawn the t8n process (7.73 seconds, 14.77% of the total).
  3. The model_dump method, which is used to dump memory objects into json objects to eventually pass them to t8n tool (4.07s, 7.77% of the total).

Profiling after this PR:
Screenshot 2025-11-25 at 18 02 10
Profiling file: profile.after.out.tar.gz

Main time consumers:

  1. Evmone-t8n process run, light blue box (7.61 seconds, 95.13% of the total).
  2. The new to_files method(0.389s, 4.86% of the total).

Optimizations

Pydantic bytes optimization

The original method we used to validate hexadecimal bytes in pydantic was too flexible and even removed white spaces in order to properly parse the string (using regex even), and this method was called on every instantiation of a Bytes type which was extremely slow.

We removed this flexibility and introduced a CoerceBytes method which can be used only when the user wants to define a hexadecimal string that is more readable.

Some other minor optimizations also were included in this, for details check commit 509d17c541.

Remove Unnecessary State Root Calculations

During the filling of a pre-allocation grouping test, we used the alloc.state_root() method to pass this into the header of the genesis of the filled test.

This method calculates the state root using python and is extremely expensive, and since the pre-allocation group is very big since it contains all accounts of all tests, this was very slow.

This PR creates GroupPreAlloc subclass of Alloc which overrides the state_root parameter to look for the pre-calculated value since the state root hash never changes.

A significant refactor was necessary for this, and for more details check f59397e81d.

LazyAlloc class to defer model validation

This PR introduces a new class called LazyAlloc, which allows to defer validation of the pre-alloc until it's needed in decoded format.

Most of the times, and specially for tests with many blocks, this allocation is not necessary in-between calls and the only real allocation that we use for post-comparison is the one returned by the last block (with a few exceptions).

With this in mind, and since this information is validated->dumped always in the same format, when we receive the allocation in string from the t8n, LazyAlloc keeps it as-is, and then when we pass the pre-alloc of the next block to the next t8n instance, we simply pass along the string as we got it from the t8n, skipping validation and dumping entirely.

If any test requires the allocation in validated format, LazyAlloc.get_alloc() is called which validates, caches, and returns the Alloc object.

See 164de2dee8 for details.

Cache Pydantic Model Dump of the Pre-Alloc Groups

Last optimization is to keep a cache of the result of model_dump or model_dump_json for the genesis pre-allocation groups, since it's always the same, and it's always required in the same format, only the first test in the pytest worker has to do this and it's not needed for the rest of the worker's runtime.

See fa3be8e061 for details.

The profiler

Introduces a profiler which dumps profiling information when --evm-dump-dir is passed.

This feature was used to debug and fix all the issues in this PR.

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

Copy link
Contributor

@fselmo fselmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, just left a nit 👍🏼

@fselmo fselmo self-requested a review November 21, 2025 17:01
@marioevz
Copy link
Member Author

@fselmo I've pushed some other fixes to this branch but I think there's still some optimizations we can do, however those other ones do not seem so easy to pin down as the ones in this PR, so I think we can try to merge this one and then leave the rest for a follow up PR.

@marioevz
Copy link
Member Author

@fselmo I think I had a breakthrough, I'm implementing now and I'll try to push tomorrow so please hold-off on reviewing 🙏

@danceratopz
Copy link
Member

If you can wait until tomorrow, I'd love to review. I just ran into a state_root recalculation issue while working on enginex and want to check this addresses it. Brain's a bit frazzled to check it now.

Copy link
Contributor

@fselmo fselmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing, really nice finds @marioevz 🔥. I left a comment on a bug in a comparison check.

Something to think about... should the LazyAlloc cache its own state root so we don't decouple the two? I don't think there's a ton of risk separating them but if we use an underscored _state_root and define something like get_state_root that pulls from the cached _state_root and if it's not there we compute it? Something like this instead of tracking them separately?

Should we also add some basic unit tests or is it enough that these are always used to check if anything breaks or doesn't?

Looks great though, excited for this to get in 🚀

if (
modified_tool_output.alloc.root.keys()
!= modified_tool_output.alloc.root.keys()
modified_tool_output.alloc.get_alloc().root.keys()
Copy link
Contributor

@fselmo fselmo Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the same. I think we need a base_tool_alloc.root.keys() here? It looks like they were the same before too... maybe this was already broken before?

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference in these profiling charts is incredible! I'm taking a deeper look now.

Did you generate the pre-alloc group for all tests (fill stage 1) and then profile filling single test (fill stage 2)?

Here's my process - did you do follow a similar process?

  1. Generate the full pre-alloc group for all tests:
    uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \ 
        --rootdir . \
        --generate-pre-alloc-groups \
        --evm-bin=evmone-t8n \
        -m "benchmark and blockchain_test_engine_x" 
        --output=/tmp/fixtures \
        --evm-dump-dir=/tmp/evm-debug |
        tests/benchmark/ \
        --clean
        ```
  2. Profile via a single test:
    rm -rf /evm/evm-debug/;
    uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \
        --rootdir . \
        --generate-all-formats \
        --use-pre-alloc-groups \
        --evm-bin=evmone-t8n \
        -m "benchmark and blockchain_test_engine_x" \
        --output=/tmp/fixtures \
        --evm-dump-dir=/tmp/evm-debug \
        tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic[fork_Prague-blockchain_test_engine_x-opcode_ADD-]
  3. Visualize:
    uvx snakeviz "$(find /tmp/evm-debug -name 'profile.out' | head -1)"
    

It seems a bit over the top to enable the profiler for anyone requesting traces from fill via --evm-debug-dir. How about a --profile flag for this? This is more about noise in the output directory than performance; the profile adds <10% perf overhead.

@danceratopz
Copy link
Member

The optimized CI fail should be fixed by #1813

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The speed-up is really amazing.

This PR does resolve the performance issues I saw in the enginex PR regarding pre-alloc access and state root calculation 🚀

Small suggestion in marioevz#2

Comment on lines +231 to +232
if isinstance(input_bytes, str):
input_bytes = sub(r"\s+", "", input_bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, the use of CoerceBytes should typically be discouraged, as it will be less efficient for string inputs. But it's a convenient way to keep the EOF containers defs with white spaces as they are.


_pre_alloc_group: "PreAllocGroup" = PrivateAttr(init=False)
_model_dump_cache: ModelDumpCache | None = PrivateAttr(None)
_cache_miss_count: int = PrivateAttr(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in marioevz#2

Model post init method to set GroupPreAlloc reference.
"""
super().model_post_init(__context)
self.pre._pre_alloc_group = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it after a mo, but it's kinda obfuscated. Suggestion here marioevz#2

Copy link
Contributor

@spencer-tb spencer-tb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing optimization. Lets get this merged!
@danceratopz's changes look good to me as well.

One small comment from my side. We shoud consider adding some basic tests to our custom classes in the future (LazyAlloc/CoerceBytes/GroupPreAlloc).

Comment on lines +290 to +293
def write_json_file(data: str, file_path: str) -> None:
"""Write a JSON file to the given path."""
with open(file_path, "w") as f:
f.write(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell we don't use or need this.

Suggested change
def write_json_file(data: str, file_path: str) -> None:
"""Write a JSON file to the given path."""
with open(file_path, "w") as f:
f.write(data)

@spencer-tb spencer-tb added C-enhance Category: an improvement or new feature P-high A-test-cli-fill Area: Tests Fill CLI—runs tests and generates fixtures (eg. `p/t/s/e/cli/pytest_commands/fill.py`) A-test-client-clis Area: execution spec tests client clis labels Dec 1, 2025
@spencer-tb
Copy link
Contributor

We should add this to the changelog as well. I think a rebase is needed for CI to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-cli-fill Area: Tests Fill CLI—runs tests and generates fixtures (eg. `p/t/s/e/cli/pytest_commands/fill.py`) A-test-client-clis Area: execution spec tests client clis C-enhance Category: an improvement or new feature P-high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants