-
Notifications
You must be signed in to change notification settings - Fork 391
fix(testing): Optimize fill #1804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: forks/osaka
Are you sure you want to change the base?
Conversation
packages/testing/src/execution_testing/base_types/base_types.py
Outdated
Show resolved
Hide resolved
fselmo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, just left a nit 👍🏼
864b444 to
2091245
Compare
|
@fselmo I've pushed some other fixes to this branch but I think there's still some optimizations we can do, however those other ones do not seem so easy to pin down as the ones in this PR, so I think we can try to merge this one and then leave the rest for a follow up PR. |
|
@fselmo I think I had a breakthrough, I'm implementing now and I'll try to push tomorrow so please hold-off on reviewing 🙏 |
089aa02 to
976c290
Compare
fix(base_types): unnecessary cast
976c290 to
fa3be8e
Compare
|
If you can wait until tomorrow, I'd love to review. I just ran into a state_root recalculation issue while working on |
fselmo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing, really nice finds @marioevz 🔥. I left a comment on a bug in a comparison check.
Something to think about... should the LazyAlloc cache its own state root so we don't decouple the two? I don't think there's a ton of risk separating them but if we use an underscored _state_root and define something like get_state_root that pulls from the cached _state_root and if it's not there we compute it? Something like this instead of tracking them separately?
Should we also add some basic unit tests or is it enough that these are always used to check if anything breaks or doesn't?
Looks great though, excited for this to get in 🚀
| if ( | ||
| modified_tool_output.alloc.root.keys() | ||
| != modified_tool_output.alloc.root.keys() | ||
| modified_tool_output.alloc.get_alloc().root.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the same. I think we need a base_tool_alloc.root.keys() here? It looks like they were the same before too... maybe this was already broken before?
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference in these profiling charts is incredible! I'm taking a deeper look now.
Did you generate the pre-alloc group for all tests (fill stage 1) and then profile filling single test (fill stage 2)?
Here's my process - did you do follow a similar process?
- Generate the full pre-alloc group for all tests:
uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \ --rootdir . \ --generate-pre-alloc-groups \ --evm-bin=evmone-t8n \ -m "benchmark and blockchain_test_engine_x" --output=/tmp/fixtures \ --evm-dump-dir=/tmp/evm-debug | tests/benchmark/ \ --clean ```
- Profile via a single test:
rm -rf /evm/evm-debug/; uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \ --rootdir . \ --generate-all-formats \ --use-pre-alloc-groups \ --evm-bin=evmone-t8n \ -m "benchmark and blockchain_test_engine_x" \ --output=/tmp/fixtures \ --evm-dump-dir=/tmp/evm-debug \ tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic[fork_Prague-blockchain_test_engine_x-opcode_ADD-]
- Visualize:
uvx snakeviz "$(find /tmp/evm-debug -name 'profile.out' | head -1)"
It seems a bit over the top to enable the profiler for anyone requesting traces from fill via --evm-debug-dir. How about a --profile flag for this? This is more about noise in the output directory than performance; the profile adds <10% perf overhead.
|
The optimized CI fail should be fixed by #1813 |
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The speed-up is really amazing.
This PR does resolve the performance issues I saw in the enginex PR regarding pre-alloc access and state root calculation 🚀
Small suggestion in marioevz#2
| if isinstance(input_bytes, str): | ||
| input_bytes = sub(r"\s+", "", input_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, the use of CoerceBytes should typically be discouraged, as it will be less efficient for string inputs. But it's a convenient way to keep the EOF containers defs with white spaces as they are.
|
|
||
| _pre_alloc_group: "PreAllocGroup" = PrivateAttr(init=False) | ||
| _model_dump_cache: ModelDumpCache | None = PrivateAttr(None) | ||
| _cache_miss_count: int = PrivateAttr(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in marioevz#2
| Model post init method to set GroupPreAlloc reference. | ||
| """ | ||
| super().model_post_init(__context) | ||
| self.pre._pre_alloc_group = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it after a mo, but it's kinda obfuscated. Suggestion here marioevz#2
spencer-tb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing optimization. Lets get this merged!
@danceratopz's changes look good to me as well.
One small comment from my side. We shoud consider adding some basic tests to our custom classes in the future (LazyAlloc/CoerceBytes/GroupPreAlloc).
| def write_json_file(data: str, file_path: str) -> None: | ||
| """Write a JSON file to the given path.""" | ||
| with open(file_path, "w") as f: | ||
| f.write(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell we don't use or need this.
| def write_json_file(data: str, file_path: str) -> None: | |
| """Write a JSON file to the given path.""" | |
| with open(file_path, "w") as f: | |
| f.write(data) |
|
We should add this to the changelog as well. I think a rebase is needed for CI to pass. |
🗒️ Description
This PR optimizes the fill process to remove virtually all python bottlenecks and leave the t8n execution as the only time consumed.
It does so via several optimizations, but first the profiling results so we can breakdown the differences.
Profiling Results
This is the profiling results for the transition tool call during the filling of a benchmark engine-x blockchain test with the following properties:
Profiling before this PR:

Profiling file (use SnakeViz to visualize): profile.before.out.tar.gz
Main time consumers:
model_validatepydantic method, 1st orange box, mainly used to load the result from the t8n tool, which includes the full modified pre-alloc (30.4s, 58.8% of the total).to_bytesconversion method.write_json_filesmethod, 2nd orange box, used to dump the pre-allocation, environment and transactions for the t8n to read them during the state transition execution, which included dumping json objects (already dumped by pydantic) into files (8.27s, 15.81% of the total).model_dumpmethod, which is used to dump memory objects into json objects to eventually pass them to t8n tool (4.07s, 7.77% of the total).Profiling after this PR:

Profiling file: profile.after.out.tar.gz
Main time consumers:
to_filesmethod(0.389s, 4.86% of the total).Optimizations
Pydantic bytes optimization
The original method we used to validate hexadecimal bytes in pydantic was too flexible and even removed white spaces in order to properly parse the string (using regex even), and this method was called on every instantiation of a
Bytestype which was extremely slow.We removed this flexibility and introduced a
CoerceBytesmethod which can be used only when the user wants to define a hexadecimal string that is more readable.Some other minor optimizations also were included in this, for details check commit
509d17c541.Remove Unnecessary State Root Calculations
During the filling of a pre-allocation grouping test, we used the
alloc.state_root()method to pass this into the header of the genesis of the filled test.This method calculates the state root using python and is extremely expensive, and since the pre-allocation group is very big since it contains all accounts of all tests, this was very slow.
This PR creates
GroupPreAllocsubclass ofAllocwhich overrides thestate_rootparameter to look for the pre-calculated value since the state root hash never changes.A significant refactor was necessary for this, and for more details check
f59397e81d.LazyAllocclass to defer model validationThis PR introduces a new class called
LazyAlloc, which allows to defer validation of the pre-alloc until it's needed in decoded format.Most of the times, and specially for tests with many blocks, this allocation is not necessary in-between calls and the only real allocation that we use for post-comparison is the one returned by the last block (with a few exceptions).
With this in mind, and since this information is validated->dumped always in the same format, when we receive the allocation in string from the t8n,
LazyAllockeeps it as-is, and then when we pass the pre-alloc of the next block to the next t8n instance, we simply pass along the string as we got it from the t8n, skipping validation and dumping entirely.If any test requires the allocation in validated format,
LazyAlloc.get_alloc()is called which validates, caches, and returns theAllocobject.See
164de2dee8for details.Cache Pydantic Model Dump of the Pre-Alloc Groups
Last optimization is to keep a cache of the result of
model_dumpormodel_dump_jsonfor the genesis pre-allocation groups, since it's always the same, and it's always required in the same format, only the first test in the pytest worker has to do this and it's not needed for the rest of the worker's runtime.See
fa3be8e061for details.The profiler
Introduces a profiler which dumps profiling information when
--evm-dump-diris passed.This feature was used to debug and fix all the issues in this PR.
🔗 Related Issues or PRs
N/A.
✅ Checklist
toxchecks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx tox -e statictype(scope):.mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_frommarker.Cute Animal Picture