Simplify sparse_set::try_emplace iterator computation to improve MSVC codegen #1286
In our application we're often using entity components as single frame signals to communicate between systems. This means in some systems we're adding a lot of entity data each frame.
When profiling one such system I noticed that a surprising amount of time was being spent in `sparse_set::try_emplace`. More than 75% of that time was spent on the last line, which returns the iterator.

In this code the compiler appears to spill the iterator to the stack 3 times, seemingly once for each operation done on the iterator in the C++ code. Each operation overwrites the iterator index value on the stack before reading it back into a register. Because the iterator is only partially written each time, each of these reloads causes a store-to-load forwarding stall.
When simplifying the code so that only a single iterator is created directly in code, the generated assembly reduces down to just the few instructions required to write the iterator out to the stack return address once.
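
To illustrate the shape of the change, here is a minimal sketch using a hypothetical `toy_iterator` rather than the library's real iterator type. It assumes an iterator that stores a container pointer plus a reverse offset; the names `toy_iterator`, `toy_end`, `chained` and `direct` are invented for this example and are not part of the actual code. `chained` derives the result through several iterator operations, while `direct` constructs the same iterator once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reverse-indexed iterator: it refers to
// element (offset - 1) of the packed array, so offset 0 means "end".
struct toy_iterator {
    const std::vector<int> *packed;
    std::ptrdiff_t offset;

    // Stepping "back" from end moves towards higher indices.
    toy_iterator &operator--() { ++offset; return *this; }
    toy_iterator operator-(std::ptrdiff_t n) const { return toy_iterator{packed, offset + n}; }
    int operator*() const { return (*packed)[static_cast<std::size_t>(offset - 1)]; }
    bool operator==(const toy_iterator &other) const {
        return packed == other.packed && offset == other.offset;
    }
};

toy_iterator toy_end(const std::vector<int> &packed) { return {&packed, 0}; }

// Chained form: three iterator operations (end, subtract, decrement),
// each of which may materialize a temporary iterator.
toy_iterator chained(const std::vector<int> &packed, std::size_t pos) {
    return --(toy_end(packed) - static_cast<std::ptrdiff_t>(pos));
}

// Direct form: compute the final offset and build the iterator once.
toy_iterator direct(const std::vector<int> &packed, std::size_t pos) {
    assert(pos < packed.size());
    return toy_iterator{&packed, static_cast<std::ptrdiff_t>(pos) + 1};
}
```

Both functions return an iterator to the element at index `pos`; whether a given compiler emits different code for them depends on the compiler version and flags.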

In our application, this change reduced the time spent in `try_emplace` from 33 ms per second of test time to around 7 ms per second of test time. This test is not fully deterministic and contains a full game scenario simulation, so YMMV.

Compiler used: MSVC 19.38.33135 (default for Unreal Engine 5.6)
Optimization flags: /Ox /Ot