
teddemunnik commented Sep 20, 2025

In our application we often use entity components as single-frame signals to communicate between systems. This means some systems add a lot of entity data every frame.
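As a minimal sketch of that pattern, assuming a stripped-down, append-only storage rather than a real ECS sparse set (`damage_signal`, `damage_signals` and `run_frame` are hypothetical names, not EnTT's API): one system emits signal components, another consumes them, and the storage is cleared at the end of the frame, so every signal costs one insertion.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical single-frame signal component.
struct damage_signal {
    float amount;
};

// Hypothetical append-only storage for one signal type (not EnTT's API).
// Entities and payloads are kept in parallel arrays.
struct damage_signals {
    std::vector<std::uint32_t> entities;
    std::vector<damage_signal> data;

    // Every emitted signal is one insertion into the storage.
    void emit(std::uint32_t entity, damage_signal s) {
        entities.push_back(entity);
        data.push_back(s);
        assert(entities.size() == data.size());
    }

    void clear() noexcept {
        entities.clear();
        data.clear();
    }
};

// One frame: a producer system emits one signal per hit, a consumer
// system reads them all, then the storage is cleared so each signal
// lives for exactly one frame.
float run_frame(damage_signals &signals, std::uint32_t hits) {
    for (std::uint32_t e = 0; e < hits; ++e) {
        signals.emit(e, damage_signal{1.5f});  // producer system
    }
    float total = 0.0f;
    for (const damage_signal &s : signals.data) {  // consumer system
        total += s.amount;
    }
    signals.clear();  // end of frame
    return total;
}
```

In EnTT, adding a component inserts the entity into the storage's underlying sparse set, which is why a pattern like this exercises `sparse_set::try_emplace` on every emitted signal.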

When profiling one such system, I noticed that a surprising amount of time was being spent in sparse_set::try_emplace. 75% or more of that time was attributed to the last line, which returns the iterator.


In the generated code, MSVC appears to spill the iterator to the stack three times, seemingly once for each operation performed on the iterator in the C++ source. Each operation overwrites the iterator's index value on the stack before reading it back into a register. Because the iterator was only partially written, each of these reloads causes a store-to-load forwarding stall.

After simplifying the code so that only a single iterator is constructed directly, the generated assembly reduces to just the few instructions required to write the iterator to the return value's stack slot once.
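To make the difference concrete, here is a minimal sketch of the two return shapes. It assumes an iterator modelled loosely on EnTT's packed-array iterator (iterating from the back of the array and storing an offset one past the element it refers to), and a chained return expression of the `--(end() - pos)` kind; both are assumptions about the original code. `packed_iterator`, `hypothetical_storage`, `at_chained` and `at_direct` are hypothetical names, and this is not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterator: walks the packed array from the back toward the
// front and stores an offset one past the element it refers to, so
// index() == offset - 1 and end() has offset 0.
class packed_iterator {
public:
    using difference_type = std::ptrdiff_t;

    packed_iterator(const std::vector<int> &ref, difference_type off) noexcept
        : packed_{&ref}, offset_{off} {}

    // Advancing moves toward the front of the array, so ++ shrinks the offset.
    packed_iterator &operator++() noexcept { --offset_; return *this; }
    packed_iterator &operator--() noexcept { ++offset_; return *this; }

    packed_iterator operator+(difference_type n) const noexcept {
        return packed_iterator{*packed_, offset_ - n};
    }
    packed_iterator operator-(difference_type n) const noexcept {
        return *this + -n;
    }

    difference_type index() const noexcept { return offset_ - 1; }
    const int &operator*() const noexcept {
        return (*packed_)[static_cast<std::size_t>(index())];
    }

private:
    const std::vector<int> *packed_;
    difference_type offset_;
};

// Hypothetical storage; only the shape of the return statements matters.
struct hypothetical_storage {
    std::vector<int> packed;

    packed_iterator end() const noexcept { return {packed, 0}; }

    // Original shape: three chained operations on temporary iterators.
    packed_iterator at_chained(std::size_t pos) const noexcept {
        assert(pos < packed.size());
        return --(end() - static_cast<std::ptrdiff_t>(pos));
    }

    // Simplified shape: build the final iterator once, directly.
    packed_iterator at_direct(std::size_t pos) const noexcept {
        assert(pos < packed.size());
        return packed_iterator{packed, static_cast<std::ptrdiff_t>(pos) + 1};
    }

    // Hypothetical emplace: append and return an iterator to the new element.
    packed_iterator emplace(int value) {
        const std::size_t pos = packed.size();
        packed.push_back(value);
        return at_direct(pos);
    }
};
```

Both functions yield an iterator with the same index; the direct form simply avoids creating and modifying intermediate iterators, which in the MSVC 19.38 output discussed here were spilled to the stack one operation at a time.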

In our application, this change reduced the time spent in try_emplace from 33 ms to around 7 ms per second of test time. The test is not fully deterministic and runs a full game scenario simulation, so YMMV.
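For a rough sense of scale, using only the two figures above (a single non-deterministic run, so indicative at best):

```math
\frac{33\,\text{ms} - 7\,\text{ms}}{33\,\text{ms}} \approx 0.79
\qquad
\frac{33\,\text{ms}}{7\,\text{ms}} \approx 4.7
```

That is roughly a 79% reduction, or about 4.7× less time spent in try_emplace.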

Compiler used: MSVC 19.38.33135 (the default for Unreal Engine 5.6)
Optimization flags: /Ox /Ot

skypjack self-requested a review September 21, 2025 15:51
skypjack self-assigned this Sep 21, 2025
skypjack added the enhancement label (accepted requests, sooner or later I'll do it) Sep 21, 2025
skypjack (Owner) commented:

Hold on. Is this in debug or release mode?

skypjack changed the base branch from master to wip September 21, 2025 16:54
teddemunnik (Author) commented Sep 21, 2025

I profiled it in an optimized build; I'm not sure why MSVC did such a poor job optimizing it.
See https://godbolt.org/z/4zfETcqe3, output line 649.

Note: in that example I made a tiny change to EnTT, removing the virtual keyword from try_emplace, which if anything should help optimization. Otherwise I could not get Compiler Explorer to show the generated assembly for try_emplace.

It appears that MSVC generates better code starting with 19.43 (the latest), but none of the earlier versions do.

skypjack merged commit ff8544f into skypjack:wip Sep 26, 2025