fix: disable LTO on mods lib and synchronize entity processing #125
Guffawaffle wants to merge 1 commit into netniV:main
- Disable LTO (build.optimization.lto) on mods static lib to prevent LTCG codegen from breaking SPUD detour hooks at runtime (C0000005)
- Replace detached std::thread in submit_async and RTC handler with synchronous execution to fix GS cookie stack corruption (C0000409)
- Remove .vscode/settings.json from tracking (personal editor config)
- Add .smartergpt/ to .gitignore
Why disable LTO instead of fixing the root cause?
The problem: SPUD installs detours by patching function prologues at runtime, after linking is long finished. LTCG has no visibility into these runtime patches. When it inlines or eliminates a function that SPUD later tries to detour, the patch either lands on the wrong code or targets an address that no longer exists. The compiler is making correct optimizations based on incomplete information.
Alternatives considered
This is a pragmatic workaround, not a root-cause fix. If SPUD gains LTO compatibility in the future, the ...
QA Checklist
Prerequisites
Bug 1: LTO Disabled
Bug 2: Synchronous Entity Processing
Sync Functionality
Regression
Dev Note: Why entity processing must be synchronous
During testing, we explored reducing the game-thread footprint of the synchronous entity processing fix. Documenting findings here for future reference.
What was tried
Replaced the synchronous ...
This was a middle ground between the original per-call ...
Result
Same C0000005 crash at ~20% game load. Identical to the original detached-thread crash.
Conclusion
The issue is not specific to detached threads or ...
Synchronous inline processing is the only working approach. The actual frame cost is negligible in practice: it's just protobuf parse + diff + JSON serialize + queue push (no I/O). HTTP sending is already async via ...
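For reference, here is a minimal sketch of the shape described in the conclusion: processing runs inline on the calling thread and only the send step is handed to a worker. Every name in it (OutboundQueue, process_entity_update, run_example, the string payloads) is hypothetical and is not taken from the mod's code; the real parse/diff/serialize steps and the real HTTP sender are replaced by stand-ins. The worker is joined in the destructor rather than detached, which is an assumption about how the async sender might be owned, not a description of the existing one.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical outbound queue: one worker thread drains it and calls `send`.
// The worker is joined in the destructor, so it cannot outlive the queue.
class OutboundQueue {
public:
    using SendFn = std::function<void(const std::string&)>;

    explicit OutboundQueue(SendFn send)
        : send_(std::move(send)), worker_([this] { drain(); }) {}

    ~OutboundQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // joined, never detached
    }

    // Called from the processing thread; only takes the lock briefly.
    void push(std::string payload) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(payload));
        }
        cv_.notify_one();
    }

private:
    void drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            while (!pending_.empty()) {
                std::string payload = std::move(pending_.front());
                pending_.pop();
                lock.unlock();   // do the slow send without holding the lock
                send_(payload);  // stand-in for the HTTP request
                lock.lock();
            }
            if (stopping_) return;  // queue is empty here, safe to exit
        }
    }

    SendFn send_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Hypothetical stand-in for parse + diff + serialize. It runs entirely on the
// caller's thread; no thread is created per call.
bool process_entity_update(const std::string& raw, std::string& last_seen,
                           OutboundQueue& out) {
    assert(!raw.empty());
    if (raw == last_seen) return false;        // "diff": nothing changed
    last_seen = raw;
    out.push("{\"entity\":\"" + raw + "\"}");  // "serialize" + queue push
    return true;
}

// Usage: three updates, one of them a duplicate that is skipped.
std::vector<std::string> run_example() {
    std::vector<std::string> sent;
    {
        OutboundQueue out([&sent](const std::string& p) { sent.push_back(p); });
        std::string last_seen;
        process_entity_update("a", last_seen, out);
        process_entity_update("a", last_seen, out);
        process_entity_update("b", last_seen, out);
    }  // destructor drains the queue and joins the worker
    return sent;
}
```

The caller only pays for the in-memory work plus a short lock around the push. Anything slow happens on the single worker, whose lifetime is tied to the queue object.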
We do NOT want to run the processing synchronously, that is a bad idea.
Also, your PRs are NOT clean... you have mixed up various changes and bled them into other PRs
Yeah, this one cascades into others because, without it, I crash on init on pretty much anything I touch, so this one does get pulled in. I add a "depends on" to point towards this, but it's not clean when looking at a PR. I'll step back and examine this init crash more tomorrow. I can clearly say that, without this PR, any other change I make becomes a crapshoot on whether it will crash on init (20% load) or not. [edit: this might also be fixed by the thin hook > dispatcher > handlers method on the other ticket] I'll come back to this with the lessons I picked up today and rework it. I've learned a lot. I likely shouldn't have tossed any of these up as PRs yet but, in the dev cycle I work in, it's normal to get a PR open and then folks can start reviewing as you iterate. Helps to get early feedback imo.
Closing: this did not fully resolve the crash issues, or even noticeably improve the experience.
Summary
Two latent crash bugs found during hotkey feature development. Both cause intermittent crashes that are extremely difficult to reproduce because they depend on code layout, timing, and optimization decisions.
Bug 1: LTO/LTCG breaks SPUD detour hooks
Symptom: Intermittent C0000005 (access violation) at detour call sites. Adding or removing any struct member changes whether it crashes because it shifts code layout.
Root Cause: The mods static library compiles with /GL (Whole Program Optimization) by default, emitting compiler intermediate code rather than native machine code. When the linker performs LTCG on the final DLL, it re-optimizes and re-lays-out function bodies, but SPUD's detour trampolines were patched against pre-LTCG addresses. The trampoline jumps into relocated/rewritten instructions.
Fix: set_policy("build.optimization.lto", false) on the mods target in mods/xmake.lua.
Why this wasn't caught before: LTO is a link-time concern. Everything compiled and linked fine. Crashes only manifested at runtime in code paths that got relocated, and the crash location moved depending on what code was added or removed, making it look like data corruption rather than codegen.
Bug 2: Detached threads cause stack corruption
Symptom: Intermittent C0000409 (GS cookie / stack buffer overrun) in submit_async or RTC handler.
Root Cause: std::thread().detach() in submit_async creates fire-and-forget threads accessing shared state without synchronization. When the OS reclaims a detached thread's stack while it's still executing, or multiple threads race on shared containers, the GS security cookie fails.
Fix: Replace std::thread().detach() with synchronous execution. Entity processing is fast enough that blocking is not a problem.
Other changes
- .vscode/settings.json removed from tracking (personal editor config)
- .gitignore updated to exclude .smartergpt/
Testing