-
Notifications
You must be signed in to change notification settings - Fork 162
Spinlock detection and fastforwarding for accel-sim #484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…work-public into spinlock_fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request introduces spinlock detection and fast-forwarding capabilities for accel-sim's trace-based simulation to mitigate timing inaccuracies caused by spinlocks in captured traces.
- Implements a two-phase spinlock detection tool that identifies non-deterministic code regions by comparing instruction execution histograms across multiple runs
- Adds fast-forwarding mechanism to tracer that limits the number of spinlock iterations recorded in traces
- Integrates spinlock handling options into the trace generation pipeline
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
Show a summary per file
File | Description |
---|---|
util/tracer_nvbit/tracer_tool/tracer_tool.cu | Adds spinlock fast-forwarding logic and instruction index tracking |
util/tracer_nvbit/tracer_tool/inject_funcs.cu | Updates instrumentation to pass instruction indices |
util/tracer_nvbit/tracer_tool/common.h | Adds instruction index field to trace structure |
util/tracer_nvbit/tracer_tool/Makefile | Updates C++ standard to C++17 for new features |
util/tracer_nvbit/run_hw_trace.py | Integrates spinlock detection and handling into trace workflow |
util/tracer_nvbit/others/spinlock_tool/* | New spinlock detection tool implementation |
util/tracer_nvbit/Makefile | Includes spinlock tool in build process |
util/tracer_nvbit/.gitignore | Simplifies ignore patterns |
util/job_launching/apps/define-all-apps.yml | Adds spinlock test application |
.github/workflows/main.yml | Adds matrix testing for spinlock handling options |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
.github/workflows/main.yml
Outdated
runs-on: tgrogers-gpu01 | ||
strategy: | ||
matrix: | ||
spinlock_handling: ["none", "fast_forward"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would run every thing twice, including the builds. Can we just run the tracer twice? And how is CI determining if fast forward is correct? Just by functionally passing the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Okay, I will change the CI script to just run the tracer.
- Right now, CI won't test if fast forward is correct; it will just trace it and run the generated trace, making sure fast-forward won't mess up non-spinlock kernels.
I can also change the CI script to run and trace the spinlock app, and then test to check if the total number of instructions is reduced or not.
- args: 16 | ||
accel-sim-mem: 1G | ||
|
||
Spinlock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a dedicated "suite" instead of part of uBench? Then can we only run fast forward on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I saw that on gpu-app-collection Spinlock is part of the uBench. Can you move this under ubench suite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, but the correlation of it will be terrible with just fast-forwarding when running with the ubench suite, given that this app is acquiring a highly contested lock.
I gave it a dedicated suite as all the atomic kernels also got a dedicated suite, even though they are under the ubench folder in the gpu-app-collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay valid point
/* Add Source code line number for current instr */ | ||
nvbit_add_call_arg_const_val32(instr, (int)line_num); | ||
/* Add instruction index for current instr (spinlock detection) */ | ||
nvbit_add_call_arg_const_val32(instr, (uint32_t)instr->getIdx()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing wrong. Just curious, what is id? Why not just use PC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just index for each instruction in the kernel's instruction array. I am pretty sure it is just Offset/16.
+ " ; " | ||
) | ||
|
||
for path, content in [("run.sh", tracer_contents), ("run_spinlock_detection.sh", spinlock_contents)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thi is okay. What about writing to the same script? Any benefit for seperating them? I just think it might be confusing for someone want to run the script manually. Just asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spinlock detection is decoupled from the actual tracing. You just need to run it once to generate the instruction indices. I think that is why I separate those two.
Force merge as the failure is unrelated to code changes. |
Some changes to help mitigate the spinlock issue with the accel-sim trace-based simulation, as the captured traces cannot reflect the true behavior of spinlocks and will skew the simulation result.
The fix has two components: detection and handling
tracer_tool/others/spinlock_tool
Collect individual kernel launches from different contexts and diff based on kernel launch order.collective instruction histogram.
recv_thread_fun
of accel-sim's tracer, ifENABLE_SPINLOCK_FAST_FORWARD
env is set to1
, the host receiving thread will keep onlySPINLOCK_ITER_TO_KEEP
iters of spinlock for each spinlock section.