Spinlock detection and fastforwarding for accel-sim #484

William-An · 2025-09-16T19:04:31Z

These changes help mitigate the spinlock issue in accel-sim's trace-based simulation: captured traces cannot reflect the true behavior of spinlocks, which skews the simulation results.

The fix has two components: detection and handling.

Spinlock detection
- Code in util/tracer_nvbit/others/spinlock_tool
- Spinlocks are detected as follows:
  1. Run the target application multiple times and collect an instruction histogram for each run.
  2. Diff the histograms of two runs to find instructions whose execution counts differ.
  3. Treat these instructions as non-deterministic code regions that are likely to be spinlocks.
  4. This methodology is based on "Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator".
- There are two approaches for collecting the instruction histograms:
  1. ~~Collect individual kernel launches from different contexts and diff based on kernel launch order.~~
    - This approach is simple, but the CUDA context launch order might not be consistent across runs.
    - For multiprocessing applications, the CUDA kernel launch order might not be consistent either.
  2. Sum up instruction histograms by kernel name and diff each kernel's collective instruction histogram.
    - This approach should work as long as the actual work (the non-spinlock section) is deterministic between program runs.
    - Simple modulo hashing makes sure the histogram does not overflow.
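
The second approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code (the tool itself is CUDA/C++): `merge_by_kernel`, `diff_runs`, `HIST_SIZE`, and the kernel names are hypothetical, and folding instruction indices into a fixed number of bins is one possible reading of "modulo hashing".

```python
from collections import Counter, defaultdict

HIST_SIZE = 1 << 16  # hypothetical fixed number of histogram bins


def merge_by_kernel(launches, hist_size=HIST_SIZE):
    """Sum per-launch instruction counts into one histogram per kernel name.

    `launches` is an iterable of (kernel_name, {instr_index: count}) pairs.
    Indices are folded with a modulo so each histogram fits in `hist_size` bins.
    """
    merged = defaultdict(Counter)
    for kernel_name, counts in launches:
        for instr_index, count in counts.items():
            merged[kernel_name][instr_index % hist_size] += count
    return merged


def diff_runs(run_a, run_b):
    """Return {kernel_name: sorted bins whose counts differ between two runs}."""
    suspects = {}
    for kernel_name in sorted(set(run_a) | set(run_b)):
        hist_a = run_a.get(kernel_name, Counter())
        hist_b = run_b.get(kernel_name, Counter())
        bins = sorted(b for b in set(hist_a) | set(hist_b) if hist_a[b] != hist_b[b])
        if bins:
            suspects[kernel_name] = bins
    return suspects


# Two runs of the same made-up application; the launch order differs between runs.
run_1 = merge_by_kernel([
    ("lock_kernel", {0: 32, 1: 400, 2: 400, 3: 32}),
    ("work_kernel", {0: 64, 1: 64}),
])
run_2 = merge_by_kernel([
    ("work_kernel", {0: 64, 1: 64}),
    ("lock_kernel", {0: 32, 1: 275, 2: 275, 3: 32}),
])
print(diff_runs(run_1, run_2))  # {'lock_kernel': [1, 2]}
```

Because histograms are keyed by kernel name rather than launch order, the reordered launches in the second run do not produce false positives; only the instructions inside the spinning loop are reported.
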

Spinlock handling via fast-forwarding
- In the host-side recv_thread_fun of accel-sim's tracer, if the ENABLE_SPINLOCK_FAST_FORWARD environment variable is set to 1, the host receiving thread keeps only SPINLOCK_ITER_TO_KEEP iterations of each spinlock section.
- Note that for a nested spinlock loop with deterministic work between two inner spinlock loops, fast-forwarding is applied only to the innermost spinlock loop.
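
A rough Python sketch of the fast-forwarding idea follows. The real logic lives in the tracer's host-side C++ code, so everything here is an assumption made for illustration: the `fast_forward` helper, the rule that a backward jump in the instruction index starts a new iteration, and the default values used when the environment variables are unset. Only the two environment variable names come from this PR.

```python
import os


def fast_forward(records, spinlock_indices, iters_to_keep):
    """Keep only the first `iters_to_keep` iterations of each spinlock section.

    `records` is one warp's instruction indices in execution order.
    A section is a consecutive run of indices that are all in `spinlock_indices`.
    Assumption: a backward jump inside a section marks the start of a new iteration.
    """
    kept = []
    iteration = 0  # 0 means "not inside a spinlock section"
    prev = None
    for idx in records:
        if idx not in spinlock_indices:
            iteration = 0          # leaving a section resets the counter
            kept.append(idx)
        else:
            if iteration == 0:
                iteration = 1      # first instruction of a new section
            elif idx <= prev:
                iteration += 1     # loop branched back: next iteration
            if iteration <= iters_to_keep:
                kept.append(idx)
        prev = idx
    return kept


# Inner spinlock loop (indices 1 and 2) with deterministic work (index 5) in between.
trace = [1, 2, 1, 2, 5, 1, 2, 1, 2, 1, 2, 5]
if os.environ.get("ENABLE_SPINLOCK_FAST_FORWARD", "0") == "1":
    keep = int(os.environ.get("SPINLOCK_ITER_TO_KEEP", "1"))  # default is hypothetical
    trace = fast_forward(trace, {1, 2}, keep)
print(trace)
```

In this sketch the counter resets whenever a non-spinlock instruction is seen, so each inner spinning section is trimmed independently while the deterministic work between sections is always kept.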

Copilot

Pull Request Overview

This pull request introduces spinlock detection and fast-forwarding capabilities for accel-sim's trace-based simulation to mitigate timing inaccuracies caused by spinlocks in captured traces.

- Implements a two-phase spinlock detection tool that identifies non-deterministic code regions by comparing instruction execution histograms across multiple runs
- Adds fast-forwarding mechanism to tracer that limits the number of spinlock iterations recorded in traces
- Integrates spinlock handling options into the trace generation pipeline

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| util/tracer_nvbit/tracer_tool/tracer_tool.cu | Adds spinlock fast-forwarding logic and instruction index tracking |
| util/tracer_nvbit/tracer_tool/inject_funcs.cu | Updates instrumentation to pass instruction indices |
| util/tracer_nvbit/tracer_tool/common.h | Adds instruction index field to trace structure |
| util/tracer_nvbit/tracer_tool/Makefile | Updates C++ standard to C++17 for new features |
| util/tracer_nvbit/run_hw_trace.py | Integrates spinlock detection and handling into trace workflow |
| util/tracer_nvbit/others/spinlock_tool/* | New spinlock detection tool implementation |
| util/tracer_nvbit/Makefile | Includes spinlock tool in build process |
| util/tracer_nvbit/.gitignore | Simplifies ignore patterns |
| util/job_launching/apps/define-all-apps.yml | Adds spinlock test application |
| .github/workflows/main.yml | Adds matrix testing for spinlock handling options |

JRPan · 2025-09-30T16:24:50Z

.github/workflows/main.yml

    runs-on: tgrogers-gpu01
+    strategy:
+      matrix:
+        spinlock_handling: ["none", "fast_forward"]


This would run everything twice, including the builds. Can we just run the tracer twice? And how does CI determine whether fast-forward is correct? Just by the test passing functionally?

Okay, I will change the CI script to just run the tracer.

Right now, CI won't test if fast forward is correct; it will just trace it and run the generated trace, making sure fast-forward won't mess up non-spinlock kernels.

I can also change the CI script to run and trace the spinlock app, and then check whether the total number of instructions is reduced.

JRPan · 2025-09-30T16:26:01Z

util/job_launching/apps/define-all-apps.yml

            - args: 16
              accel-sim-mem: 1G

+Spinlock:


Is this a dedicated "suite" instead of part of uBench? Then can we only run fast forward on this one?

Actually I saw that on gpu-app-collection Spinlock is part of the uBench. Can you move this under ubench suite?

I can, but the correlation of it will be terrible with just fast-forwarding when running with the ubench suite, given that this app is acquiring a highly contested lock.

I gave it a dedicated suite as all the atomic kernels also got a dedicated suite, even though they are under the ubench folder in the gpu-app-collection.

okay valid point

JRPan · 2025-09-30T16:30:56Z

util/tracer_nvbit/tracer_tool/tracer_tool.cu

        /* Add Source code line number for current instr */
        nvbit_add_call_arg_const_val32(instr, (int)line_num);
+        /* Add instruction index for current instr (spinlock detection) */
+        nvbit_add_call_arg_const_val32(instr, (uint32_t)instr->getIdx());


Nothing wrong. Just curious, what is id? Why not just use PC?

It is just the index of each instruction in the kernel's instruction array. I am pretty sure it is just Offset/16.
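
To make the relationship concrete, here is a tiny Python sketch of the Offset/16 mapping mentioned above. It assumes a fixed 16-byte (128-bit) instruction encoding, as used on Volta and later GPUs; the helper names are made up for illustration and are not part of NVBit or the tracer.

```python
INSTR_BYTES = 16  # assumption: fixed 128-bit instruction encoding (Volta and later)


def offset_to_index(offset):
    """Map a byte offset within a kernel to its position in the instruction array."""
    if offset % INSTR_BYTES != 0:
        raise ValueError("offset is not aligned to an instruction boundary")
    return offset // INSTR_BYTES


def index_to_offset(index):
    """Inverse mapping: instruction index back to byte offset."""
    return index * INSTR_BYTES


print(offset_to_index(0x30))  # 3
print(index_to_offset(3))     # 48
```

Under that assumption the index and the offset carry the same information; the index is simply the more compact key for a per-instruction histogram.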

JRPan · 2025-09-30T16:39:40Z

util/tracer_nvbit/run_hw_trace.py

+            + " ; "
+        )
+
+        for path, content in [("run.sh", tracer_contents), ("run_spinlock_detection.sh", spinlock_contents)]:


This is okay. What about writing to the same script? Is there any benefit to separating them? I just think it might be confusing for someone who wants to run the script manually. Just asking.

Spinlock detection is decoupled from the actual tracing. You only need to run it once to generate the instruction indices. That is why I separated the two.
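
For readers skimming the thread, a minimal Python sketch of what writing the two scripts separately could look like. Only the two file names and the loop shape come from the diff above; the `write_scripts` helper, the placeholder shell commands, and the use of a temporary directory are assumptions made for this example and do not reflect the actual body of the loop in run_hw_trace.py.

```python
import os
import stat
import tempfile


def write_scripts(run_dir, scripts):
    """Write each (filename, shell_contents) pair into run_dir as an executable script."""
    paths = []
    for name, content in scripts:
        path = os.path.join(run_dir, name)
        with open(path, "w") as f:
            f.write("#!/bin/sh\n" + content + "\n")
        # Make the script runnable by its owner.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        paths.append(path)
    return paths


tracer_contents = "echo 'trace the application'"             # placeholder command
spinlock_contents = "echo 'collect instruction histograms'"  # placeholder command

with tempfile.TemporaryDirectory() as run_dir:
    written = write_scripts(
        run_dir,
        [("run.sh", tracer_contents), ("run_spinlock_detection.sh", spinlock_contents)],
    )
    print([os.path.basename(p) for p in written])
```

Keeping the detection script separate means it can be run once per application, while run.sh can be rerun for tracing without repeating detection.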

William-An · 2025-10-10T21:43:32Z

Force merge as the failure is unrelated to code changes.

William-An and others added 12 commits September 15, 2025 10:01

add spinlock detection tool

85d6cf5

use dprintf for debug msg

9a23e6b

integrate spinlock fastforwarding with accel-sim tracer

6387ede

add custom rundir support for spinlock tool

c7e67f4

add spinlock detection script to the run_hw_trace.py

30e9c19

Automated Format

7d30c2a

track kernel histogram for every launch in every context by kernel name

48a09f1

update tracer tool with per-kernel histogram

b08340e

Merge branch 'spinlock_fix' of github.com:purdue-aalp/accel-sim-frame…

2c68dee

…work-public into spinlock_fix

format to pass ci

ea101fb

update test app

85d1b87

Merge branch 'dev' into spinlock_fix

08a6b5b

William-An marked this pull request as ready for review September 23, 2025 20:16

William-An added 9 commits September 23, 2025 16:20

move test app to gpu-app-collection

f2bcdfa

update script for spinlock handling

79f819f

update ci to include spinlock tracer run

d21015b

add spinlock test app to accel-sim yaml

b577da3

fix a bug when detecting spinlock

186e1bf

fix bug

cad11ca

fix filename too long issue and clean intermediate files by default

1f5a2ed

fix path issue

20e5f99

fix histogram path for merged histo and add readme

fae2d48

William-An requested review from JRPan and Copilot September 25, 2025 14:18

Copilot AI reviewed Sep 25, 2025

address PR review and update top-level readme

a10a587

JRPan reviewed Sep 30, 2025

William-An added 3 commits October 6, 2025 13:48

update CI for PR

58d3a63

build spinlock

007d1a8

clone recursively for building GPU ubench

5595b4a

William-An added 2 commits October 7, 2025 09:44

remove sim compare for spinlock since it takes too long to complete

82a48a2

move spinlock tracer test to weekly and fix a bug in it

3257e47

William-An requested a review from JRPan October 10, 2025 18:53

Merge branch 'dev' into spinlock_fix

db3ed3f

JRPan approved these changes Oct 10, 2025

William-An merged commit 96d7923 into accel-sim:dev Oct 10, 2025
12 of 14 checks passed
