[asm] Add pipelined double-buffering support with SGPR rotation by harsh-nod · Pull Request #876 · iree-org/wave

harsh-nod · 2026-02-12T02:41:16Z

Implement memref iter_arg handling for pipelined GEMM with g2s in the C++ WaveASM backend. When scf.for carries memref iter_args for double-buffering, the LDS base offsets are now materialized as SGPRs and rotated at the loop tail using s_mov_b32 swap sequences.
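As a rough illustration of the rotation described above, the following Python sketch models two LDS base offsets being swapped at the tail of each loop trip. It is only a model of the idea, not the backend's code: the function name `run_pipelined_loop` and the offsets `0` and `4096` are hypothetical, chosen for the example.

```python
def run_pipelined_loop(num_tiles, lds_offsets=(0, 4096)):
    """Return (iteration, read_offset, write_offset) for each loop trip.

    The two offsets stand in for SGPR-held LDS base offsets; the values
    are hypothetical and only serve to make the rotation visible.
    """
    read_base, write_base = lds_offsets
    trace = []
    for k in range(num_tiles):
        # Compute reads the tile at read_base while the next tile is
        # prefetched (global -> LDS) into write_base.
        trace.append((k, read_base, write_base))
        # Loop tail: rotate the two base offsets (the cross-swap at yield).
        read_base, write_base = write_base, read_base
    return trace


for step in run_pipelined_loop(3):
    print(step)
```

Each trip reads from the buffer that the previous trip filled, which is why the two offsets must be carried across iterations and swapped rather than recomputed.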

Key changes:

- RegionBuilder: detect LDS memref iter_args, resolve to SGPR offsets, propagate through block args, handle cross-swap at yield
- TranslateFromMLIR: use V_ADD_U32 directly with SGPR offsets in vector.load/store (V_MOV_B32 rejects SGPR sources)
- AMDGPUHandlers: handle dynamic SGPR-carried LDS base offsets in gather_to_lds m0 computation, prefer SALU when both operands are SGPRs
- LinearScanPass: fix block arg type propagation to use allocation mapping directly instead of condition iter_arg types (broken for cross-swap patterns)
- AssemblyEmitter: emit SGPR rotation copies at loop tail, detecting independent swap pairs and using 3-instruction swap with temporary
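The AssemblyEmitter item can be sketched as a small parallel-copy sequencer. The Python below is an assumption-laden sketch, not the PR's C++ implementation: the function name `emit_rotation_copies`, the temporary `s_tmp`, and the register names in the usage lines are hypothetical. It detects independent swap pairs and emits a 3-instruction `s_mov_b32` swap through a temporary, then emits any remaining copies in an order that never overwrites a source that is still needed.

```python
def emit_rotation_copies(moves, temp="s_tmp"):
    """Sequence parallel SGPR copies into s_mov_b32 instructions.

    moves: list of (dst, src) register names that must take effect
    as if all copies happened at once. `temp` is a hypothetical
    scratch SGPR assumed to be free at the loop tail.
    """
    pending = {dst: src for dst, src in moves if dst != src}
    asm = []

    # Pass 1: independent swap pairs (a <- b together with b <- a)
    # become a 3-instruction swap through the temporary.
    for a in list(pending):
        b = pending.get(a)
        if b is not None and pending.get(b) == a:
            asm.append(f"s_mov_b32 {temp}, {a}")
            asm.append(f"s_mov_b32 {a}, {b}")
            asm.append(f"s_mov_b32 {b}, {temp}")
            del pending[a], pending[b]

    # Pass 2: a copy is safe to emit once no other pending copy
    # still reads its destination register.
    while pending:
        ready = [d for d in pending if d not in pending.values()]
        if not ready:
            raise ValueError("cycles longer than 2 are not handled in this sketch")
        for d in ready:
            asm.append(f"s_mov_b32 {d}, {pending.pop(d)}")
    return asm


# Usage: a cross-swap of two (hypothetical) LDS base-offset SGPRs.
for line in emit_rotation_copies([("s4", "s5"), ("s5", "s4")]):
    print(line)
```

The temporary is needed because a naive pair of `s_mov_b32` copies would overwrite one offset before it is read. Longer rotation cycles are deliberately rejected here to keep the sketch short.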

Signed-off-by: Harsh Menon <harsh.menon@amd.com>

harsh-nod force-pushed the mubuf_asm branch from fca7578 to fe2d1c1 on February 13, 2026 04:14

panditsa approved these changes Feb 16, 2026

harsh-nod merged commit 47653e7 into iree-org:main Feb 17, 2026
15 checks passed

harsh-nod merged 1 commit into iree-org:main from harsh-nod:mubuf_asm