Question about Triton GPU IR SharedEncoding #2026
-
There is very little documentation about the SharedEncoding attribute. I have a new MMA layout attribute for the Intel XMX layout, for lowering tt.dot to the Intel XMX engine (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md#horizontal-packing-for-a-c-and-result). The layout conversion will be decomposed from blocked -> shared -> mma in the optimization passes. I read https://github.com/openai/triton/blob/5df904233c11a65bd131ead7268f84cca7804275/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td#L52. My question is how to understand the meaning of the SharedEncoding attribute and its parameters.
Replies: 6 comments 1 reply
-
I can't help you!
-
It means the elements of the tensor live in shared memory, and the mapping from each element of the tensor to a shared memory address is described by the swizzling parameters: vec, perPhase, maxPhase, and order.
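To make that concrete, here is an illustrative sketch (plain Python, not a real Triton API) of the parameters a SharedEncoding attribute carries. The parameter names come from TritonGPUAttrDefs.td; the values and the dict itself are made up for illustration:

```python
# Illustrative only: the swizzling parameters of a SharedEncoding attribute,
# collected in a plain dict. Names match TritonGPUAttrDefs.td; values are examples.
shared_encoding = {
    "vec":      4,       # how many contiguous elements are swizzled together as one chunk
    "perPhase": 2,       # how many consecutive rows share the same swizzling phase
    "maxPhase": 4,       # how many distinct swizzling phases exist in total
    "order":    [1, 0],  # axis order of the layout, fastest-varying axis first
}
```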
Beta Was this translation helpful? Give feedback.
-
How is the layout characterized by these swizzling parameters?
-
Say we have a 16 (M) by 16 (N) tensor A in which each element is an f32, and we want to do swizzling along the N dim (row), i.e. swizzle the elements within each row when putting them in shared memory. Here is how the parameters control the swizzling behavior:

- perPhase is calculated as perPhase = 128 / (elementsPerRow * elementTypeInBytes). In this example, perPhase = 128 / (16 * 4) = 2, which means every 2 rows share the same swizzling pattern.
- maxPhase means how many distinct patterns we want in total. This is usually set according to how shared memory is accessed…
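A minimal Python sketch of the phase-based swizzling described above. The XOR-of-chunk-index formulation is my reading of Triton's scheme, and vec = 4 and maxPhase = 4 are assumed values for illustration (the explanation is cut off before it pins them down), so treat this as a sketch rather than the definitive implementation:

```python
def swizzled_col(row: int, col: int, vec: int, per_phase: int, max_phase: int) -> int:
    """Map an element's logical (row, col) to its swizzled column in shared memory."""
    phase = (row // per_phase) % max_phase    # which swizzling pattern this row uses
    chunk = col // vec                        # index of the vec-sized chunk within the row
    return (chunk ^ phase) * vec + col % vec  # XOR-swizzle the chunk, keep the offset inside it

# The 16x16 f32 example: perPhase = 128 // (16 * 4) = 2, so rows {0, 1} use
# phase 0, rows {2, 3} use phase 1, and so on.
for row in range(4):
    print(row, [swizzled_col(row, c, vec=4, per_phase=2, max_phase=4) for c in range(16)])
```

Rows 0 and 1 come out unchanged (phase 0 is the identity), while rows 2 and 3 have their 4-element chunks swapped pairwise, so a column access no longer hits the same position in every row.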
-
Can you explain more about how this minimizes the bank conflicts?
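For background on the term: on NVIDIA GPUs, shared memory is divided into 32 banks of 4 bytes each, and threads of a warp that access different addresses in the same bank are serialized. A small sketch under those assumptions (the bank count, bank width, and the repeated swizzled_col helper are all illustrative, not from this thread) of why reading a column benefits from the swizzle:

```python
BANKS, BANK_BYTES, ELEM_BYTES = 32, 4, 4  # assumed NVIDIA-style shared memory

def swizzled_col(row, col, vec=4, per_phase=2, max_phase=4):
    # Same XOR-swizzle sketch as above, repeated so this snippet is self-contained.
    phase = (row // per_phase) % max_phase
    return ((col // vec) ^ phase) * vec + col % vec

def bank(row, col, cols_per_row=16):
    """Bank hit by element (row, col) of a row-major f32 tile in shared memory."""
    addr = (row * cols_per_row + col) * ELEM_BYTES
    return (addr // BANK_BYTES) % BANKS

# Reading column 0 across rows of the 16x16 f32 tile:
print([bank(r, 0) for r in range(4)])                   # unswizzled: [0, 16, 0, 16] -> rows 0/2 and 1/3 collide
print([bank(r, swizzled_col(r, 0)) for r in range(4)])  # swizzled:   [0, 16, 4, 20] -> all distinct
```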
-
@zhanglx13 Thanks for the warm help. I think my question is cleared up now.