Description
Consider adding more memory-related attributes, for example Storage, StorageTouched, and/or StorageTouchedPerRep, so that it is clear how much memory the kernel uses, how much is ever touched, and how much is touched per repetition. Because of how the Bytes counters are defined, it is currently hard to tell whether some kernels use enough memory to avoid caching effects.
The Bytes*PerRep attributes can be misleading and are inconsistent. They do not report how many unique bytes are touched per repetition. BytesReadPerRep is the sum, over the loops in a repetition, of the number of unique bytes read in each loop, and BytesWrittenPerRep is the same for bytes written. For BytesAtomicModifyWrittenPerRep, atomic read-modify-write operations are sometimes counted once per unique byte, as in Basic_PI_ATOMIC, and sometimes once per operation even when several operations hit the same byte, as in Algorithm_ATOMIC.
As a result, in kernels with multiple loops per repetition the same byte may be counted as both a read and a write, several times as an atomic read-modify-write, and again in each loop that touches it. For example, Apps_PRESSURE and Apps_ENERGY count the same memory multiple times because they have multiple loops per repetition. BytesPerRep is therefore neither the total number of unique bytes touched per repetition nor the amount of storage used by the kernel. The intention was for it to be the amount of data transferred between the last-level cache and memory, assuming perfect caching within a sub-kernel/loop and no caching between sub-kernels/loops.
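To make the difference concrete, here is a small toy model in Python. It is only a sketch of the counting rules described above, not the suite's actual code: the array names, sizes, loop layout, and function names are all hypothetical, and `storage_touched_per_rep` is just one possible reading of the proposed StorageTouchedPerRep attribute.

```python
# Toy model of the counting rules. All names and sizes are hypothetical
# and are not taken from any real kernel.
ARRAY_BYTES = {"a": 800, "b": 800, "c": 800}

# One repetition made of two loops that reuse the same arrays.
LOOPS = [
    {"reads": {"a", "b"}, "writes": {"c"}},
    {"reads": {"a", "c"}, "writes": {"b"}},
]


def size(names):
    """Total bytes of the named arrays, each counted once."""
    return sum(ARRAY_BYTES[name] for name in names)


def bytes_read_per_rep(loops):
    """Sum over loops of the unique bytes read within each loop."""
    return sum(size(loop["reads"]) for loop in loops)


def bytes_written_per_rep(loops):
    """Sum over loops of the unique bytes written within each loop."""
    return sum(size(loop["writes"]) for loop in loops)


def storage_touched_per_rep(loops):
    """Unique bytes touched anywhere in the repetition (union over loops)."""
    touched = set()
    for loop in loops:
        touched |= loop["reads"] | loop["writes"]
    return size(touched)


def atomic_bytes(num_ops, bytes_per_value, unique_values, per_operation):
    """Atomic read-modify-write bytes under the two conventions in use."""
    if per_operation:
        return num_ops * bytes_per_value
    return unique_values * bytes_per_value


print(bytes_read_per_rep(LOOPS))       # 3200
print(bytes_written_per_rep(LOOPS))    # 1600
print(storage_touched_per_rep(LOOPS))  # 2400
```

In this toy case the read and write counters add up to 4800 bytes per repetition while only 2400 bytes of storage are actually touched, which is the gap the new attributes would expose. Likewise, 1000 atomic updates to a single 8-byte value give 8 bytes under the per-unique-byte convention and 8000 bytes under the per-operation convention.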
Also fix some of the bytes counters. It looks like BytesReadPerRep is wrong for the second loop of Apps_PRESSURE.