Description
Hi,
Config used: SM86_RTX3070 GPU config (unmodified)
Benchmark used: pathfinder - https://github.com/yuhc/gpu-rodinia/tree/master/cuda/pathfinder
CUDA version: cuda-11.7
The default pathfinder benchmark does not call any cudaStreamCreate API, so all of its kernels should run on the default stream "0".
However, after running the benchmark, kernel_stream_id shows garbage values when the stats are printed per kernel.
Methodology to reproduce garbage value in kernel_stream_id:
- source setup_environment
- make
- cd benchmark_dir
- nvcc pathfinder.cu
- ./pathfinder 511 512 16 > log.txt
- cat log.txt | grep kernel_stream_id
kernel_stream_id = 0
kernel_stream_id = 5927
kernel_stream_id = 94117936320480
kernel_stream_id = 0
kernel_stream_id = 5933
kernel_stream_id = 140735697102168
kernel_stream_id = 140340347922992
kernel_stream_id = 5392
kernel_stream_id = 140340344489776
kernel_stream_id = 5940
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 6981
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
kernel_stream_id = 140340344425120
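To make the spread of values easier to see, the grep output can be tallied per distinct ID. The following is only a minimal sketch: the function name `tally_stream_ids` is hypothetical, and the sample lines are a handful copied from the output above rather than the full log.

```python
from collections import Counter
import re

# Matches lines of the form "kernel_stream_id = <integer>".
STREAM_ID_RE = re.compile(r"^\s*kernel_stream_id\s*=\s*(\d+)\s*$")


def tally_stream_ids(lines):
    """Count how often each kernel_stream_id value appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = STREAM_ID_RE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# A few lines copied from the output above; in practice, pass open("log.txt").
sample = [
    "kernel_stream_id = 0",
    "kernel_stream_id = 5927",
    "kernel_stream_id = 94117936320480",
    "kernel_stream_id = 0",
    "kernel_stream_id = 140340344425120",
    "kernel_stream_id = 140340344425120",
]

counts = tally_stream_ids(sample)
for stream_id, n in counts.most_common():
    print(f"{stream_id}: {n}")
```

If every kernel really ran on the default stream, I would expect this tally to contain a single entry for 0.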
After looking into the code, I see that the L2 cache stats are accumulated per stream, keyed on the streamID field. So these garbage values lead to inconsistent L2 cache stats.
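To illustrate why this matters, here is a rough sketch of per-stream accumulation. It is not the simulator's code: `accumulate_by_stream` and the count of 100 L2 accesses per kernel are made up for illustration, and only the stream IDs are taken from the log above.

```python
from collections import defaultdict


def accumulate_by_stream(kernel_records):
    """Sum a per-kernel counter into buckets keyed by stream ID."""
    totals = defaultdict(int)
    for stream_id, l2_accesses in kernel_records:
        totals[stream_id] += l2_accesses
    return dict(totals)


# Hypothetical numbers: three kernels with 100 L2 accesses each.
# What I would expect if all kernels report the default stream 0:
expected = accumulate_by_stream([(0, 100), (0, 100), (0, 100)])

# What happens when each kernel reports a different stream ID,
# using the first three IDs from the log above:
observed = accumulate_by_stream(
    [(0, 100), (5927, 100), (94117936320480, 100)]
)

print(expected)
print(observed)
```

With a consistent ID all accesses land in one bucket, while with the IDs seen in the log the same accesses are split across unrelated buckets, so the total reported for stream 0 is too low.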
A follow-up question:
If we are not creating any CUDA streams, shouldn't all kernels run on the default stream 0?