A collection of kernels for common parallel programming primitives.
ScanKernel implements prefix sum for uint32_t values.
CompactKernel implements stream compaction for values of user-specified size.
RadixSortKernel implements radix sort for uint32_t values. (Work in progress; not yet optimized.)
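
For orientation, here is a minimal sequential sketch of what the three primitives compute and how they relate. It is not the kernels' code or API. The function names (exclusive_scan_ref, compact_ref, radix_sort_ref) are hypothetical, the scan is assumed to be exclusive (the description above does not say which variant ScanKernel uses), compaction is assumed to be driven by a per-element flag array, and the 8-bit digit width in the sort is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
// ASSUMPTION: exclusive variant; the real ScanKernel may be inclusive.
std::vector<uint32_t> exclusive_scan_ref(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out(in.size());
    uint32_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];  // unsigned arithmetic wraps modulo 2^32
    }
    return out;
}

// Stream compaction: keep in[i] wherever flags[i] != 0, preserving order.
// ASSUMPTION: a flag array selects the elements; T stands in for the
// "user-specified size" element type.
template <typename T>
std::vector<T> compact_ref(const std::vector<T>& in,
                           const std::vector<uint32_t>& flags) {
    assert(in.size() == flags.size());
    // Normalize flags to 0/1 so that their scan yields output indices.
    std::vector<uint32_t> keep(flags.size());
    for (std::size_t i = 0; i < flags.size(); ++i) keep[i] = flags[i] != 0 ? 1u : 0u;
    const std::vector<uint32_t> offsets = exclusive_scan_ref(keep);
    const std::size_t kept = in.empty() ? 0 : offsets.back() + keep.back();
    std::vector<T> out(kept);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) out[offsets[i]] = in[i];  // scatter to the scanned index
    }
    return out;
}

// Least-significant-digit radix sort on 32-bit keys, 8 bits per pass.
// Each pass is a histogram, a scan of the histogram, and a stable scatter.
std::vector<uint32_t> radix_sort_ref(std::vector<uint32_t> keys) {
    std::vector<uint32_t> tmp(keys.size());
    for (uint32_t shift = 0; shift < 32; shift += 8) {
        std::vector<uint32_t> counts(256, 0);
        for (uint32_t k : keys) ++counts[(k >> shift) & 0xFFu];
        // Scanning the digit counts gives the first output slot per digit.
        std::vector<uint32_t> offsets = exclusive_scan_ref(counts);
        for (uint32_t k : keys) tmp[offsets[(k >> shift) & 0xFFu]++] = k;
        keys.swap(tmp);
    }
    return keys;
}
```

The sketch shows why the scan is the basic building block: compaction scans the keep flags to find each surviving element's output index, and every radix sort pass scans a digit histogram to find where each bucket starts. A parallel version would replace the sequential loops while keeping the same structure.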