The comparison between cublaslt and deepgemm in test_fp8.py is unaligned

It's great that PR #198 added a cuBLAS implementation to test_fp8.py, so we can directly observe the performance difference.
That said, I noticed the cuBLAS implementation uses tensorwise scaling (the cuBLAS default behavior) instead of blockwise scaling. The two paths therefore do different amounts of scaling work, which makes the comparison with DeepGEMM less fair and potentially misleading.
Would it be possible to align the scaling strategy? As far as I can tell, cuBLAS now supports the same blockwise scaling strategy that DeepGEMM uses: https://docs.nvidia.com/cuda/cublas/#element-1d-and-128x128-2d-block-scaling-for-fp8-data-types
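
To make the difference concrete, here is a minimal NumPy sketch of how the scale factors differ between the two strategies. It is only an illustration, not the code in test_fp8.py: the helper names `tensorwise_scale` and `blockwise_scales` are hypothetical, the 1x128 and 128x128 tile shapes are taken from the cuBLAS doc section linked above, and the scale is assumed to be the absolute maximum divided by the FP8 E4M3 maximum (448).

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def tensorwise_scale(x):
    """One scale factor for the whole tensor."""
    return np.abs(x).max() / FP8_E4M3_MAX


def blockwise_scales(x, block_rows, block_cols):
    """One scale factor per (block_rows x block_cols) tile.

    Assumes both dimensions divide evenly by the tile shape.
    """
    m, n = x.shape
    assert m % block_rows == 0 and n % block_cols == 0
    tiles = np.abs(x).reshape(m // block_rows, block_rows,
                              n // block_cols, block_cols)
    # Reduce over the two within-tile axes, leaving one value per tile.
    return tiles.max(axis=(1, 3)) / FP8_E4M3_MAX


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
w[0, 0] = 1000.0  # a single outlier

s_tensor = tensorwise_scale(w)            # scalar, dominated by the outlier
s_block = blockwise_scales(w, 128, 128)   # 2D tiles, shape (2, 4)
s_vec = blockwise_scales(w, 1, 128)       # 1D tiles, shape (256, 4)

print(s_tensor, s_block.shape, s_vec.shape)
```

With tensorwise scaling the single outlier sets the scale for every element, while with blockwise scaling it only affects the tile that contains it. The blockwise kernel also has to load and apply many more scale factors, which is why timing one strategy against the other is not an apples-to-apples benchmark.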
