Open
Description
We recently got a B200 and tested "tcgen05.mma.cta_group::1.kind::f8f6f4". We found that the accumulator keeps 25 mantissa bits, more than on H100 (13 mantissa bits).
- Can you confirm whether our finding of 25 bits is reliable?
- If more mantissa bits are kept, does DeepGEMM still compute a group of 128 on the tensor cores and then move the accumulation to the CUDA cores?
- We also tested "tcgen05.mma.cta_group::1.kind::mxf4nvf4" and "tcgen05.mma.cta_group::1.kind::mxf4", but we are not sure how many mantissa bits the accumulator keeps; we tested 34, 35, 36, and 37 bits. Have you ever run this test, or do you have a reference?
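
To make the kind of test we mean concrete, here is a minimal host-side sketch in Python. It is not our actual B200 kernel. It only emulates an accumulator that keeps a fixed number of fraction bits and probes it by adding 1 and 2^-k. The names `truncate`, `emulated_accumulate`, `probe_frac_bits`, and `chunked_accumulate` are hypothetical, truncation (rather than rounding) is an assumption, and `frac_bits` counts bits after the leading 1, which may differ by one from how the mantissa widths above are counted. On real hardware the two addends would also have to be produced as products of representable inputs.

```python
import math

def truncate(x: float, frac_bits: int) -> float:
    """Keep only `frac_bits` fraction bits after the leading 1 (assumed truncation)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (frac_bits + 1)
    return math.ldexp(math.trunc(m * scale) / scale, e)

def emulated_accumulate(values, frac_bits: int) -> float:
    """Sum values, truncating the running sum after every addition."""
    acc = 0.0
    for v in values:
        acc = truncate(acc + v, frac_bits)
    return acc

def probe_frac_bits(accumulate, max_bits: int = 52) -> int:
    """Largest k for which 1 + 2**-k survives the accumulator under test."""
    kept = 0
    for k in range(1, max_bits + 1):
        if accumulate([1.0, 2.0 ** -k]) == 1.0:
            break                         # the small addend was dropped
        kept = k
    return kept

def chunked_accumulate(values, frac_bits: int, chunk: int = 128) -> float:
    """Sum each group of `chunk` values in the narrow accumulator, then add the
    partial sums in a Python float (a stand-in for a wider accumulator)."""
    total = 0.0
    for i in range(0, len(values), chunk):
        total += emulated_accumulate(values[i:i + chunk], frac_bits)
    return total
```

With this emulation, `probe_frac_bits(lambda v: emulated_accumulate(v, 24))` recovers 24. The chunked variant illustrates why the second question matters: for one value of 1.0 followed by 1023 values of 2^-13 and `frac_bits=12`, the flat sum stays at 1.0, while summing in groups of 128 gives 1.109375, closer to the exact 1.1248779296875.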

Looking forward to your reply and suggestions. Thank you very much!