
B200 (sm_100a) FP8 accumulator bits #176

@lisuying214


We recently got a B200 and tested `tcgen05.mma.cta_group::1.kind::f8f6f4`. We find that the accumulator maintains 25 mantissa bits, more than on H100 (13 mantissa bits).
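
To make the measurement concrete, this is the kind of probe we mean by "mantissa bits" (a simplified host-side FP32 stand-in, assuming the usual approach of checking when a small addend is dropped; it is not our actual tcgen05.mma test, where the large partial sum and the small addend come from FP8 A*B products and the accumulator is read back from tensor memory):

```cuda
// Probe the effective mantissa width of an accumulator: it is roughly the
// largest k for which a contribution of 1 still survives next to a running
// sum of magnitude 2^k. For plain FP32 this prints ~23 (explicit mantissa
// bits); the real test drives tcgen05.mma instead of a scalar add.
#include <cmath>
#include <cstdio>

int main() {
    for (int k = 1; k <= 40; ++k) {
        float big = std::ldexp(1.0f, k);  // 2^k
        float sum = big + 1.0f;           // is the +1 retained?
        if (sum == big) {
            std::printf("effective mantissa bits ~ %d\n", k - 1);
            return 0;
        }
    }
    return 0;
}
```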

  1. Can you confirm whether our finding of 25 bits is reliable?
  2. If more mantissa bits are preserved on B200, does DeepGEMM still accumulate each group of 128 in the tensor cores and then promote the partial sums to FP32 accumulation on the CUDA cores? (A rough sketch of the pattern we mean is below.)
  3. We also tested `tcgen05.mma.cta_group::1.kind::mxf4nvf4` and `tcgen05.mma.cta_group::1.kind::mxf4`, but the number of accumulator mantissa bits there is unclear: we measured 34, 35, 36, and 37 bits in different runs. Have you ever run this test, or do you have a reference?
[image attached]
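
For question 2, this is the two-level accumulation pattern we are referring to (a rough sketch only, assuming one dequantization scale per 128-element K-block for each operand; it is not DeepGEMM's actual kernel, and the inner loop stands in for a single tensor-core MMA):

```cuda
// Sketch of two-level accumulation: the inner loop stands in for one
// tcgen05.mma whose limited-precision accumulator covers a 128-wide K-block
// (FP8 inputs in reality); the outer loop is the CUDA-core promotion into a
// full FP32 accumulator with the per-block scales applied.
__device__ float two_level_dot(const float* a, const float* b,
                               const float* scale_a, const float* scale_b,
                               int num_k_blocks) {
    float acc_fp32 = 0.0f;                   // full-precision accumulator (CUDA cores)
    for (int kb = 0; kb < num_k_blocks; ++kb) {
        // Stand-in for the tensor-core MMA over one 128-element K-block.
        float partial = 0.0f;                // tensor-core accumulator (limited precision)
        for (int i = 0; i < 128; ++i) {
            int idx = kb * 128 + i;
            partial += a[idx] * b[idx];
        }
        // Promotion step: apply per-block scales and accumulate in FP32.
        acc_fp32 = fmaf(scale_a[kb] * scale_b[kb], partial, acc_fp32);
    }
    return acc_fp32;
}
```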

Looking forward to your reply and suggestions. Thank you very much~
