
Correctly implement lower convex envelope in RCP pruning logic #368

@nv-rborkar

Description


As the comment at https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/rcp_checker.py#L246 says, the loop does not correctly implement the "lower convex envelope" specified in the original RCP specification: it has an off-by-one error. If a point X is pruned because it lies above the linear interpolation between X-1 and X+1, then its left neighbor X-1 must be re-tested against the interpolation between X-2 and X+1, because removing X can leave X-1 above the new chord. The increment at line 256 should be in an else clause.
As a result, bad RCP points are not pruned, and submissions whose global batch size is near a bad RCP point are either unfairly rejected or unfairly "scaled".
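To make the off-by-one concrete, here is a minimal Python sketch under stated assumptions: RCP points are (global batch size, mean epochs) pairs with distinct batch sizes, and `prune_buggy`, `prune_lower_convex`, and `_above_chord` are hypothetical names, not functions from rcp_checker.py. The corrected loop re-tests the left neighbor by stepping the index back one position after a removal; the exact change needed at line 256 depends on how the actual loop is written.

```python
from typing import List, Tuple

# Hypothetical point type: (global batch size, mean epochs to converge).
Point = Tuple[float, float]


def _above_chord(left: Point, mid: Point, right: Point) -> bool:
    """Return True if `mid` lies strictly above the line from `left` to `right`.

    Assumes distinct batch sizes, so left[0] != right[0].
    """
    (x0, y0), (x1, y1), (x2, y2) = left, mid, right
    interpolated = y0 + (y2 - y0) * (x1 - x0) / (x2 - x0)
    return y1 > interpolated


def prune_buggy(points: List[Point]) -> List[Point]:
    """Models the reported bug: the index advances even after a removal."""
    pts = sorted(points)
    i = 1
    while i < len(pts) - 1:
        if _above_chord(pts[i - 1], pts[i], pts[i + 1]):
            del pts[i]
        # Unconditional advance: the left neighbor of a removed point
        # is never re-tested against its new right neighbor.
        i += 1
    return pts


def prune_lower_convex(points: List[Point]) -> List[Point]:
    """Keep only the points on the lower convex envelope."""
    pts = sorted(points)
    i = 1
    while i < len(pts) - 1:
        if _above_chord(pts[i - 1], pts[i], pts[i + 1]):
            del pts[i]
            # Removing X may expose X-1 as non-convex, so step back and
            # re-test it against X-2 and X+1 (never below index 1).
            i = max(1, i - 1)
        else:
            i += 1  # only advance when nothing was removed
    return pts


if __name__ == "__main__":
    # Hypothetical RCP means, not real MLPerf data.
    rcp_points = [(128, 10.0), (256, 11.0), (384, 11.5), (512, 15.0), (640, 10.0)]
    print(prune_buggy(rcp_points))         # [(128, 10.0), (384, 11.5), (640, 10.0)]
    print(prune_lower_convex(rcp_points))  # [(128, 10.0), (640, 10.0)]
```

With these hypothetical points, the buggy loop keeps (384, 11.5) even though, once (512, 15.0) is removed, it lies above the chord from (128, 10.0) to (640, 10.0); the corrected loop removes it.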
