clarify the device FP config queries (e.g. CL_DEVICE_SINGLE_FP_CONFIG) #1499

@bashbaug

The current descriptions of the device floating-point queries are a bit confusing and ambiguous, and we should consider clarifying them. To be clear, I don't think we can change the meaning of the queries since they've been around since OpenCL 1.0, but we should ensure that the descriptions clearly convey what the queries are useful for, and perhaps include some cautions about what they are NOT useful for.

We may also want to revisit the "required support" for some of the queries, depending on the scope of the clarifications.

My key confusion is that some of the query results describe behavioral differences that can be directly observed and tested, others appear to provide performance guidance, and it is not clear which queries fall into which category.

For example, I think CL_FP_DENORM is behavioral. If the device does not support CL_FP_DENORM, then in at least some cases denorms may be flushed to zero, whereas if the device does support CL_FP_DENORM, then denorms must be preserved.

I think CL_FP_INF_NAN is probably behavioral also, although it looks like the spec allows it to be omitted for CL_DEVICE_HALF_FP_CONFIG. Is this a bug? If a full profile device does not support CL_FP_INF_NAN for half-precision, do the rules from the embedded profile describing the behavior without CL_FP_INF_NAN apply, or something else?

Things get murkier with CL_FP_ROUND_TO_NEAREST, CL_FP_ROUND_TO_ZERO, and CL_FP_ROUND_TO_INF. Are these behavioral, or performance hints? If they are behavioral, do they only define the "default rounding mode" that applies when no conversion function with an explicit rounding mode is used? How is the "default rounding mode" determined when more than one bit is set? Note that setting more than one of these bits is required for CL_DEVICE_DOUBLE_FP_CONFIG, at least for OpenCL 1.x devices. Is the answer different for full profile vs. embedded profile devices?
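To make the rounding-mode question concrete, here is a minimal host-side sketch that counts how many rounding-mode bits a device reports. The bit values are mirrored from `CL/cl.h` so the snippet compiles without the OpenCL headers; the helper names `count_rounding_modes` and `reports_multiple_rounding_modes` are hypothetical and not part of any OpenCL API. In real code the input would come from `clGetDeviceInfo` with `CL_DEVICE_SINGLE_FP_CONFIG` or `CL_DEVICE_DOUBLE_FP_CONFIG`.

```c
#include <stdint.h>

/* Bit values mirrored from CL/cl.h so this sketch builds without OpenCL headers. */
#ifndef CL_FP_ROUND_TO_NEAREST
#define CL_FP_ROUND_TO_NEAREST (1u << 2)
#define CL_FP_ROUND_TO_ZERO    (1u << 3)
#define CL_FP_ROUND_TO_INF     (1u << 4)
#endif

/* Hypothetical helper: number of rounding-mode bits set in an FP config bitfield. */
static int count_rounding_modes(uint64_t fp_config)
{
    int n = 0;
    if (fp_config & CL_FP_ROUND_TO_NEAREST) n++;
    if (fp_config & CL_FP_ROUND_TO_ZERO)    n++;
    if (fp_config & CL_FP_ROUND_TO_INF)     n++;
    return n;
}

/* Hypothetical helper: returns 1 when more than one rounding-mode bit is set,
 * which is the case where the bits alone do not name a single default mode. */
static int reports_multiple_rounding_modes(uint64_t fp_config)
{
    return count_rounding_modes(fp_config) > 1;
}
```

A device that reports all three bits makes `reports_multiple_rounding_modes` return 1, and nothing in the bitfield itself says which of the three modes is the default.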

We clarified in #1391 that CL_FP_FMA is a performance hint, but it's required for CL_DEVICE_DOUBLE_FP_CONFIG. Is this a bug? Should there be any relationship between CL_FP_FMA and the OpenCL C preprocessor defines for FP_FAST_FMAF, FP_FAST_FMA, and FP_FAST_FMA_HALF? Also, just to be explicit, the fma built-in function must be correctly rounded regardless of CL_FP_FMA, correct?

I think CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT is behavioral, and it allows 32-bit floating-point divide and square root to be correctly rounded using the -cl-fp32-correctly-rounded-divide-sqrt build option, but it does not indicate whether divides and square roots are correctly rounded by default, nor whether correctly rounded divides and square roots are performant.
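As a sketch of how I read this bit, a host application would use it only to decide whether the build option may be passed, not to infer default behavior or performance. The bit value is mirrored from `CL/cl.h`; the helper name `divide_sqrt_build_options` is hypothetical, and the returned string is meant to be passed as the `options` argument of `clBuildProgram`.

```c
#include <stdint.h>

/* Bit value mirrored from CL/cl.h so this sketch builds without OpenCL headers. */
#ifndef CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT
#define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1u << 7)
#endif

/* Hypothetical helper: choose build options from the value queried with
 * CL_DEVICE_SINGLE_FP_CONFIG. The bit only says the option is supported;
 * it says nothing about the default accuracy or the speed of divide/sqrt. */
static const char *divide_sqrt_build_options(uint64_t single_fp_config)
{
    if (single_fp_config & CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT) {
        return "-cl-fp32-correctly-rounded-divide-sqrt";
    }
    return ""; /* option not supported: build with default divide/sqrt accuracy */
}
```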

I think CL_FP_SOFT_FLOAT is purely a performance hint.
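To summarize my tentative reading of each bit, here is a small sketch that encodes the classification as a lookup. This is only my current interpretation as written above, not anything the spec states; the enum and the function `classify_fp_bit` are hypothetical names, and the bit values are mirrored from `CL/cl.h`.

```c
#include <stdint.h>

/* Bit values mirrored from CL/cl.h so this sketch builds without OpenCL headers. */
#ifndef CL_FP_DENORM
#define CL_FP_DENORM                        (1u << 0)
#define CL_FP_INF_NAN                       (1u << 1)
#define CL_FP_ROUND_TO_NEAREST              (1u << 2)
#define CL_FP_ROUND_TO_ZERO                 (1u << 3)
#define CL_FP_ROUND_TO_INF                  (1u << 4)
#define CL_FP_FMA                           (1u << 5)
#define CL_FP_SOFT_FLOAT                    (1u << 6)
#define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1u << 7)
#endif

/* Hypothetical categories used in this issue, not defined by the spec. */
typedef enum {
    FPQ_BEHAVIORAL,        /* observable, testable difference in results */
    FPQ_PERFORMANCE_HINT,  /* no required difference in results */
    FPQ_UNCLEAR            /* open question in this issue, or unknown bit */
} fp_query_kind;

/* Hypothetical helper: tentative classification of a single FP config bit. */
static fp_query_kind classify_fp_bit(uint64_t bit)
{
    switch (bit) {
    case CL_FP_DENORM:
    case CL_FP_INF_NAN:                        /* "probably" behavioral */
    case CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT:
        return FPQ_BEHAVIORAL;
    case CL_FP_FMA:                            /* clarified in #1391 */
    case CL_FP_SOFT_FLOAT:
        return FPQ_PERFORMANCE_HINT;
    case CL_FP_ROUND_TO_NEAREST:
    case CL_FP_ROUND_TO_ZERO:
    case CL_FP_ROUND_TO_INF:
    default:
        return FPQ_UNCLEAR;
    }
}
```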

Some specific recommended actions:

  1. Clarify whether CL_DEVICE_HALF_FP_CONFIG may omit CL_FP_INF_NAN for full profile devices. Consider generalizing the behavior without CL_FP_INF_NAN beyond embedded profile devices and fp32.
  2. Figure out what the rounding mode bits mean, how they affect the "default rounding mode", and check the places in the API and OpenCL C spec that refer to the "default rounding mode". IMHO, this is the most important clarification.
  3. Consider whether CL_DEVICE_DOUBLE_FP_CONFIG should require CL_FP_FMA, given that it is a performance hint.
  4. Consider whether CL_FP_FMA should tie to the FP_FAST_FMA macros in OpenCL C.
  5. Clarify that CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT indicates that divides and square roots may be correctly rounded, not necessarily that they are correctly rounded, nor that they are fast.
