clarify the device FP config queries (e.g. CL_DEVICE_SINGLE_FP_CONFIG)

The current descriptions of the [device floating-point queries](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_DEVICE_SINGLE_FP_CONFIG) are a bit confusing and ambiguous, and we should consider clarifying them.  To be clear, I don't think we can _change_ the meaning of the queries, since they've been around since OpenCL 1.0, but we should ensure that the descriptions clearly convey what the queries are useful for, and perhaps include some cautions about what they are NOT useful for.

We may also want to revisit the "required support" for some of the queries, depending on the scope of the clarifications.

My key confusion is that some of the query results describe behavioral differences that can be directly observed and tested, others appear to provide performance guidance, and it is not clear which queries fall into which category.

For example, I think `CL_FP_DENORM` is behavioral.  If the device does not support `CL_FP_DENORM`, then in at least some cases denorms may be flushed to zero, whereas if the device does support `CL_FP_DENORM`, then denorms must be preserved.
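To illustrate why this counts as "behavioral", here is a minimal host-side sketch in plain C (not OpenCL C) that models what flushing may look like. `flush_denorm_to_zero` and `denorm_handling_observable` are hypothetical helpers, not OpenCL APIs, and the check assumes the host itself is not running in flush-to-zero mode (for example, not built with `-ffast-math`).

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of what a device WITHOUT CL_FP_DENORM may do:
 * replace a subnormal value with a zero of the same sign. */
static float flush_denorm_to_zero(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Returns true when x * scale gives a different result depending on
 * whether subnormals are preserved (host IEEE 754 result) or flushed. */
static bool denorm_handling_observable(float x, float scale) {
    assert(isfinite(x) && isfinite(scale));
    float preserved = x * scale;
    float flushed = flush_denorm_to_zero(preserved);
    return preserved != flushed;
}
```

For example, `denorm_handling_observable(FLT_MIN, 0.5f)` is true because `FLT_MIN * 0.5f` is subnormal, while `denorm_handling_observable(1.0f, 0.5f)` is false. A conformance-style test could compare the device result of the same multiply against both outcomes.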

I think `CL_FP_INF_NAN` is probably behavioral also, although it looks like the spec allows it to be omitted for [CL_DEVICE_HALF_FP_CONFIG](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_DEVICE_HALF_FP_CONFIG).  Is this a bug?  If a full profile device does not support `CL_FP_INF_NAN` for half-precision, do the rules from the embedded profile describing the behavior without `CL_FP_INF_NAN` apply, or something else?
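Like denorms, infinity and NaN handling can be probed directly. The sketch below runs the probe on the host in plain C; a real test would perform the same operations in a kernel on the device. `produces_ieee_inf_nan` is a hypothetical helper, and it assumes default (non-fast-math) compiler flags.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical probe: do basic operations produce IEEE 754 infinities
 * and NaNs? `zero` is a parameter so the divisions operate on a runtime
 * value, as they would in a kernel. */
static bool produces_ieee_inf_nan(float zero) {
    assert(zero == 0.0f);
    float pos_inf = 1.0f / zero;   /* IEEE 754: +infinity */
    float nan_val = zero / zero;   /* IEEE 754: NaN */
    return isinf(pos_inf) && pos_inf > 0.0f
        && isnan(nan_val) && nan_val != nan_val;
}
```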

Things get murkier with `CL_FP_ROUND_TO_NEAREST`, `CL_FP_ROUND_TO_ZERO`, and `CL_FP_ROUND_TO_INF`.  Are these behavioral, or performance hints?  If they are behavioral, do they only define the "default rounding mode", i.e. the rounding mode used when no conversion function with an explicit rounding mode is used?  If so, how is the "default rounding mode" determined when more than one bit is set?  Note that setting more than one bit is required for [CL_DEVICE_DOUBLE_FP_CONFIG](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_DEVICE_DOUBLE_FP_CONFIG), at least for OpenCL 1.x devices.  Is the answer different for full profile vs. embedded profile devices?
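If the rounding bits are behavioral, the "default rounding mode" matters because it changes results. The sketch below assumes exactly that (which is the open question). It models rounding a non-negative significand to fewer bits under each mode. The enum names are hypothetical. Only non-negative values are handled, and for those, rounding toward negative infinity coincides with rounding toward zero, so only the toward-positive-infinity direction of `CL_FP_ROUND_TO_INF` is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the modes named by CL_FP_ROUND_TO_NEAREST,
 * CL_FP_ROUND_TO_ZERO and the +infinity direction of CL_FP_ROUND_TO_INF. */
typedef enum {
    ROUND_NEAREST_EVEN,
    ROUND_TOWARD_ZERO,
    ROUND_TOWARD_POS_INF
} round_mode;

/* Drops the low `k` bits of a non-negative significand, rounding the
 * discarded part according to `mode`. Requires 1 <= k <= 63. */
static uint64_t round_drop_bits(uint64_t sig, unsigned k, round_mode mode) {
    assert(k >= 1 && k <= 63);
    uint64_t kept = sig >> k;
    uint64_t rem  = sig & ((UINT64_C(1) << k) - 1); /* discarded bits */
    uint64_t half = UINT64_C(1) << (k - 1);         /* halfway point */

    switch (mode) {
    case ROUND_TOWARD_ZERO:
        return kept;                                /* truncate */
    case ROUND_TOWARD_POS_INF:
        return rem != 0 ? kept + 1 : kept;          /* any remainder rounds up */
    case ROUND_NEAREST_EVEN:
    default:
        /* above halfway rounds up; exact ties go to the even neighbor */
        if (rem > half || (rem == half && (kept & 1)))
            return kept + 1;
        return kept;
    }
}
```

For example, dropping two bits from `0b1011` gives `0b11` under round-to-nearest-even and toward positive infinity, but `0b10` toward zero, so the same kernel can produce different results depending on which mode is the default.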

We clarified in #1391 that `CL_FP_FMA` is a performance hint, but it's required for [CL_DEVICE_DOUBLE_FP_CONFIG](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_DEVICE_DOUBLE_FP_CONFIG).  Is this a bug?  Should there be any relationship between `CL_FP_FMA` and the OpenCL C preprocessor defines for `FP_FAST_FMAF`, `FP_FAST_FMA`, and `FP_FAST_FMA_HALF`?  Also, just to be explicit, the `fma` built-in function must be correctly rounded regardless of `CL_FP_FMA`, correct?
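On the last question, the difference between a single-rounding `fma` and a separate multiply and add is observable regardless of speed, which is why a performance hint should not change it. The sketch below is host-side C, not OpenCL C. Both helper names are hypothetical, and the double-based reference is only equal to a correctly rounded fma when the double sum is exact (true for the inputs below). It is not a general `fmaf` replacement.

```c
#include <assert.h>

/* Unfused multiply-add: the product is rounded to float before the add.
 * Storing through a volatile prevents the compiler from contracting
 * a * b + c into a single fused operation. */
static float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;   /* first rounding */
    return product + c;               /* second rounding */
}

/* Reference for a correctly rounded fma, computed in double. The product
 * of two floats is exact in double; the final cast is the only rounding
 * ONLY when the double addition is also exact. */
static float fused_reference(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact result is `2^-24`. The unfused version rounds the product `1 + 2^-11 + 2^-24` to `1 + 2^-11` (a tie, resolved to even) and returns `0`.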

I think `CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT` is behavioral, and it _allows_ 32-bit floating-point divide and square root to be correctly rounded using the `-cl-fp32-correctly-rounded-divide-sqrt` build option, but it does not indicate whether divides and square roots are correctly rounded _by default_, nor whether correctly rounded divides and square roots are performant.
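Under that reading, a host program would pass the build option only when the bit is reported. Here is a minimal sketch. The helper name is hypothetical, the bit value is copied from `CL/cl.h`, and `cl_device_fp_config` is modeled as `uint64_t` so the sketch compiles without OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value copied from CL/cl.h so this sketch builds without OpenCL headers. */
#define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (UINT64_C(1) << 7)

/* Hypothetical helper: chooses build options for a program that needs
 * correctly rounded fp32 divide and sqrt. The bit only PERMITS the option;
 * it says nothing about the default accuracy of divide/sqrt without it,
 * or about their performance. */
static const char *fp32_divide_sqrt_build_options(uint64_t single_fp_config) {
    if (single_fp_config & CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT)
        return "-cl-fp32-correctly-rounded-divide-sqrt";
    return "";  /* bit not set: do not pass the option */
}
```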

I think `CL_FP_SOFT_FLOAT` is purely a performance hint.
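To summarize the tentative categorization above in one place, here is a sketch of a lookup table. The categories encode only the reading in this issue (with the rounding bits marked unclear); they are not spec text. The bit values are copied from `CL/cl.h`, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tentative categories from this issue's discussion; NOT normative. */
typedef enum { FP_BEHAVIORAL, FP_PERFORMANCE_HINT, FP_UNCLEAR } fp_bit_kind;

typedef struct {
    uint64_t bit;        /* bit value as defined in CL/cl.h */
    const char *name;
    fp_bit_kind kind;
} fp_bit_info;

static const fp_bit_info FP_BITS[] = {
    { UINT64_C(1) << 0, "CL_FP_DENORM",                        FP_BEHAVIORAL },
    { UINT64_C(1) << 1, "CL_FP_INF_NAN",                       FP_BEHAVIORAL },
    { UINT64_C(1) << 2, "CL_FP_ROUND_TO_NEAREST",              FP_UNCLEAR },
    { UINT64_C(1) << 3, "CL_FP_ROUND_TO_ZERO",                 FP_UNCLEAR },
    { UINT64_C(1) << 4, "CL_FP_ROUND_TO_INF",                  FP_UNCLEAR },
    { UINT64_C(1) << 5, "CL_FP_FMA",                           FP_PERFORMANCE_HINT },
    { UINT64_C(1) << 6, "CL_FP_SOFT_FLOAT",                    FP_PERFORMANCE_HINT },
    { UINT64_C(1) << 7, "CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT", FP_BEHAVIORAL },
};

#define FP_BITS_COUNT (sizeof FP_BITS / sizeof FP_BITS[0])

/* Returns the name of a single cl_device_fp_config bit, or NULL if unknown. */
static const char *fp_bit_name(uint64_t bit) {
    assert(bit != 0 && (bit & (bit - 1)) == 0);  /* exactly one bit set */
    for (size_t i = 0; i < FP_BITS_COUNT; ++i)
        if (FP_BITS[i].bit == bit)
            return FP_BITS[i].name;
    return NULL;
}

/* Counts the bits set in `config` whose tentative category is `kind`. */
static size_t count_bits_of_kind(uint64_t config, fp_bit_kind kind) {
    size_t n = 0;
    for (size_t i = 0; i < FP_BITS_COUNT; ++i)
        if ((config & FP_BITS[i].bit) && FP_BITS[i].kind == kind)
            ++n;
    return n;
}
```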

Some specific recommended actions:

1. Clarify whether `CL_DEVICE_HALF_FP_CONFIG` may omit `CL_FP_INF_NAN` for full profile devices.  Consider generalizing the behavior without `CL_FP_INF_NAN` beyond embedded profile devices and fp32.
2. Figure out what the rounding mode bits mean, how they affect the "default rounding mode", and check the places in the API and OpenCL C spec that refer to the "default rounding mode".  IMHO, this is the most important clarification.
3. Consider whether `CL_DEVICE_DOUBLE_FP_CONFIG` should require `CL_FP_FMA`, given that it is a performance hint.
4. Consider whether `CL_FP_FMA` should tie to the `FP_FAST_FMA` macros in OpenCL C.
5. Clarify that `CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT` indicates that divides and square roots *may* be correctly rounded, not necessarily that they *are* correctly rounded, nor that they are fast.

clarify the device FP config queries (e.g. CL_DEVICE_SINGLE_FP_CONFIG) #1499

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

clarify the device FP config queries (e.g. CL_DEVICE_SINGLE_FP_CONFIG) #1499

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions