Skip to content

Conversation

@wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Sep 9, 2025

Remove the dependency on the libclc build-time configuration for __clc_fp*_subnormals_supported. The check is now implemented with LLVM intrinsics so it can be resolved during target lowering or at runtime.
Improve __clc_flush_denormal_if_not_supported implementation as well. It doesn't use __clc_fp*_subnormals_supported which canonicalizes sNaN and thus the new implementation is more foldable.
-fdenormal-fp-math-f32=dynamic build flag is required to defer denormal handling.
Remove cmake option ENABLE_RUNTIME_SUBNORMAL and related code.

Deleted clc_sw_fma since software emulation is not desired.
__clc_fma is now implemented with __builtin_elementwise_fma.

Resolves #153148

…al_if_not_supported

Remove the dependency on the libclc build-time configuration for
__clc_fp*_subnormals_supported. The check is now implemented with LLVM
intrinsics so it can be resolved during target lowering or at runtime.

Improve __clc_flush_denormal_if_not_supported implementation as well.
It doesn't use __clc_fp*_subnormals_supported which canonicalizes sNaN
and thus the new implementation is more foldable.

Remove cmake option ENABLE_RUNTIME_SUBNORMAL and related code.

Resolves llvm#153148

Co-authored-by: Matt Arsenault <[email protected]>
@llvmbot llvmbot added the libclc libclc OpenCL library label Sep 9, 2025
@wenju-he wenju-he requested a review from arsenm September 9, 2025 09:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the subnormal support detection and handling in libclc by replacing build-time configuration with runtime intrinsic-based checks. The implementation now uses LLVM's __builtin_canonicalizef and __builtin_isfpclass intrinsics to determine subnormal support at runtime rather than relying on cmake configuration options.

Key changes:

  • Replaces build-time subnormal configuration with runtime intrinsic-based detection
  • Removes cmake option ENABLE_RUNTIME_SUBNORMAL and related build logic
  • Improves __clc_flush_denormal_if_not_supported implementation to be more optimizable

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
libclc/clc/lib/generic/math/clc_subnormal_config.cl New implementation using LLVM intrinsics for runtime subnormal detection
libclc/clc/include/clc/math/clc_subnormal_config.h Removes unused function declaration
libclc/clc/include/clc/math/math.h Refactors flush denormal function and removes header dependency
Multiple math files Removes unnecessary include of subnormal config header
Multiple SOURCES files Removes subnormal config files from build lists
libclc/CMakeLists.txt Removes cmake option and related build configuration

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the build using -Xclang -fdenormal-fp-math-f32=dynamic already? If not it should do that globally

@wenju-he
Copy link
Contributor Author

wenju-he commented Sep 9, 2025

Is the build using -Xclang -fdenormal-fp-math-f32=dynamic already? If not it should do that globally

Thanks, I forgot that part. Done in 4608f77

@wenju-he wenju-he requested a review from arsenm September 9, 2025 10:55
Comment on lines 68 to 72
// Avoid calling __clc_fp32_subnormals_supported here: it uses
// llvm.canonicalize, which quiets sNaN.
return __builtin_fabsf(x) < 0x1p-149f
? __builtin_elementwise_copysign(0.0f, x)
: x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer matches the name or behavior. This is an unconditional flush to zero

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If denormal is not supported

  • returns signed zero for denormal input since __builtin_fabsf result is 0
  • return x if x is not denormal

If denormal is supported

  • returns x as is regardless if x is denormal or not.

So this function only flushes to zero if denormal is not supported, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no support check here. __builtin_fabsf(x) < 0x1p-149f will be true for 0 or a denormal value, regardless of whether the fcmp flushes the input

Also you can use __builtin_elementwise_abs to be consistently using elementwise builtins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no support check here. __builtin_fabsf(x) < 0x1p-149f will be true for 0 or a denormal value, regardless of whether the fcmp flushes the input

Also you can use __builtin_elementwise_abs to be consistently using elementwise builtins

thanks @arsenm. Now I see what you mean. Renamed __clc_flush_denormal_if_not_supported to __clc_soft_flush_denormal in 3f665ce and use __builtin_elementwise_abs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed __clc_flush_denormal_if_not_supported to __clc_soft_flush_denormal in 3f665ce and use __builtin_elementwise_abs

kindly ping @arsenm for review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is now right, but the usage is not. You do not want to unconditionally force a denormal flush, pretty much anywhere.

I also believe most of the explicit flushes / canonicalizes in the math function implementations can be deleted with some shuffling around. I removed most of these in the rocm-device-libs implementations a few years ago. A few more remain but I believe they can also be avoided

@wenju-he wenju-he requested a review from arsenm September 10, 2025 23:03
Comment on lines 130 to 132
a = __clc_soft_flush_denormal(a);
b = __clc_soft_flush_denormal(b);
c = __clc_soft_flush_denormal(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditionally forcing flush of denormals is not desirable. In this context I'm not sure why it's trying to flush in the first place.

The below code extracting the exponent can be replaced with frexp, and the return c on the above paths is missing a canonicalize.

But on a deeper level I don't think libclc should be trying to provide a software FMA implementation in the first place; that's a decision for the compiler when codegening llvm.fma, surely compiler-rt already has an implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditionally forcing flush of denormals is not desirable. In this context I'm not sure why it's trying to flush in the first place.

The below code extracting the exponent can be replaced with frexp, and the return c on the above paths is missing a canonicalize.

But on a deeper level I don't think libclc should be trying to provide a software FMA implementation in the first place; that's a decision for the compiler when codegening llvm.fma, surely compiler-rt already has an implementation?

Deleted clc_sw_fma in 7b290a2

Now clc_fma is implemented with __builtin_elementwise_fma

Copy link
Contributor

@Maetveis Maetveis Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surely compiler-rt already has an implementation?

It doesn't. LLVM libc has one but it uses FP64, so I don't think it is of much help. I'd expect most targets that don't have hardware fma don't have fp64 either.

I think dropping sw fma would impact:

  • SPIR-V, which then starts generating the GLS.std.450 extended instruction FMA. The problem there is that instruction is (AFAICT) allowed to round intermediate products, but the OpenCL spec doesn't allow that. I'm not sure if drivers actually implement it as fused or not.
    Arguably the lowering @llvm.fma to this instruction is bug in LLVM as @llvm.fma is specified to be fused without fast math flags.
  • Not all old R600 targets have FMA, I think this change would be breaking them. These are >10 years old GPUs at this point though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think r600 always had FMA, it's just not "fast" on all of them. In any case, the backed is obligated to implement llvm.fma correctly

@wenju-he wenju-he requested review from Maetveis and arsenm October 3, 2025 06:03
@wenju-he wenju-he requested a review from Maetveis October 3, 2025 07:05
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dropping the soft FMA is beyond the scope of this patch, but it is something I think should be done

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
_CLC_DEF bool __clc_fp64_subnormals_supported() {
#ifdef CLC_SPIRV
// SPIR-V doesn't support llvm.canonicalize for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just fix that instead of special casing it here? It's not difficult to implement

Comment on lines +11 to +12
x = __clc_soft_flush_denormal(x);
y = __clc_soft_flush_denormal(y);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. In the rocm-device-libs version of this, I managed to delete the explicit canonicalizes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@wenju-he wenju-he Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/ROCm/llvm-project/blob/0e9e3946cb257d1ed7b119333db451805865b36b/amd/device-libs/ocml/src/remainderF_base.h#L47

See this series of patches: ROCm@b3beb93 ROCm@9a7bc19 ROCm@e9198f7

Should just copy what these did

I have tried to port both libclc __clc_remquo and ocml remquo2 to replace intel gpu implementation at https://github.com/intel/intel-graphics-compiler/blob/fc97dc482697b320667a52914f1225556f0856e8/IGC/BiFModule/Implementation/Math/remquo.cl#L12-L104, however, the ported code can't pass OpenCL CTS test ./test_bruteforce remquo on intel gpu.
Can I copy intel gpu implementation to overwrite libclc __clc_remquo?

Comment on lines 68 to 72
// Avoid calling __clc_fp32_subnormals_supported here: it uses
// llvm.canonicalize, which quiets sNaN.
return __builtin_fabsf(x) < 0x1p-149f
? __builtin_elementwise_copysign(0.0f, x)
: x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is now right, but the usage is not. You do not want to unconditionally force a denormal flush, pretty much anywhere.

I also believe most of the explicit flushes / canonicalizes in the math function implementations can be deleted with some shuffling around. I removed most of these in the rocm-device-libs implementations a few years ago. A few more remain but I believe they can also be avoided

}
return x;
_CLC_OVERLOAD _CLC_INLINE float __clc_soft_flush_denormal(float x) {
// Avoid calling __clc_fp32_subnormals_supported here: it uses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have less trouble just using canonicalize for now and trying to relax it later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have less trouble just using canonicalize for now and trying to relax it later

do you mean reverting __clc_soft_flush_denormal to use __clc_fp32_subnormals_supported which uses llvm.canonicalize, or just replacing use of __clc_fp32_subnormals_supported with __builtin_elementwise_canonicalize?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libclc libclc OpenCL library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libclc should remove __CLC_SUBNORMAL_DISABLE and reimplement __clc_fp32_subnormals_supported

4 participants