-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[IR] Mark vector intrinsics speculatable #162334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-risc-v Author: Ramkumar Ramachandra (artagnon) ChangesThe vector intrinsics in questions have no undefined behavior, and have no other effect besides returning the result, and do return a result: they should hence be marked speculatable and willreturn. At the moment, there are no optimizations that are enabled by these attributes, and the patch is mainly just documentation. Full diff: https://github.com/llvm/llvm-project/pull/162334.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 96da698538314..fef80a88bfbc2 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2422,12 +2422,12 @@ def int_loop_dependence_war_mask:
def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],
- [IntrNoMem]>;
+ [IntrNoMem, IntrSpeculatable, IntrWillReturn]>;
def int_experimental_get_vector_length:
DefaultAttrsIntrinsic<[llvm_i32_ty],
[llvm_anyint_ty, llvm_i32_ty, llvm_i1_ty],
- [IntrNoMem,
+ [IntrNoMem, IntrSpeculatable, IntrWillReturn,
ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>]>;
def int_experimental_cttz_elts:
@@ -2614,7 +2614,7 @@ def int_memset_element_unordered_atomic
//===------------------------ Reduction Intrinsics ------------------------===//
//
-let IntrProperties = [IntrNoMem, IntrSpeculatable] in {
+let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
def int_vector_reduce_fadd : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
[LLVMVectorElementType<0>,
@@ -2753,41 +2753,63 @@ def int_preserve_static_offset : DefaultAttrsIntrinsic<[llvm_ptr_ty],
def int_vector_reverse : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>],
- [IntrNoMem]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
def int_vector_splice : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,
LLVMMatchType<0>,
llvm_i32_ty],
- [IntrNoMem, ImmArg<ArgIndex<2>>]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn,
+ ImmArg<ArgIndex<2>>]>;
//===---------- Intrinsics to query properties of scalable vectors --------===//
-def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
+def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [],
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
//===---------- Intrinsics to perform subvector insertion/extraction ------===//
def int_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],
- [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn,
+ ImmArg<ArgIndex<2>>]>;
def int_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyvector_ty, llvm_i64_ty],
- [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<1>>]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn,
+ ImmArg<ArgIndex<1>>]>;
foreach n = 2...8 in {
def int_vector_interleave#n : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
!listsplat(LLVMOneNthElementsVectorType<0, n>, n),
- [IntrNoMem]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
def int_vector_deinterleave#n : DefaultAttrsIntrinsic<!listsplat(LLVMOneNthElementsVectorType<0, n>, n),
[llvm_anyvector_ty],
- [IntrNoMem]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
}
//===-------------- Intrinsics to perform partial reduction ---------------===//
def int_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
- [llvm_anyvector_ty, llvm_anyvector_ty],
- [IntrNoMem]>;
+ [llvm_anyvector_ty,
+ llvm_anyvector_ty],
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
//===----------------- Pointer Authentication Intrinsics ------------------===//
//
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll b/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll
index d73900dcd0bea..83b494a1be0d3 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll
@@ -2288,7 +2288,7 @@ define void @tgamma_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
}
;.
; CHECK: attributes #[[ATTR0]] = { "target-features"="+v" }
-; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) }
+; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
; CHECK: attributes #[[ATTR2]] = { "vector-function-abi-variant"="_ZGVrNxv_acos(Sleef_acosdx_u10rvvm2)" }
; CHECK: attributes #[[ATTR3]] = { "vector-function-abi-variant"="_ZGVrNxv_acosf(Sleef_acosfx_u10rvvm2)" }
; CHECK: attributes #[[ATTR4]] = { "vector-function-abi-variant"="_ZGVrNxv_acosh(Sleef_acoshdx_u10rvvm2)" }
diff --git a/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll b/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll
index 9acc6d6601292..09f583f9242d5 100644
--- a/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll
+++ b/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll
@@ -39,5 +39,4 @@ declare <4 x float> @llvm.exp.v4f32(<4 x float>) #0
declare <vscale x 4 x float> @llvm.exp.nxv4f32(<vscale x 4 x float>) #0
; CHECK: attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
-; CHECK-NEXT: attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
|
@llvm/pr-subscribers-llvm-transforms Author: Ramkumar Ramachandra (artagnon) ChangesThe vector intrinsics in questions have no undefined behavior, and have no other effect besides returning the result, and do return a result: they should hence be marked speculatable and willreturn. At the moment, there are no optimizations that are enabled by these attributes, and the patch is mainly just documentation. Full diff: https://github.com/llvm/llvm-project/pull/162334.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 96da698538314..fef80a88bfbc2 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2422,12 +2422,12 @@ def int_loop_dependence_war_mask:
def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],
- [IntrNoMem]>;
+ [IntrNoMem, IntrSpeculatable, IntrWillReturn]>;
def int_experimental_get_vector_length:
DefaultAttrsIntrinsic<[llvm_i32_ty],
[llvm_anyint_ty, llvm_i32_ty, llvm_i1_ty],
- [IntrNoMem,
+ [IntrNoMem, IntrSpeculatable, IntrWillReturn,
ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>]>;
def int_experimental_cttz_elts:
@@ -2614,7 +2614,7 @@ def int_memset_element_unordered_atomic
//===------------------------ Reduction Intrinsics ------------------------===//
//
-let IntrProperties = [IntrNoMem, IntrSpeculatable] in {
+let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
def int_vector_reduce_fadd : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
[LLVMVectorElementType<0>,
@@ -2753,41 +2753,63 @@ def int_preserve_static_offset : DefaultAttrsIntrinsic<[llvm_ptr_ty],
def int_vector_reverse : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>],
- [IntrNoMem]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
def int_vector_splice : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,
LLVMMatchType<0>,
llvm_i32_ty],
- [IntrNoMem, ImmArg<ArgIndex<2>>]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn,
+ ImmArg<ArgIndex<2>>]>;
//===---------- Intrinsics to query properties of scalable vectors --------===//
-def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
+def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [],
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
//===---------- Intrinsics to perform subvector insertion/extraction ------===//
def int_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],
- [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn,
+ ImmArg<ArgIndex<2>>]>;
def int_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyvector_ty, llvm_i64_ty],
- [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<1>>]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn,
+ ImmArg<ArgIndex<1>>]>;
foreach n = 2...8 in {
def int_vector_interleave#n : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
!listsplat(LLVMOneNthElementsVectorType<0, n>, n),
- [IntrNoMem]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
def int_vector_deinterleave#n : DefaultAttrsIntrinsic<!listsplat(LLVMOneNthElementsVectorType<0, n>, n),
[llvm_anyvector_ty],
- [IntrNoMem]>;
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
}
//===-------------- Intrinsics to perform partial reduction ---------------===//
def int_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
- [llvm_anyvector_ty, llvm_anyvector_ty],
- [IntrNoMem]>;
+ [llvm_anyvector_ty,
+ llvm_anyvector_ty],
+ [IntrNoMem,
+ IntrSpeculatable,
+ IntrWillReturn]>;
//===----------------- Pointer Authentication Intrinsics ------------------===//
//
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll b/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll
index d73900dcd0bea..83b494a1be0d3 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/veclib-function-calls.ll
@@ -2288,7 +2288,7 @@ define void @tgamma_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
}
;.
; CHECK: attributes #[[ATTR0]] = { "target-features"="+v" }
-; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) }
+; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
; CHECK: attributes #[[ATTR2]] = { "vector-function-abi-variant"="_ZGVrNxv_acos(Sleef_acosdx_u10rvvm2)" }
; CHECK: attributes #[[ATTR3]] = { "vector-function-abi-variant"="_ZGVrNxv_acosf(Sleef_acosfx_u10rvvm2)" }
; CHECK: attributes #[[ATTR4]] = { "vector-function-abi-variant"="_ZGVrNxv_acosh(Sleef_acoshdx_u10rvvm2)" }
diff --git a/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll b/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll
index 9acc6d6601292..09f583f9242d5 100644
--- a/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll
+++ b/llvm/test/Transforms/PreISelIntrinsicLowering/AArch64/expand-exp.ll
@@ -39,5 +39,4 @@ declare <4 x float> @llvm.exp.v4f32(<4 x float>) #0
declare <vscale x 4 x float> @llvm.exp.nxv4f32(<vscale x 4 x float>) #0
; CHECK: attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
-; CHECK-NEXT: attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to test this with a case where we hoist a speculative vector intrinsic (and not hoist the remainder/division
I think we hoist based on Instruction::mayHaveSideEffects, which is unrelated to speculatable in LICM? Before the change, I have the following LICM test: not sure what I'm missing? define i32 @vp_udiv(<2 x i32> %inv.q, <2 x i32> %inv.d) {
; CHECK-LABEL: define i32 @vp_udiv(
; CHECK-SAME: <2 x i32> [[INV_Q:%.*]], <2 x i32> [[INV_D:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: [[VP_UDIV:%.*]] = call <2 x i32> @llvm.vp.udiv.v2i32(<2 x i32> [[INV_Q]], <2 x i32> [[INV_D]], <2 x i1> splat (i1 true), i32 2)
; CHECK-NEXT: [[EXTRACT:%.*]] = extractelement <2 x i32> [[VP_UDIV]], i32 0
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV]], [[EXTRACT]]
; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
; CHECK-NEXT: br i1 [[LOOP_COND]], label %[[LOOP]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: [[IV_LCSSA:%.*]] = phi i32 [ [[IV]], %[[LOOP]] ]
; CHECK-NEXT: ret i32 [[IV_LCSSA]]
;
entry:
br label %loop
loop:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
%vp.udiv = call <2 x i32> @llvm.vp.udiv.v2i32(<2 x i32> %inv.q, <2 x i32> %inv.d, <2 x i1> splat (i1 1), i32 2)
%extract = extractelement <2 x i32> %vp.udiv, i32 0
%loop.cond = icmp ult i32 %iv, %extract
%iv.next = add i32 %iv, 1
br i1 %loop.cond, label %loop, label %exit
exit:
ret i32 %iv
} |
@artagnon You need to place the intrinsic in a code path that is not unconditionally executed on loop entry. (E.g. after early exit or abnormal exit.) |
The vector intrinsics in questions have no undefined behavior, and have no other effect besides returning the result, and do return a result: they should hence be marked speculatable and willreturn. At the moment, there are no optimizations that are enabled by these attributes, and the patch is mainly just documentation.
f233c12
to
1dcf6ff
Compare
// CHECK-C-NEXT: [[ADD:%.*]] = add i32 [[SLICE_BASE]], [[TMP3]] | ||
// CHECK-C-NEXT: [[TMP5:%.*]] = add i32 [[ADD]], 15 | ||
// CHECK-C-NEXT: tail call void @llvm.aarch64.sme.ld1b.horiz(<vscale x 16 x i1> [[PG]], ptr [[TMP2]], i32 0, i32 [[TMP5]]) | ||
// CHECK-C-NEXT: [[SVL:%.*]] = shl nuw nsw i64 [[TMP0]], 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by the changes here. How does marking intrinsics speculatable result in more nowrap flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstCombine is doing this:
*** IR Dump After InstCombinePass on test_svstnt1_vnum_u8_x2 ***
entry:
- %0 = insertvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } poison, <vscale x 16 x i8> %v.coerce0, 0
- %1 = insertvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %0, <vscale x 16 x i8> %v.coerce1, 1
- %2 = getelementptr <vscale x 16 x i8>, ptr %base, i64 %vnum
+ %0 = call i64 @llvm.vscale.i64()
+ %1 = shl nuw nsw i64 %0, 4
+ %.idx = mul i64 %vnum, %1
+ %2 = getelementptr i8, ptr %base, i64 %.idx
call void @llvm.aarch64.sve.stnt1.pn.x2.nxv16i8(<vscale x 16 x i8> %v.coerce0, <vscale x 16 x i8> %v.coerce1, target("aarch64.svcount") %pn, ptr %2)
ret void
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not able to figure out the exact code snippet in InstCombine that's responsible for this :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like ptradd canonicalization. But I still don't understand why this patch changes anything about the nowrap handling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the variable names I wonder if it's a case of no longer loosing the wrap flags rather than gaining them? From CodeGenFunction::EmitSMELd1St1
:
llvm::Value *StreamingVectorLengthCall =
Builder.CreateMul(Builder.CreateCall(StreamingVectorLength),
llvm::ConstantInt::get(Int64Ty, 8), "svl",
/* HasNUW */ true, /* HasNSW */ true);
llvm::Value *Mulvl =
Builder.CreateMul(StreamingVectorLengthCall, Ops[4], "mulvl");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I finally found the piece of code that sets nuw/nsw. First, we do PtrAdd canonicalization like Nikita said:
Value *llvm::emitGEPOffset(IRBuilderBase *Builder, const DataLayout &DL,
User *GEP, bool NoAssumptions) {
// ...
// nusw implies nsw for the offset arithmetic.
bool NSW = GEPOp->hasNoUnsignedSignedWrap() && !NoAssumptions;
bool NUW = GEPOp->hasNoUnsignedWrap() && !NoAssumptions;
Here, nuw/nsw doesn't get set because the original GEP doesn't have nuw/nusw. However, this creates a shufflevector, which is later handled by:
// Splatting the first element of the result of a BinOp, where any of the
// BinOp's operands are the result of a first element splat can be simplified to
// splatting the first element of the result of the BinOp
Instruction *InstCombinerImpl::simplifyBinOpSplats(ShuffleVectorInst &SVI) {
// ...
auto *BinOp = cast<BinaryOperator>(Op0);
if (!isSafeToSpeculativelyExecuteWithVariableReplaced(BinOp))
return nullptr;
Value *NewBO = Builder.CreateBinOp(BinOp->getOpcode(), X, Y);
if (auto NewBOI = dyn_cast<Instruction>(NewBO))
NewBOI->copyIRFlags(BinOp);
So yes, I think it is a case of preserving existing nowrap flags, as Paul remarked.
EDIT: isSafeToSpeculativelyExecute has to be called on the vscale (or a CallInst, which is not a BinOp), for there to be changes. Even after hours of digging, it is still mysterious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The vector intrinsics in question have no undefined behavior, and have no other effect besides returning the result: they should hence be marked speculatable.
The vector intrinsics in question have no undefined behavior, and have no other effect besides returning the result: they should hence be marked speculatable.
The vector intrinsics in question have no undefined behavior, and have no other effect besides returning the result: they should hence be marked speculatable.