@Varnike Varnike commented Sep 25, 2025

Added a new register, FPSCR_RM, to correctly model interactions with the rounding-mode control bits of FPSCR and to avoid performance regressions in the normal (non-strictfp) case.

This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.
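
For readers who want the dependency argument spelled out, here is a toy C++ sketch (not LLVM code). It takes the implicit Defs/Uses lists that the patch gives VADDD, VCMPED and VLLDM and checks which pairs a scheduler would have to keep in order. `ToyInstr`, `mustStayOrdered` and the `make*` helpers are hypothetical names invented for this illustration. The sketch assumes the three FPSCR-related registers are tracked independently. The "coarse" variants describe a hypothetical single-register model that is not part of the patch and is only there for comparison. VLLDM's D-register defs are left out for brevity.

```cpp
#include <set>
#include <string>

// Toy stand-in for a machine instruction: only its implicit register
// definitions and uses are modeled.
struct ToyInstr {
  std::string Name;
  std::set<std::string> Defs;
  std::set<std::string> Uses;
};

// True if the two register sets share at least one name.
static bool overlaps(const std::set<std::string> &A,
                     const std::set<std::string> &B) {
  for (const std::string &R : A)
    if (B.count(R))
      return true;
  return false;
}

// Two instructions must keep their relative order if one writes a register
// that the other reads or writes. Two reads of the same register never
// conflict.
bool mustStayOrdered(const ToyInstr &First, const ToyInstr &Second) {
  return overlaps(First.Defs, Second.Uses) || // read after write
         overlaps(First.Uses, Second.Defs) || // write after read
         overlaps(First.Defs, Second.Defs);   // write after write
}

// Def/use sets as they appear in the patch (D registers omitted for VLLDM).
ToyInstr makeVADDD() { return {"VADDD", {}, {"FPSCR_RM"}}; }
ToyInstr makeVCMPED() { return {"VCMPED", {"FPSCR_NZCV"}, {"FPSCR_RM"}}; }
ToyInstr makeVLLDM() {
  return {"VLLDM", {"VPR", "FPSCR", "FPSCR_NZCV", "FPSCR_RM"}, {}};
}

// Hypothetical coarse model, NOT in the patch: flags and rounding mode are
// folded into a single FPSCR register.
ToyInstr makeCoarseVADDD() { return {"VADDD", {}, {"FPSCR"}}; }
ToyInstr makeCoarseVCMPED() { return {"VCMPED", {"FPSCR"}, {"FPSCR"}}; }
```

In this toy model a compare and an add stay freely reorderable with the split registers, because the compare only writes FPSCR_NZCV and both merely read FPSCR_RM. An instruction that writes FPSCR_RM, such as VLLDM, still pins the add in place. Under the coarse single-register assumption the compare and the add would become ordered.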

…l instructions that read/write fpscr rounding bits as doing so

Added new register FPSCR_RM to correctly model interactions with rounding
mode control bits of fpscr and avoid performance degradation for
normal (non-strictfp) case.
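
As background on why arithmetic instructions need to read the rounding-mode bits at all, the following standalone C++ sketch shows the same addition giving different results under different dynamic rounding modes. It uses only the standard `<cfenv>` interface. `addWithRounding` is a hypothetical helper written for this illustration, and the `volatile` qualifiers are an assumption-laden workaround to keep a compiler that is not in strict FP mode from folding or moving the addition.

```cpp
#include <cfenv>

// Adds A and B under the given rounding mode (one of the FE_* macros from
// <cfenv>), then restores the previous mode. The volatile temporaries force
// the loads and the store of the sum to happen between the two fesetround
// calls.
float addWithRounding(int Mode, float A, float B) {
  volatile float VA = A;
  volatile float VB = B;
  const int Old = std::fegetround();
  std::fesetround(Mode);
  volatile float Sum = VA + VB; // rounded according to Mode
  std::fesetround(Old);
  return Sum;
}
```

For example, `addWithRounding(FE_UPWARD, 1.0f, 1e-10f)` is expected to yield a value strictly greater than `1.0f`, while `addWithRounding(FE_DOWNWARD, 1.0f, 1e-10f)` is expected to yield exactly `1.0f`. Without a modeled dependency on the rounding mode, nothing would stop a compiler from moving such an addition across the mode change.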
llvmbot commented Sep 25, 2025

@llvm/pr-subscribers-llvm-globalisel

Author: Erik Enikeev (Varnike)

Changes

Added a new register, FPSCR_RM, to correctly model interactions with the rounding-mode control bits of FPSCR and to avoid performance regressions in the normal (non-strictfp) case.

This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.


Patch is 160.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160698.diff

27 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp (+1)
  • (modified) llvm/lib/Target/ARM/ARMInstrVFP.td (+62-28)
  • (modified) llvm/lib/Target/ARM/ARMRegisterInfo.td (+6-2)
  • (modified) llvm/test/CodeGen/ARM/GlobalISel/arm-instruction-select-combos.mir (+8-8)
  • (modified) llvm/test/CodeGen/ARM/GlobalISel/select-fp.mir (+208-180)
  • (modified) llvm/test/CodeGen/ARM/GlobalISel/select-pr35926.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/bf16_fast_math.ll (+9-9)
  • (modified) llvm/test/CodeGen/ARM/cmse-vlldm-no-reorder.mir (+2-2)
  • (modified) llvm/test/CodeGen/ARM/cortex-m7-wideops.mir (+9-8)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool-arm.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool-thumb.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool2-arm.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool3-arm.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16_fast_math.ll (+43-43)
  • (modified) llvm/test/CodeGen/ARM/ipra-reg-usage.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/misched-prevent-erase-history-of-subunits.mir (+2-2)
  • (modified) llvm/test/CodeGen/ARM/vlldm-vlstm-uops.mir (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/emptyblock.mir (+34-34)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-mov.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/lstp-insertion-position.mir (+6-6)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/mov-after-dlstp.mir (+4-4)
  • (modified) llvm/test/CodeGen/Thumb2/pipeliner-inlineasm.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/scavenge-lr.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/swp-exitbranchdir.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/swp-fixedii-le.mir (+6-6)
  • (modified) llvm/test/CodeGen/Thumb2/swp-fixedii.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/swp-regpressure.mir (+80-80)
diff --git a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
index e94220af05a0d..e2404397cc8c2 100644
--- a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
@@ -232,6 +232,7 @@ getReservedRegs(const MachineFunction &MF) const {
   markSuperRegs(Reserved, ARM::SP);
   markSuperRegs(Reserved, ARM::PC);
   markSuperRegs(Reserved, ARM::FPSCR);
+  markSuperRegs(Reserved, ARM::FPSCR_RM);
   markSuperRegs(Reserved, ARM::APSR_NZCV);
   if (TFI->isFPReserved(MF))
     markSuperRegs(Reserved, STI.getFramePointerReg());
diff --git a/llvm/lib/Target/ARM/ARMInstrVFP.td b/llvm/lib/Target/ARM/ARMInstrVFP.td
index 31650e0137beb..bc51e99412422 100644
--- a/llvm/lib/Target/ARM/ARMInstrVFP.td
+++ b/llvm/lib/Target/ARM/ARMInstrVFP.td
@@ -338,7 +338,7 @@ def : MnemonicAlias<"vstm", "vstmia">;
 
 def VLLDM : AXSI4FR<"vlldm${p}\t$Rn, $regs", 0, 1>,
             Requires<[HasV8MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
 // T1: assembly does not contains the register list.
@@ -348,7 +348,7 @@ def : InstAlias<"vlldm${p}\t$Rn", (VLLDM GPRnopc:$Rn, pred:$p, 0)>,
 // The register list has no effect on the encoding, it is for assembly/disassembly purposes only.
 def VLLDM_T2 : AXSI4FR<"vlldm${p}\t$Rn, $regs", 1, 1>,
             Requires<[HasV8_1MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
                                         D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, D27, D28, D29, D30, D31];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
@@ -356,8 +356,8 @@ def VLLDM_T2 : AXSI4FR<"vlldm${p}\t$Rn, $regs", 1, 1>,
 // The register list has no effect on the encoding, it is for assembly/disassembly purposes only.
 def VLSTM : AXSI4FR<"vlstm${p}\t$Rn, $regs", 0, 0>,
             Requires<[HasV8MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV];
-    let Uses = [VPR, FPSCR, FPSCR_NZCV, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM];
+    let Uses = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
 // T1: assembly does not contain the register list.
@@ -367,8 +367,8 @@ def : InstAlias<"vlstm${p}\t$Rn", (VLSTM GPRnopc:$Rn, pred:$p, 0)>,
 // The register list has no effect on the encoding, it is for assembly/disassembly purposes only.
 def VLSTM_T2 : AXSI4FR<"vlstm${p}\t$Rn, $regs", 1, 0>,
             Requires<[HasV8_1MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV];
-    let Uses = [VPR, FPSCR, FPSCR_NZCV, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM];
+    let Uses = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
                                         D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, D27, D28, D29, D30, D31];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
@@ -435,14 +435,14 @@ def : VFP2MnemonicAlias<"fstmfdx", "fstmdbx">;
 // FP Binary Operations.
 //
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VADDD  : ADbI<0b11100, 0b11, 0, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpALU64, "vadd", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fadd DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPALU64]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VADDS  : ASbIn<0b11100, 0b11, 0, 0,
                    (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                    IIC_fpALU32, "vadd", ".f32\t$Sd, $Sn, $Sm",
@@ -453,21 +453,21 @@ def VADDS  : ASbIn<0b11100, 0b11, 0, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VADDH  : AHbI<0b11100, 0b11, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpALU16, "vadd", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fadd (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
              Sched<[WriteFPALU32]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSUBD  : ADbI<0b11100, 0b11, 1, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpALU64, "vsub", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fsub DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPALU64]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSUBS  : ASbIn<0b11100, 0b11, 1, 0,
                    (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                    IIC_fpALU32, "vsub", ".f32\t$Sd, $Sn, $Sm",
@@ -478,42 +478,42 @@ def VSUBS  : ASbIn<0b11100, 0b11, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSUBH  : AHbI<0b11100, 0b11, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpALU16, "vsub", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fsub (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
             Sched<[WriteFPALU32]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VDIVD  : ADbI<0b11101, 0b00, 0, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpDIV64, "vdiv", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fdiv DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPDIV64]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VDIVS  : ASbI<0b11101, 0b00, 0, 0,
                   (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                   IIC_fpDIV32, "vdiv", ".f32\t$Sd, $Sn, $Sm",
                   [(set SPR:$Sd, (fdiv SPR:$Sn, SPR:$Sm))]>,
              Sched<[WriteFPDIV32]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM]  in
 def VDIVH  : AHbI<0b11101, 0b00, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpDIV16, "vdiv", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fdiv (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
              Sched<[WriteFPDIV32]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMULD  : ADbI<0b11100, 0b10, 0, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpMUL64, "vmul", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fmul DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPMUL64, ReadFPMUL, ReadFPMUL]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMULS  : ASbIn<0b11100, 0b10, 0, 0,
                    (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                    IIC_fpMUL32, "vmul", ".f32\t$Sd, $Sn, $Sm",
@@ -524,21 +524,21 @@ def VMULS  : ASbIn<0b11100, 0b10, 0, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMULH  : AHbI<0b11100, 0b10, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpMUL16, "vmul", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fmul (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
              Sched<[WriteFPMUL32, ReadFPMUL, ReadFPMUL]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMULD : ADbI<0b11100, 0b10, 1, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpMUL64, "vnmul", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fneg (fmul DPR:$Dn, (f64 DPR:$Dm))))]>,
              Sched<[WriteFPMUL64, ReadFPMUL, ReadFPMUL]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMULS : ASbI<0b11100, 0b10, 1, 0,
                   (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                   IIC_fpMUL32, "vnmul", ".f32\t$Sd, $Sn, $Sm",
@@ -549,7 +549,7 @@ def VNMULS : ASbI<0b11100, 0b10, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMULH : AHbI<0b11100, 0b10, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpMUL16, "vnmul", ".f16\t$Sd, $Sn, $Sm",
@@ -621,7 +621,7 @@ def : Pat<(fmul (fneg SPR:$a), SPR:$b),
           (VNMULS SPR:$a, SPR:$b)>, Requires<[NoHonorSignDependentRounding]>;
 
 // These are encoded as unary instructions.
-let Defs = [FPSCR_NZCV] in {
+let Defs = [FPSCR_NZCV], mayRaiseFPException = 1, Uses = [FPSCR_RM] in {
 def VCMPED : ADuI<0b11101, 0b11, 0b0100, 0b11, 0,
                   (outs), (ins DPR:$Dd, DPR:$Dm),
                   IIC_fpCMP64, "vcmpe", ".f64\t$Dd, $Dm", "",
@@ -684,7 +684,7 @@ def VABSH  : AHuI<0b11101, 0b11, 0b0000, 0b11, 0,
                    IIC_fpUNA16, "vabs", ".f16\t$Sd, $Sm",
                    [(set (f16 HPR:$Sd), (fabs (f16 HPR:$Sm)))]>;
 
-let Defs = [FPSCR_NZCV] in {
+let Defs = [FPSCR_NZCV], mayRaiseFPException = 1, Uses = [FPSCR_RM] in {
 def VCMPEZD : ADuI<0b11101, 0b11, 0b0101, 0b11, 0,
                    (outs), (ins DPR:$Dd),
                    IIC_fpCMP64, "vcmpe", ".f64\t$Dd, #0", "",
@@ -742,6 +742,7 @@ def VCMPZH  : AHuI<0b11101, 0b11, 0b0101, 0b01, 0,
 }
 } // Defs = [FPSCR_NZCV]
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTDS  : ASuI<0b11101, 0b11, 0b0111, 0b11, 0,
                    (outs DPR:$Dd), (ins SPR:$Sm),
                    IIC_fpCVTDS, "vcvt", ".f64.f32\t$Dd, $Sm", "",
@@ -762,6 +763,7 @@ def VCVTDS  : ASuI<0b11101, 0b11, 0b0111, 0b11, 0,
 }
 
 // Special case encoding: bits 11-8 is 0b1011.
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTSD  : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,
                     IIC_fpCVTSD, "vcvt", ".f32.f64\t$Sd, $Dm", "",
                     [(set SPR:$Sd, (fpround DPR:$Dm))]>,
@@ -787,7 +789,7 @@ def VCVTSD  : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,
 }
 
 // Between half, single and double-precision.
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm", "",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -799,7 +801,7 @@ def : FP16Pat<(f32 (fpextend (f16 HPR:$Sm))),
 def : FP16Pat<(f16_to_fp GPR:$a),
               (VCVTBHS (COPY_TO_REGCLASS GPR:$a, SPR))>;
 
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBSH: ASuI<0b11101, 0b11, 0b0011, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sda, SPR:$Sm),
                  /* FIXME */ IIC_fpCVTHS, "vcvtb", ".f16.f32\t$Sd, $Sm", "$Sd = $Sda",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -821,7 +823,7 @@ def : FP16Pat<(insertelt (v4f16 DPR:$src1), (f16 (fpround (f32 SPR:$src2))), imm
                                              SPR:$src2),
                                     (SSubReg_f16_reg imm:$lane)))>;
 
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTHS: ASuI<0b11101, 0b11, 0b0010, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTSH, "vcvtt", ".f32.f16\t$Sd, $Sm", "",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -835,7 +837,7 @@ def : FP16Pat<(f32 (fpextend (extractelt (v4f16 DPR:$src), imm_odd:$lane))),
                 (v2f32 (COPY_TO_REGCLASS (v4f16 DPR:$src), DPR_VFP2)),
                 (SSubReg_f16_reg imm_odd:$lane)))>;
 
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTSH: ASuI<0b11101, 0b11, 0b0011, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sda, SPR:$Sm),
                  /* FIXME */ IIC_fpCVTHS, "vcvtt", ".f16.f32\t$Sd, $Sm", "$Sd = $Sda",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -853,6 +855,7 @@ def : FP16Pat<(insertelt (v4f16 DPR:$src1), (f16 (fpround (f32 SPR:$src2))), imm
                                              SPR:$src2),
                                     (SSubReg_f16_reg imm:$lane)))>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in 
 def VCVTBHD : ADuI<0b11101, 0b11, 0b0010, 0b01, 0,
                    (outs DPR:$Dd), (ins SPR:$Sm),
                    NoItinerary, "vcvtb", ".f64.f16\t$Dd, $Sm", "",
@@ -876,6 +879,7 @@ def : FP16Pat<(f64 (f16_to_fp GPR:$a)),
               (VCVTBHD (COPY_TO_REGCLASS GPR:$a, SPR))>,
               Requires<[HasFPARMv8, HasDPVFP]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBDH : ADuI<0b11101, 0b11, 0b0011, 0b01, 0,
                    (outs SPR:$Sd), (ins SPR:$Sda, DPR:$Dm),
                    NoItinerary, "vcvtb", ".f16.f64\t$Sd, $Dm", "$Sd = $Sda",
@@ -901,6 +905,7 @@ def : FP16Pat<(fp_to_f16 (f64 DPR:$a)),
               (i32 (COPY_TO_REGCLASS (VCVTBDH (IMPLICIT_DEF), DPR:$a), GPR))>,
                    Requires<[HasFPARMv8, HasDPVFP]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTHD : ADuI<0b11101, 0b11, 0b0010, 0b11, 0,
                    (outs DPR:$Dd), (ins SPR:$Sm),
                    NoItinerary, "vcvtt", ".f64.f16\t$Dd, $Sm", "",
@@ -915,6 +920,7 @@ def VCVTTHD : ADuI<0b11101, 0b11, 0b0010, 0b11, 0,
   let hasSideEffects = 0;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTDH : ADuI<0b11101, 0b11, 0b0011, 0b11, 0,
                    (outs SPR:$Sd), (ins SPR:$Sda, DPR:$Dm),
                    NoItinerary, "vcvtt", ".f16.f64\t$Sd, $Dm", "$Sd = $Sda",
@@ -1140,18 +1146,21 @@ defm VRINTN : vrint_inst_anpm<"n", 0b01, froundeven>;
 defm VRINTP : vrint_inst_anpm<"p", 0b10, fceil>;
 defm VRINTM : vrint_inst_anpm<"m", 0b11, ffloor>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSQRTD : ADuI<0b11101, 0b11, 0b0001, 0b11, 0,
                   (outs DPR:$Dd), (ins DPR:$Dm),
                   IIC_fpSQRT64, "vsqrt", ".f64\t$Dd, $Dm", "",
                   [(set DPR:$Dd, (fsqrt (f64 DPR:$Dm)))]>,
              Sched<[WriteFPSQRT64]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSQRTS : ASuI<0b11101, 0b11, 0b0001, 0b11, 0,
                   (outs SPR:$Sd), (ins SPR:$Sm),
                   IIC_fpSQRT32, "vsqrt", ".f32\t$Sd, $Sm", "",
                   [(set SPR:$Sd, (fsqrt SPR:$Sm))]>,
              Sched<[WriteFPSQRT32]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSQRTH : AHuI<0b11101, 0b11, 0b0001, 0b11, 0,
                   (outs HPR:$Sd), (ins HPR:$Sm),
                   IIC_fpSQRT16, "vsqrt", ".f16\t$Sd, $Sm",
@@ -1757,7 +1766,7 @@ def : VFPPat<(i32 (fp_to_uint_sat (f16 HPR:$a), i32)),
              (COPY_TO_REGCLASS (VTOUIZH (f16 HPR:$a)), GPR)>;
 
 // And the Z bit '0' variants, i.e. use the rounding mode specified by FPSCR.
-let Uses = [FPSCR] in {
+let Uses = [FPSCR_RM] in {
 def VTOSIRD : AVConv1IsD_Encode<0b11101, 0b11, 0b1101, 0b1011,
                                 (outs SPR:$Sd), (ins DPR:$Dm),
                                 IIC_fpCVTDI, "vcvtr", ".s32.f64\t$Sd, $Dm",
@@ -2029,6 +2038,7 @@ def VULTOD : AVConv1XInsD_Encode<0b11101, 0b11, 0b1011, 0b1011, 1,
 } // End of 'let Constraints = "$a = $dst" in'
 
 // BFloat16  - Single precision, unary, predicated
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 class BF16_VCVT<string opc, bits<2> op7_6>
    : VFPAI<(outs SPR:$Sd), (ins SPR:$dst, SPR:$Sm),
            VFPUnaryFrm, NoItinerary,
@@ -2063,6 +2073,7 @@ def BF16_VCVTT : BF16_VCVT<"vcvtt", 0b11>;
 // FP Multiply-Accumulate Operations.
 //
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLAD : ADbI<0b11100, 0b00, 0, 0,
                  (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
                  IIC_fpMAC64, "vmla", ".f64\t$Dd, $Dn, $Dm",
@@ -2072,6 +2083,7 @@ def VMLAD : ADbI<0b11100, 0b00, 0, 0,
               Requires<[HasVFP2,HasDPVFP,UseFPVMLx]>,
               Sched<[WriteFPMAC64, ReadFPMAC, ReadFPMUL, ReadFPMUL]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLAS : ASbIn<0b11100, 0b00, 0, 0,
                   (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
                   IIC_fpMAC32, "vmla", ".f32\t$Sd, $Sn, $Sm",
@@ -2085,6 +2097,7 @@ def VMLAS : ASbIn<0b11100, 0b00, 0, 0,
   let D = VFPNeonA8Domain;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLAH : AHbI<0b11100, 0b00, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sdin, HPR:$Sn, HPR:$Sm),
                   IIC_fpMAC16, "vmla", ".f16\t$Sd, $Sn, $Sm",
@@ -2104,6 +2117,7 @@ def : Pat<(fadd_mlx HPR:$dstin, (fmul_su (f16 HPR:$a), HPR:$b)),
           Requires<[HasFullFP16,DontUseNEONForFP, UseFPVMLx]>;
 
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLSD : ADbI<0b11100, 0b00, 1, 0,
                  (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
                  IIC_fpMAC64, "vmls", ".f64\t$Dd, $Dn, $Dm",
@@ -2113,6 +2127,7 @@ def VMLSD : ADbI<0b11100, 0b00, 1, 0,
               Requires<[HasVFP2,HasDPVFP,UseFPVMLx]>,
               Sched<[WriteFPMAC64, ReadFPMAC, ReadFPMUL, ReadFPMUL]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLSS : ASbIn<0b11100, 0b00, 1, 0,
                   (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
                   IIC_fpMAC32, "vmls", ".f32\t$Sd, $Sn, $Sm",
@@ -2126,6 +2141,7 @@ def VMLSS : ASbIn<0b11100, 0b00, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLSH : AHbI<0b11100, 0b00, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sdin, HPR:$Sn, HPR:$Sm),
                   IIC_fpMAC16, "vmls", ".f16\t$Sd, $Sn, $Sm",
@@ -2144,6 +2160,7 @@ def : Pat<(fsub_mlx HPR:$dstin, (fmul_su (f16 HPR:$a), HPR:$b)),
           (VMLSH HPR:$dstin, (f16 HPR:$a), HPR:$b)>,
           Requires<[HasFullFP16,DontUseNEONForFP,UseFPVMLx]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMLAD : ADbI<0b11100, 0b01, 1, 0,
                   (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
                   IIC_fpMAC64, "vnmla", ".f64\t$Dd, $Dn, $Dm",
@@ -2153,6 +2170,7 @@ def VNMLAD : ADbI<0b11100, 0b01, 1, 0,
                 Requires<[HasVFP2,HasDPVFP,UseFPVMLx]>,
                 Sched<[WriteFPMAC64, ReadFPMAC, ReadFPMUL, ReadFPMUL]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMLAS : ASbI<0b11100, 0b01, 1, 0,
                   (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
                   IIC_fpMAC32, "vnmla", ".f32\t$Sd, $Sn, $Sm",
@@ -2166,6 +2184,7 @@ def VNMLAS : ASbI<0b11100, 0b01, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMLAH : AHbI<0b11100, 0b01, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sdin, HPR:$Sn, HPR:$Sm),
                   IIC_fpMAC16, "vnmla", ".f16\t$Sd, $Sn, $Sm",
@@ -2196,6 +2215,7 @@ def : Pat<(fsub_mlx (fneg HPR:$dstin), (fmul_su (f16 HPR:$a), HPR:$b)),
           (VNMLAH HPR:$dstin, (f16 HP...
[truncated]

llvmbot commented Sep 25, 2025

@llvm/pr-subscribers-backend-arm

Author: Erik Enikeev (Varnike)

Changes

Added new register FPSCR_RM to correctly model interactions with rounding mode control bits of fpscr and to avoid performance regressions in normal non-strictfp case

This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.


Patch is 160.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160698.diff

27 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp (+1)
  • (modified) llvm/lib/Target/ARM/ARMInstrVFP.td (+62-28)
  • (modified) llvm/lib/Target/ARM/ARMRegisterInfo.td (+6-2)
  • (modified) llvm/test/CodeGen/ARM/GlobalISel/arm-instruction-select-combos.mir (+8-8)
  • (modified) llvm/test/CodeGen/ARM/GlobalISel/select-fp.mir (+208-180)
  • (modified) llvm/test/CodeGen/ARM/GlobalISel/select-pr35926.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/bf16_fast_math.ll (+9-9)
  • (modified) llvm/test/CodeGen/ARM/cmse-vlldm-no-reorder.mir (+2-2)
  • (modified) llvm/test/CodeGen/ARM/cortex-m7-wideops.mir (+9-8)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool-arm.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool-thumb.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool2-arm.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16-litpool3-arm.mir (+1-1)
  • (modified) llvm/test/CodeGen/ARM/fp16_fast_math.ll (+43-43)
  • (modified) llvm/test/CodeGen/ARM/ipra-reg-usage.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/misched-prevent-erase-history-of-subunits.mir (+2-2)
  • (modified) llvm/test/CodeGen/ARM/vlldm-vlstm-uops.mir (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/emptyblock.mir (+34-34)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-mov.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/lstp-insertion-position.mir (+6-6)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/mov-after-dlstp.mir (+4-4)
  • (modified) llvm/test/CodeGen/Thumb2/pipeliner-inlineasm.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/scavenge-lr.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/swp-exitbranchdir.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/swp-fixedii-le.mir (+6-6)
  • (modified) llvm/test/CodeGen/Thumb2/swp-fixedii.mir (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/swp-regpressure.mir (+80-80)
diff --git a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
index e94220af05a0d..e2404397cc8c2 100644
--- a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
@@ -232,6 +232,7 @@ getReservedRegs(const MachineFunction &MF) const {
   markSuperRegs(Reserved, ARM::SP);
   markSuperRegs(Reserved, ARM::PC);
   markSuperRegs(Reserved, ARM::FPSCR);
+  markSuperRegs(Reserved, ARM::FPSCR_RM);
   markSuperRegs(Reserved, ARM::APSR_NZCV);
   if (TFI->isFPReserved(MF))
     markSuperRegs(Reserved, STI.getFramePointerReg());
diff --git a/llvm/lib/Target/ARM/ARMInstrVFP.td b/llvm/lib/Target/ARM/ARMInstrVFP.td
index 31650e0137beb..bc51e99412422 100644
--- a/llvm/lib/Target/ARM/ARMInstrVFP.td
+++ b/llvm/lib/Target/ARM/ARMInstrVFP.td
@@ -338,7 +338,7 @@ def : MnemonicAlias<"vstm", "vstmia">;
 
 def VLLDM : AXSI4FR<"vlldm${p}\t$Rn, $regs", 0, 1>,
             Requires<[HasV8MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
 // T1: assembly does not contains the register list.
@@ -348,7 +348,7 @@ def : InstAlias<"vlldm${p}\t$Rn", (VLLDM GPRnopc:$Rn, pred:$p, 0)>,
 // The register list has no effect on the encoding, it is for assembly/disassembly purposes only.
 def VLLDM_T2 : AXSI4FR<"vlldm${p}\t$Rn, $regs", 1, 1>,
             Requires<[HasV8_1MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
                                         D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, D27, D28, D29, D30, D31];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
@@ -356,8 +356,8 @@ def VLLDM_T2 : AXSI4FR<"vlldm${p}\t$Rn, $regs", 1, 1>,
 // The register list has no effect on the encoding, it is for assembly/disassembly purposes only.
 def VLSTM : AXSI4FR<"vlstm${p}\t$Rn, $regs", 0, 0>,
             Requires<[HasV8MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV];
-    let Uses = [VPR, FPSCR, FPSCR_NZCV, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM];
+    let Uses = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
 // T1: assembly does not contain the register list.
@@ -367,8 +367,8 @@ def : InstAlias<"vlstm${p}\t$Rn", (VLSTM GPRnopc:$Rn, pred:$p, 0)>,
 // The register list has no effect on the encoding, it is for assembly/disassembly purposes only.
 def VLSTM_T2 : AXSI4FR<"vlstm${p}\t$Rn, $regs", 1, 0>,
             Requires<[HasV8_1MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV];
-    let Uses = [VPR, FPSCR, FPSCR_NZCV, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM];
+    let Uses = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0,  D1,  D2,  D3,  D4,  D5,  D6,  D7,  D8,  D9,  D10, D11, D12, D13, D14, D15,
                                         D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, D27, D28, D29, D30, D31];
     let DecoderMethod = "DecodeLazyLoadStoreMul";
 }
@@ -435,14 +435,14 @@ def : VFP2MnemonicAlias<"fstmfdx", "fstmdbx">;
 // FP Binary Operations.
 //
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VADDD  : ADbI<0b11100, 0b11, 0, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpALU64, "vadd", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fadd DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPALU64]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VADDS  : ASbIn<0b11100, 0b11, 0, 0,
                    (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                    IIC_fpALU32, "vadd", ".f32\t$Sd, $Sn, $Sm",
@@ -453,21 +453,21 @@ def VADDS  : ASbIn<0b11100, 0b11, 0, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VADDH  : AHbI<0b11100, 0b11, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpALU16, "vadd", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fadd (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
              Sched<[WriteFPALU32]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSUBD  : ADbI<0b11100, 0b11, 1, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpALU64, "vsub", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fsub DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPALU64]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSUBS  : ASbIn<0b11100, 0b11, 1, 0,
                    (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                    IIC_fpALU32, "vsub", ".f32\t$Sd, $Sn, $Sm",
@@ -478,42 +478,42 @@ def VSUBS  : ASbIn<0b11100, 0b11, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSUBH  : AHbI<0b11100, 0b11, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpALU16, "vsub", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fsub (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
             Sched<[WriteFPALU32]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VDIVD  : ADbI<0b11101, 0b00, 0, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpDIV64, "vdiv", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fdiv DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPDIV64]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VDIVS  : ASbI<0b11101, 0b00, 0, 0,
                   (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                   IIC_fpDIV32, "vdiv", ".f32\t$Sd, $Sn, $Sm",
                   [(set SPR:$Sd, (fdiv SPR:$Sn, SPR:$Sm))]>,
              Sched<[WriteFPDIV32]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VDIVH  : AHbI<0b11101, 0b00, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpDIV16, "vdiv", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fdiv (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
              Sched<[WriteFPDIV32]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMULD  : ADbI<0b11100, 0b10, 0, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpMUL64, "vmul", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fmul DPR:$Dn, (f64 DPR:$Dm)))]>,
              Sched<[WriteFPMUL64, ReadFPMUL, ReadFPMUL]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMULS  : ASbIn<0b11100, 0b10, 0, 0,
                    (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                    IIC_fpMUL32, "vmul", ".f32\t$Sd, $Sn, $Sm",
@@ -524,21 +524,21 @@ def VMULS  : ASbIn<0b11100, 0b10, 0, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMULH  : AHbI<0b11100, 0b10, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpMUL16, "vmul", ".f16\t$Sd, $Sn, $Sm",
                   [(set (f16 HPR:$Sd), (fmul (f16 HPR:$Sn), (f16 HPR:$Sm)))]>,
              Sched<[WriteFPMUL32, ReadFPMUL, ReadFPMUL]>;
 
-let TwoOperandAliasConstraint = "$Dn = $Dd" in
+let TwoOperandAliasConstraint = "$Dn = $Dd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMULD : ADbI<0b11100, 0b10, 1, 0,
                   (outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
                   IIC_fpMUL64, "vnmul", ".f64\t$Dd, $Dn, $Dm",
                   [(set DPR:$Dd, (fneg (fmul DPR:$Dn, (f64 DPR:$Dm))))]>,
              Sched<[WriteFPMUL64, ReadFPMUL, ReadFPMUL]>;
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMULS : ASbI<0b11100, 0b10, 1, 0,
                   (outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
                   IIC_fpMUL32, "vnmul", ".f32\t$Sd, $Sn, $Sm",
@@ -549,7 +549,7 @@ def VNMULS : ASbI<0b11100, 0b10, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
-let TwoOperandAliasConstraint = "$Sn = $Sd" in
+let TwoOperandAliasConstraint = "$Sn = $Sd", mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMULH : AHbI<0b11100, 0b10, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
                   IIC_fpMUL16, "vnmul", ".f16\t$Sd, $Sn, $Sm",
@@ -621,7 +621,7 @@ def : Pat<(fmul (fneg SPR:$a), SPR:$b),
           (VNMULS SPR:$a, SPR:$b)>, Requires<[NoHonorSignDependentRounding]>;
 
 // These are encoded as unary instructions.
-let Defs = [FPSCR_NZCV] in {
+let Defs = [FPSCR_NZCV], mayRaiseFPException = 1, Uses = [FPSCR_RM] in {
 def VCMPED : ADuI<0b11101, 0b11, 0b0100, 0b11, 0,
                   (outs), (ins DPR:$Dd, DPR:$Dm),
                   IIC_fpCMP64, "vcmpe", ".f64\t$Dd, $Dm", "",
@@ -684,7 +684,7 @@ def VABSH  : AHuI<0b11101, 0b11, 0b0000, 0b11, 0,
                    IIC_fpUNA16, "vabs", ".f16\t$Sd, $Sm",
                    [(set (f16 HPR:$Sd), (fabs (f16 HPR:$Sm)))]>;
 
-let Defs = [FPSCR_NZCV] in {
+let Defs = [FPSCR_NZCV], mayRaiseFPException = 1, Uses = [FPSCR_RM] in {
 def VCMPEZD : ADuI<0b11101, 0b11, 0b0101, 0b11, 0,
                    (outs), (ins DPR:$Dd),
                    IIC_fpCMP64, "vcmpe", ".f64\t$Dd, #0", "",
@@ -742,6 +742,7 @@ def VCMPZH  : AHuI<0b11101, 0b11, 0b0101, 0b01, 0,
 }
 } // Defs = [FPSCR_NZCV]
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTDS  : ASuI<0b11101, 0b11, 0b0111, 0b11, 0,
                    (outs DPR:$Dd), (ins SPR:$Sm),
                    IIC_fpCVTDS, "vcvt", ".f64.f32\t$Dd, $Sm", "",
@@ -762,6 +763,7 @@ def VCVTDS  : ASuI<0b11101, 0b11, 0b0111, 0b11, 0,
 }
 
 // Special case encoding: bits 11-8 is 0b1011.
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTSD  : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,
                     IIC_fpCVTSD, "vcvt", ".f32.f64\t$Sd, $Dm", "",
                     [(set SPR:$Sd, (fpround DPR:$Dm))]>,
@@ -787,7 +789,7 @@ def VCVTSD  : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,
 }
 
 // Between half, single and double-precision.
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm", "",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -799,7 +801,7 @@ def : FP16Pat<(f32 (fpextend (f16 HPR:$Sm))),
 def : FP16Pat<(f16_to_fp GPR:$a),
               (VCVTBHS (COPY_TO_REGCLASS GPR:$a, SPR))>;
 
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBSH: ASuI<0b11101, 0b11, 0b0011, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sda, SPR:$Sm),
                  /* FIXME */ IIC_fpCVTHS, "vcvtb", ".f16.f32\t$Sd, $Sm", "$Sd = $Sda",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -821,7 +823,7 @@ def : FP16Pat<(insertelt (v4f16 DPR:$src1), (f16 (fpround (f32 SPR:$src2))), imm
                                              SPR:$src2),
                                     (SSubReg_f16_reg imm:$lane)))>;
 
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTHS: ASuI<0b11101, 0b11, 0b0010, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTSH, "vcvtt", ".f32.f16\t$Sd, $Sm", "",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -835,7 +837,7 @@ def : FP16Pat<(f32 (fpextend (extractelt (v4f16 DPR:$src), imm_odd:$lane))),
                 (v2f32 (COPY_TO_REGCLASS (v4f16 DPR:$src), DPR_VFP2)),
                 (SSubReg_f16_reg imm_odd:$lane)))>;
 
-let hasSideEffects = 0 in
+let hasSideEffects = 0, mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTSH: ASuI<0b11101, 0b11, 0b0011, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sda, SPR:$Sm),
                  /* FIXME */ IIC_fpCVTHS, "vcvtt", ".f16.f32\t$Sd, $Sm", "$Sd = $Sda",
                  [/* Intentionally left blank, see patterns below */]>,
@@ -853,6 +855,7 @@ def : FP16Pat<(insertelt (v4f16 DPR:$src1), (f16 (fpround (f32 SPR:$src2))), imm
                                              SPR:$src2),
                                     (SSubReg_f16_reg imm:$lane)))>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBHD : ADuI<0b11101, 0b11, 0b0010, 0b01, 0,
                    (outs DPR:$Dd), (ins SPR:$Sm),
                    NoItinerary, "vcvtb", ".f64.f16\t$Dd, $Sm", "",
@@ -876,6 +879,7 @@ def : FP16Pat<(f64 (f16_to_fp GPR:$a)),
               (VCVTBHD (COPY_TO_REGCLASS GPR:$a, SPR))>,
               Requires<[HasFPARMv8, HasDPVFP]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTBDH : ADuI<0b11101, 0b11, 0b0011, 0b01, 0,
                    (outs SPR:$Sd), (ins SPR:$Sda, DPR:$Dm),
                    NoItinerary, "vcvtb", ".f16.f64\t$Sd, $Dm", "$Sd = $Sda",
@@ -901,6 +905,7 @@ def : FP16Pat<(fp_to_f16 (f64 DPR:$a)),
               (i32 (COPY_TO_REGCLASS (VCVTBDH (IMPLICIT_DEF), DPR:$a), GPR))>,
                    Requires<[HasFPARMv8, HasDPVFP]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTHD : ADuI<0b11101, 0b11, 0b0010, 0b11, 0,
                    (outs DPR:$Dd), (ins SPR:$Sm),
                    NoItinerary, "vcvtt", ".f64.f16\t$Dd, $Sm", "",
@@ -915,6 +920,7 @@ def VCVTTHD : ADuI<0b11101, 0b11, 0b0010, 0b11, 0,
   let hasSideEffects = 0;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VCVTTDH : ADuI<0b11101, 0b11, 0b0011, 0b11, 0,
                    (outs SPR:$Sd), (ins SPR:$Sda, DPR:$Dm),
                    NoItinerary, "vcvtt", ".f16.f64\t$Sd, $Dm", "$Sd = $Sda",
@@ -1140,18 +1146,21 @@ defm VRINTN : vrint_inst_anpm<"n", 0b01, froundeven>;
 defm VRINTP : vrint_inst_anpm<"p", 0b10, fceil>;
 defm VRINTM : vrint_inst_anpm<"m", 0b11, ffloor>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSQRTD : ADuI<0b11101, 0b11, 0b0001, 0b11, 0,
                   (outs DPR:$Dd), (ins DPR:$Dm),
                   IIC_fpSQRT64, "vsqrt", ".f64\t$Dd, $Dm", "",
                   [(set DPR:$Dd, (fsqrt (f64 DPR:$Dm)))]>,
              Sched<[WriteFPSQRT64]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSQRTS : ASuI<0b11101, 0b11, 0b0001, 0b11, 0,
                   (outs SPR:$Sd), (ins SPR:$Sm),
                   IIC_fpSQRT32, "vsqrt", ".f32\t$Sd, $Sm", "",
                   [(set SPR:$Sd, (fsqrt SPR:$Sm))]>,
              Sched<[WriteFPSQRT32]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VSQRTH : AHuI<0b11101, 0b11, 0b0001, 0b11, 0,
                   (outs HPR:$Sd), (ins HPR:$Sm),
                   IIC_fpSQRT16, "vsqrt", ".f16\t$Sd, $Sm",
@@ -1757,7 +1766,7 @@ def : VFPPat<(i32 (fp_to_uint_sat (f16 HPR:$a), i32)),
              (COPY_TO_REGCLASS (VTOUIZH (f16 HPR:$a)), GPR)>;
 
 // And the Z bit '0' variants, i.e. use the rounding mode specified by FPSCR.
-let Uses = [FPSCR] in {
+let Uses = [FPSCR_RM] in {
 def VTOSIRD : AVConv1IsD_Encode<0b11101, 0b11, 0b1101, 0b1011,
                                 (outs SPR:$Sd), (ins DPR:$Dm),
                                 IIC_fpCVTDI, "vcvtr", ".s32.f64\t$Sd, $Dm",
@@ -2029,6 +2038,7 @@ def VULTOD : AVConv1XInsD_Encode<0b11101, 0b11, 0b1011, 0b1011, 1,
 } // End of 'let Constraints = "$a = $dst" in'
 
 // BFloat16  - Single precision, unary, predicated
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 class BF16_VCVT<string opc, bits<2> op7_6>
    : VFPAI<(outs SPR:$Sd), (ins SPR:$dst, SPR:$Sm),
            VFPUnaryFrm, NoItinerary,
@@ -2063,6 +2073,7 @@ def BF16_VCVTT : BF16_VCVT<"vcvtt", 0b11>;
 // FP Multiply-Accumulate Operations.
 //
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLAD : ADbI<0b11100, 0b00, 0, 0,
                  (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
                  IIC_fpMAC64, "vmla", ".f64\t$Dd, $Dn, $Dm",
@@ -2072,6 +2083,7 @@ def VMLAD : ADbI<0b11100, 0b00, 0, 0,
               Requires<[HasVFP2,HasDPVFP,UseFPVMLx]>,
               Sched<[WriteFPMAC64, ReadFPMAC, ReadFPMUL, ReadFPMUL]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLAS : ASbIn<0b11100, 0b00, 0, 0,
                   (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
                   IIC_fpMAC32, "vmla", ".f32\t$Sd, $Sn, $Sm",
@@ -2085,6 +2097,7 @@ def VMLAS : ASbIn<0b11100, 0b00, 0, 0,
   let D = VFPNeonA8Domain;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLAH : AHbI<0b11100, 0b00, 0, 0,
                   (outs HPR:$Sd), (ins HPR:$Sdin, HPR:$Sn, HPR:$Sm),
                   IIC_fpMAC16, "vmla", ".f16\t$Sd, $Sn, $Sm",
@@ -2104,6 +2117,7 @@ def : Pat<(fadd_mlx HPR:$dstin, (fmul_su (f16 HPR:$a), HPR:$b)),
           Requires<[HasFullFP16,DontUseNEONForFP, UseFPVMLx]>;
 
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLSD : ADbI<0b11100, 0b00, 1, 0,
                  (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
                  IIC_fpMAC64, "vmls", ".f64\t$Dd, $Dn, $Dm",
@@ -2113,6 +2127,7 @@ def VMLSD : ADbI<0b11100, 0b00, 1, 0,
               Requires<[HasVFP2,HasDPVFP,UseFPVMLx]>,
               Sched<[WriteFPMAC64, ReadFPMAC, ReadFPMUL, ReadFPMUL]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLSS : ASbIn<0b11100, 0b00, 1, 0,
                   (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
                   IIC_fpMAC32, "vmls", ".f32\t$Sd, $Sn, $Sm",
@@ -2126,6 +2141,7 @@ def VMLSS : ASbIn<0b11100, 0b00, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VMLSH : AHbI<0b11100, 0b00, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sdin, HPR:$Sn, HPR:$Sm),
                   IIC_fpMAC16, "vmls", ".f16\t$Sd, $Sn, $Sm",
@@ -2144,6 +2160,7 @@ def : Pat<(fsub_mlx HPR:$dstin, (fmul_su (f16 HPR:$a), HPR:$b)),
           (VMLSH HPR:$dstin, (f16 HPR:$a), HPR:$b)>,
           Requires<[HasFullFP16,DontUseNEONForFP,UseFPVMLx]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMLAD : ADbI<0b11100, 0b01, 1, 0,
                   (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
                   IIC_fpMAC64, "vnmla", ".f64\t$Dd, $Dn, $Dm",
@@ -2153,6 +2170,7 @@ def VNMLAD : ADbI<0b11100, 0b01, 1, 0,
                 Requires<[HasVFP2,HasDPVFP,UseFPVMLx]>,
                 Sched<[WriteFPMAC64, ReadFPMAC, ReadFPMUL, ReadFPMUL]>;
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMLAS : ASbI<0b11100, 0b01, 1, 0,
                   (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
                   IIC_fpMAC32, "vnmla", ".f32\t$Sd, $Sn, $Sm",
@@ -2166,6 +2184,7 @@ def VNMLAS : ASbI<0b11100, 0b01, 1, 0,
   let D = VFPNeonA8Domain;
 }
 
+let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
 def VNMLAH : AHbI<0b11100, 0b01, 1, 0,
                   (outs HPR:$Sd), (ins HPR:$Sdin, HPR:$Sn, HPR:$Sm),
                   IIC_fpMAC16, "vnmla", ".f16\t$Sd, $Sn, $Sm",
@@ -2196,6 +2215,7 @@ def : Pat<(fsub_mlx (fneg HPR:$dstin), (fmul_su (f16 HPR:$a), HPR:$b)),
           (VNMLAH HPR:$dstin, (f16 HP...
[truncated]

Collaborator

@davemgreen davemgreen left a comment

I went through the list and this looks good (barring MVE and NEON, which we will need to do later). Should we add these VFP instructions too:

VCVTASD
VCVTASH
VCVTASS
VCVTAUD
VCVTAUH
VCVTAUS
VCVTMSD
VCVTMSH
VCVTMSS
VCVTMUD
VCVTMUH
VCVTMUS
VCVTNSD
VCVTNSH
VCVTNSS
VCVTNUD
VCVTNUH
VCVTNUS
VCVTPSD
VCVTPSH
VCVTPSS
VCVTPUD
VCVTPUH
VCVTPUS
VFP_VMAXNMD
VFP_VMAXNMH
VFP_VMAXNMS
VFP_VMINNMD
VFP_VMINNMH
VFP_VMINNMS
VJCVT
VRINTAD
VRINTAH
VRINTAS
VRINTMD
VRINTMH
VRINTMS
VRINTND
VRINTNH
VRINTNS
VRINTPD
VRINTPH
VRINTPS
VRINTRD
VRINTRH
VRINTRS
VRINTXD
VRINTXH
VRINTXS
VRINTZD
VRINTZH
VRINTZS
VSHTOD
VSHTOH
VSHTOS
VSITOD
VSITOH
VSITOS
VSLTOD
VSLTOH
VSLTOS
VTOSHD
VTOSHH
VTOSHS
VTOSIRD
VTOSIRH
VTOSIRS
VTOSIZD
VTOSIZH
VTOSIZS
VTOSLD
VTOSLH
VTOSLS
VTOUHD
VTOUHH
VTOUHS
VTOUIRD
VTOUIRH
VTOUIRS
VTOUIZD
VTOUIZH
VTOUIZS
VTOULD
VTOULH
VTOULS
VUHTOD
VUHTOH
VUHTOS
VUITOD
VUITOH
VUITOS
VULTOD
VULTOH
VULTOS

 def VLLDM : AXSI4FR<"vlldm${p}\t$Rn, $regs", 0, 1>,
             Requires<[HasV8MMainline, Has8MSecExt]> {
-    let Defs = [VPR, FPSCR, FPSCR_NZCV, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
+    let Defs = [VPR, FPSCR, FPSCR_NZCV, FPSCR_RM, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15];
Collaborator

Do these need to be added? I was hoping it would be covered by FPSCR.

Author

I think you're right

@Varnike
Author

Varnike commented Oct 2, 2025

> Should we add these VFP instructions too:

Added `mayRaiseFPException` and marked the appropriate instructions from the list as using the rounding-mode bits.
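
The marking described here can be sketched in a few lines of TableGen. This is a minimal, self-contained illustration rather than the patch itself: `Register` and `Instruction` below are simplified stand-ins for the real classes in `llvm/Target/Target.td`, and `ToyVADD`, `ToyVSUB`, and `ToyVABS` are hypothetical names.

```tablegen
// Simplified stand-ins for the real Target.td classes (assumption:
// only the fields relevant to this discussion are modelled).
class Register<string n> { string Name = n; }
class Instruction {
  list<Register> Uses = [];       // implicit register reads
  list<Register> Defs = [];       // implicit register writes
  bit mayRaiseFPException = 0;    // off by default
}

// Models only the rounding-mode control bits of FPSCR.
def FPSCR_RM : Register<"fpscr_rm">;

// Arithmetic that rounds according to FPSCR: it implicitly reads the
// rounding-mode bits and may raise an FP exception.
let mayRaiseFPException = 1, Uses = [FPSCR_RM] in {
  def ToyVADD : Instruction;
  def ToyVSUB : Instruction;
}

// A pure sign-bit operation: no rounding and no exception, so it
// keeps the defaults.
def ToyVABS : Instruction;
```

Running `llvm-tblgen` on a file like this should print `ToyVADD` and `ToyVSUB` with `Uses = [FPSCR_RM]` and `mayRaiseFPException = 1`, while `ToyVABS` keeps the defaults.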

@Varnike Varnike requested a review from davemgreen October 3, 2025 11:46
Collaborator

@davemgreen davemgreen left a comment

Thanks. I looked through again; do we need to add these too?

VJCVT
VRINTAD
VRINTAH
VRINTAS
VRINTMD
VRINTMH
VRINTMS
VRINTND
VRINTNH
VRINTNS
VRINTPD
VRINTPH
VRINTPS
VRINTRD
VRINTRH
VRINTRS
VRINTZD
VRINTZH
VRINTZS
VSHTOD
VSHTOH
VSHTOS
VSITOD
VSITOH
VSITOS
VSLTOD
VSLTOH
VSLTOS
VUHTOD
VUHTOH
VUHTOS
VUITOD
VUITOH
VUITOS
VULTOD
VULTOH
VULTOS

Otherwise this looks OK to me.

@Varnike
Author

Varnike commented Oct 5, 2025

As far as I can tell from their descriptions, these instructions don't use the rounding-mode bits and don't raise FP exceptions, so I didn't change them in the commit.

@davemgreen
Collaborator

Oh OK, I see, because of the exact flag. Is that true for the fcvts too? It seems they would set exceptions on NaN inputs.

@Varnike
Copy link
Author

Varnike commented Oct 6, 2025

I’ve gone through the documentation again, and you’re right. Thanks for pointing that out.
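
One way to read the distinction raised in this exchange: the two markings are independent. An instruction whose rounding direction is fixed by its encoding has no need for `Uses = [FPSCR_RM]`, yet it may still need `mayRaiseFPException = 1` if it can signal on a NaN input. The sketch below uses simplified stand-in classes and hypothetical `Toy*` names. Which real instructions belong in which group is for the patch to decide.

```tablegen
// Simplified stand-ins for the real Target.td classes (assumption).
class Register<string n> { string Name = n; }
class Instruction {
  list<Register> Uses = [];
  bit mayRaiseFPException = 0;
}

def FPSCR_RM : Register<"fpscr_rm">;

// Rounding direction comes from the encoding, so FPSCR_RM is not read,
// but the instruction is still marked as able to raise an FP exception.
let mayRaiseFPException = 1 in
def ToyCvtFixedRounding : Instruction;

// Rounding direction comes from FPSCR, so both markings apply.
let mayRaiseFPException = 1, Uses = [FPSCR_RM] in
def ToyCvtDynamicRounding : Instruction;
```

Keeping the exception flag separate from the register use means a non-strictfp build gains no extra dependency on the rounding-mode register for instructions that never read it.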
