Commit 31aa4ed

Squashed 'lib/simde/simde/' changes from cbef1c15..59f77984
59f77984 arm neon: Add float16 multi-vectors to native aliases 4b279d62 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927 was fixed in GCC 15.x 5c8f50ec https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782 was fixed in GCC 13 677f2cbe Avoid undefined behaviour with signed integer multiplication (#1296) 2096f755 arm64 gcc FRINT: skip native call on GCC 3ea33047 x86 sse2 for loongarch: fix GCC build failure (#1287) a532a12c riscv64: Fallback to autovec without mrvv-vector-bits flag. (#1282) 85632ca8 arm neon riscv64: add min.h and max.h RVV implementations. (#1283) ca1e942d neon riscv64: Enable RVV segment load/store only when we have `__riscv_zvlsseg` flag. (#1285) cf8e6a73 riscv64: Enable V feature when both zve64d and zvl128b are present (#1284) c7f26b73 x86 avx for loongarch: use vfcmp_clt to save one instruction in `_mm_cmp_{sd,ss}` and `_mm256_cmp_pd` a8ae10d9 x86 sse2,avx2 loongarch impl: let compiler to generate instructions based on imm8 bb0282e3 x86 misc fixes for AVX512{F,VL}_NATIVE d458d8fd x86 sse2,sse3, avx: silence some false-positive warnings about unitialized structs 4184e0d4 start preparing to release SIMDe 0.8.4 87ecd64a x86 sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high ca9449c1 Fix incorrect UQRSHL implementation. 8d90b041 arm neon: fix `cmla{_rot{90,180,270},}_lane` with correct test-suite on ARMv8.3 system 500454a2 arm neon: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks. 02ba9222 arm neon gcc-12 FRINT workaround 02d81577 arm neon FCMLA with 16-bit floats, requires the FP16 feature 8caaee79 arm neon: FRINT{32,64}{X,Z} native calls require ARMv8.5 438ddcff remove extraneous semicolons from many macro-defined functions 9f73373f wasm simd128: fix a FAST_NANS error on arm64 0bd19a99 Fix vqdmulhs_s32 native alias. 62f40d4b x86 avx2: small fixes for loongarch d656b4d7 x86 sse2: small fixes for loongarch 8f56d4ff Remove incorrect qrdmulh SSE code. 
8c421df1 arm neon: define native alias only under the inverse of the conditions of a pass-through 25e70ce7 simde-aes: gcc 13.2+ ignore unused variable warnings 69c9cd5c arm neon qdmlal: fix saturation (#1194) 34136823 Fix vqshlud_n_s64 implementation to be 64-bit. 483a4bcc Fix qdmlsl instructions f275fffd arm neon qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256) d95bd9d7 arm neon qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255) 4b900704 x86 sse2: fix `_mm_pause` for RISCV systems 0be41ec7 risc64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions 70fc574b arm: Rename ARM ROL/ROR functions with a SIMDE prefix. a39bd6dd arm neon sli_n: Fix invalid shift warnings (#1253) 7bd2bb70 arm neon `_vext_p6`: reverse logic to avoid GCC14 i586 bug (#1251) b4bf72e1 x86 clmul simde_x_bitreverse_u64: add loongarch implementation (#1249) 04f9b4ca x86 avx: reoptimized simde_mm256_addsub_ps/d with lasx 54d35298 x86 fma: add loongarch lasx optimized implementations adefb8dc x86 f16c: add loongarch lasx optimized implementations b720dcb7 x86 avx512f: added fmaddsub implementation (#1246) 5c9f6aa1 x86 sse4.2: add loongarch lsx optimized implementations 78370371 x86 sse4.1: add loongarch lsx optimized implementations 0bfc2312 x86 ssse3: add loongarch lsx optimized implementations fcae0eee x86 sse3: add loongarch lsx optimized implementations af646726 x86 sse: add loongarch lsx optimized implementations 2ad64c9f x86 avx2: add loongarch lasx optimized implementations (#1241) 5cae2261 x86 avx: add loongarch lasx optimized implementations (#1239) 484fcce2 x86 avx: use INT64_C when the destination is i64 (#1238) 5e225b1c loongarch: add lsx support for sse2.h 665d7f93 fix clang type redef error b0fcc617 Whoops, missing comma fe262fb0 loongarch float16: use a portable version to avoid compilation errors 1a09d3bc x86: move definition of 'value' to correct branch in _mm_loadl_epi64 aac58332 x86: some better implementations for MSVC 
and others without SIMDE_STATEMENT_EXPR_ d1afb3db arm crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it 592f8f0c _mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes de4337e8 gcc-14 -O3 complained about some possible unitialized values 8b0937a3 neon/cvz z/Arch: stop using deprecated functions. e18dcd7d arm neon: avoid GCC 11 vst1_*_x4 built-in functions 848fb777 arm neon: fix arm64 gcc11 build excess elements in vector failure 0aaf7829 x86/sse: Fix type convert error for LSX. 29c96207 arm wasm: add vst2_u8 translation to Wasm SIMD 375ad48f arm wasm: add vshll translations to Wasm SIMD d5697fa9 arm wasm: add vst4_u8 translation to Wasm SIMD e235b2eb math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF cb4b08c4 wasm AltiVec: add u16x8 and u8x16 avgr translations 90237cab wasm NEON: add u16x8 and u8x16 avgr translations 6050906e arm neon vminnmv_f16: remove duplicate statement (#1208) a3d20d14 x86 wasm: Wasm SIMD version of `_mm_sad_epu8` 32650204 msvc: add simde_MemoryBarrier to avoid including <windows.h> 7ca5a3e0 x86/fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197) 2ec1f51f pow: consistently use simde_math_pow 80f65573 x86: remove redundant mm_add_pd translation for WASM (#1190) 249b9dc0 arm/neon riscv64: additional RVV implementations - part 2. (#1189) 408d06a3 arm/neon riscv64: additional RVV implementations - part1 (#1188) da5cf1f5 Use _Float16 in C++ on aarch64 with GCC 13+ 39f436a9 Don't use _Float16 on non-SSE2 x86 985c2710 Don't use _Float16 on s390x 78783046 x86: Apply half tabular method in _mm_crc32 family d8a0c764 arm: improve performance in vqadd and vmvn in risc-v 99c63a42 neon: avoid warnings when "__ARM_NEON_FP" is not defined. 
e98cbcc7 start next development cycle: v0.8.3 3442dbf2 prepare to release 0.8.0 e6afb7be arm neon: Fully remove the problematic FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics fb73a318 arm: improve performance in vabd_xxx for risc-v 8a4ff7a8 arm: improve performance in vhadd_xxx for risc-v 52f1087a arm: Add neon2rvv support in vand series intrinsics 737e3b33 arm: fix some neon2rvv intrinsic function error 5242a77d arm: enable more intrinsic function for armv7 8f123e5c wasm x86 impl: some were incorrectly marked SSE instead of SSE2 2b9b0126 arm x86 implementations: allow _m128 access from SSE 6679ff01 svml: SSE is good enough for native m128i and m128d types & functions 68aac3b9 sse2 MSVC `_mm_pause` implementaiton for x86 e76f4331 typo fixes from codespell 73160356 x86 xop: fix some native functions 4ecf271b emscripten; use `__builtin_roundeven{f,}` from version 3.1.43 onwards 347e2b69 arm 32 bits: native def fixes; workarounds for gcc 61d1addc apple clang arm64: ignore SHA2 b5835922 arm platform: cleanup feature detection. e38f2568 arm neon sm3: check constant range ac2b229a arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics 1d7848cf arm neon clang: skip vrnd native before clang v18 bb11054b clang: detect versions 18 & 19 647bb87d Initial Support for the RISC-V Vector Extension in ARM NEON (#1130) 83479bd7 start next development cycle: v0.8.1 22a493c2 arm/neon abs: negating INT_MIN is undefined behavior 453dec20 simde-detect-clang.h: add clang 17 detection (#1132) e6fab129 Update simde-detect-clang.h (#1131) e29a4fab typo: XCode -> Xcode (#1129) 8392c69a Improve performance of simde_mm512_add_epi32 (#1126) ddaab375 neon {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121) 8e9d432a correction of simde_mm256_sign_epi{8,16,32}. 
(#1123) 43ec909b avx512 abs: refine GCC compiler checks for `_mm512{,_mask}_abs_pd` (#1118) 24be11d0 gh-actions: test mips64el using qemu on gcc12/clang16 f0bd155c wasm relaxed: add f{32x4,64x2}_relaxed_{min,max} 459abf9f wasm simd128/relaxed: begin MIPS implementations ffe050ce wasm relaxed: updated names; reordered FMA operations 762d7ad2 wasm: detect support for Relaxed SIMD mode e96949e3 prepare to release 0.8.0 f73d72e4 NEON: implement all bf16-related intrinsics (#1110) 72f6d30f neon: add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64 d6271b3f NEON: implement all intrinsics supported by architecture A64-remaining part (#1093) 7904fc3c sse2 mm_pause: more archs, add a basic test 260adca5 arm neon ld2: silence warnings at -O3 on gcc risc-v d1578a0c simde_float16: prefer __fp16 if available 064b8049 svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCl 790e8d6c fp16: don't use _Float16 on ClangCL if not supported 87f7d331 neon: Modified simde_float16 to simde_float16_t (#1100) 2c58d6d0 Reuse unoptimized implementations of vaesimcq_u8 from x86 be7e377c [NEON] Add AES instructions. 
a686efde x86 sse4.1 mm_testz_si128: fix backwards short circuit logic bc206e4f wasm f{32x4,64x2}_min: add workaround for a gcc<6 issue f2e82c96 x86 pclmul: fix natives, some require VPCLMULQDQ 0adef454 avx512 gather: add MSVC native fallbacks ecc46929 avx512 set: add simde_x_mm512_set_m256{,d} 70f70262 NEON: part 1 of implement all intrinsics supported by architecture A64 (#1090) aefb342e avx512 types: avoid using native AVX512 types on MSVC unless required 833a8775 svml: enable SIMDE_X86_SVML_NATIVE for MSVC 2019+ db326c75 sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store} 3f0321ba sse _mm_movemask_ps: remove unused code 4c7c7721 gh-actions: test with clang-16 87cf105a neon/st1{,q}_*_x{2,3,4}: initial implementation (#1082) 5634cec0 NEON: more fp16 using intrinsics supported by architecture v7 (skip version) (#1081) cfd91723 arm neon: Complex operations from Armv8.3-a (#1077) 389f360a arm aes: add neon implementation using the crypto extension 4adb6591 arm: use SIMDE_ARCH_ARM_FMA faeb00a7 avx512: fix many native aliases 98eb64b8 sse: implement _mm_movelh_ps for Arm64 57197e8d aes: initial implementation of most aes instructions (#1072) 2fbc6339 NEON: Implement some f16XN types and f16 related intrinsics. 
(#1071) 64f94c68 avx: simde_mm256_shuffle_pd fix for natural vector size < 128 d4140899 Add workaround for GCC bug 111609 66567604 Extend constant range in simde_vshll_n_XXX intrinsics (#1064) 33c4480d Remove non-working MMX specialization from simde_vmin_s16 6f4afd63 Fix issues related to MXCSR register (#1060) 82be3395 fix SIMDE_ARCH_X86_SSE4_2 define 38580983 riscv64 clang: doesn't support _Float16 or __fp16 properly yet a39f2c3b avx512/shuffle: mm512_{shuffle_epi32,shuffle{hi,lo}_epi16} a202d011 avx512/gather: mm512_{mask_,}i64gather_{epi32,epi64,ps,pd) 95a6d081 avx512 new families started: gather/reduce + other additional funcs ef893128 avx512 cmp,cvt,cvts,cvtt,cvtus,gather,kand,permutex,rcp: new ops for intgemm 4a29d21f avx512: start supporting AVX512FP16 / m512h f686d38f clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 7760aabd GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 436dd4cc GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3 84311230 GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+ e140ac4e x86/avx512 fpclass: improve fallback implementation 5950c402 gh-actions: re-order ccache; add old clang/gcc versions faf22893 avx512/loadu: fix native detection b3341922 simde-f16: improve _Float16 usage; better INFHF/NANHF defs 5e632b09 avx512: naive implementation of fpclass b71b58c2 [NEON/A32V7]: Don't trust clang for load multiple on A32V7 c5de4d09 neon: Add qtbl/qtbx polyfills for A32V7 3bda0d7c neon/cvtn: vcvtnq_u32_f32 is a V8 function 73910b60 msa neon impl: float64x2_t is not avail in A32V7 0540d7fc clang aarch64: optimization bug 45541 was fixed in clang-15 d315aac7 clmul: aarch64 clang has difficulties with poly64x1_t a2eeb9ef sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 e676d998 clang powerpc: vec_bperm bug was fixed in clang-14 0e3290e8 neon/st1: disable last remaining AltiVec implementation db0649e1 wasm simd128: more powerpc fixes bbdb2a1f sse2,wasm simd128: skip 
SIMDE_CONVERT_VECTOR_ impementations on PowerPC c6b6ac50 wasm/simd128: add missing unsigned functions 78faeab1 wasm/simd128: fix altivec_p7 version of wasm_f64x2_pmin 1f359106 We are in a dev period again: v0.7.7 9135bd04 neon/cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations 6a1db3a5 neon/cvtn: basic implementation of a few functions 1cf65cb0 mmx: loogson impl promotions over SIMDE_SHUFFLE_VECTOR_ 4ab8749d sse{,2,3,4.1},avx: more WASM shuffle implementations b49fa29d avx512: arghhh: really fix typedef of __mmask64 6244ab92 avx512: typo fix for typedef of __mmask64 20c5200d avx512/madd: fix native alias arguments for _mm512_madd_epi16 cc476f36 neon/qabs: restore SSE2 impl for vqabsq_s8 a7682611 neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations ca523adb sse: allow native _mm_loadh_pi on MSVC x64 ac526659 test: appease GCC 5.x & clang 01ea9a8d start release process for 0.7.6 28a6001f x86/sse*,avx: add additional SIMD128 implementations aca2f0ae neon/shl,rshl: fix avx include to unbreak amalgamated hearders f60a9d8d neon/mla_lane: initial implementation using mla+dup f982cfd5 Update clang version detection for 14..16 and add link b45a14cc simde-arch: include hedley for setting F16C for MSVC 2022+ with AVX2 3ce91d4c 0.7.5 dev cycle on the road to 0.7.6/0.8.0 02c7a67e sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH b0b370a4 x86/sse: Add LoongArch LSX support 2338f175 arch: Add LoongArch LASX/LSX support 90d95fae avx512: define __mask64 & __mask32 if not yet defined 42a43fa5 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017 20f98da6 sve/whilelt: correct type-o in __mmask32 initialization 47a1500f sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017 cd93fcc9 avx512/knot,kxor: native calls not availabe on MSVC 2017 ba6324b6 avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019 2f6fe9c6 sse2/avx: move some native aliases around to satisfy 
MSVC 2017 /ARCH:AVX512 91fda2cc axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions a397b74b __builtin_signbit: add cast to double for old Clang versions e016050b clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F 7e353c00 Wasm q15mulr_sat_s: match Wasm spec ce375861 Wasm f32/f64 nearest: match Wasm spec 96d5e034 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec 5676a1ba Wasm f32/f64 abs: match Wasm spec aa299c08 Wasm f32/f64 max: match Wasm spec 433d2b95 Wasm f32/f64 min: match Wasm spec cf1ac40b avx{,2}: some intrinsics are missing from older MSVC versions bff9b1b3 simd128: move unary minus to appease msvc native arm64 efc512a4 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions 091250e8 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC 4b305360 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE 2dedbd9b skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX a04ea7bc f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph e8ee041a ci appveyor: build tests with AVX{,2}, but don't run them 2188c972 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8} 186f12f1 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask} 6a40fdeb arm/neon/rnd: use correct SVML function for simde_vrndq_f64 9a0705b0 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE c298a7ec msvc avx512/roundscale_round: quiet a false positive warning 01d9c5de sse: remove errant MMX requirement from simde_mm_movemask_ps c675aa08 x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc 097af509 msvc 2022: enable F16C if AVX2 present 91cd7b64 avx{,2}: fix maskload illegal mem access 2caa25b8 Fixed simde_mm_prefetch warnings 96bdf523 Fixed parameters to _mm_clflush 4d560e41 emscripten; don't use __builtin_roundeven{f,} even if defined 511a01e7 avx512/compress: Mitigate poor compressstore performance on AMD Zen 4 a22b63dc 
avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops 3d87469f wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type 56ca5bd8 Suppress min/max macro definitions from windows.h f2cea4d3 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving 3698cef9 neon/cvt: clang bug 46844 was fixed in clang 12.0 9369cea4 simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2} ce27bd09 gcc power: vec_cpsgn argument reversal fixed in 12.0 20fd5b94 gcc power: bugs 1007[012] fixed in GCC 12.1 5e25de13 gcc sse2: bug 99754 was fixed in GCC 12.1 e6979602 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error 359c3ff4 clang wasm simde: add workaround to fix wasm_i64x2_shl bug b767f5ed arm/neon: workaround on ARM64 windows bug 599b1fbf mips/msa: fix for Windows ARM64 c6f4821e arm64 windows: fix simd128.h build error 782e7c73 prepare to release 0.7.4 6e9ac245 fix A32V7 version of _mm_test{nz,}c_si128 776f7a69 test with Debian default flags, also for armel a240d951 x86: fix AVX native → SSE4.2 native 5a73c2ce _mm_insert_ps: incorrect handling of the control 597a1c9e neon/ld1[q]_*_x2: initial implementation 4550faea wasm: f32x4 and f64x2 nearest roundeven 5e068645 Add missing `static const` in simde-math.h. NFC da02f2ce avx512/setzero: fix native aliases 89762e11 Fixed FMA detection macro on msvc b0fda5cf avx512/load_pd: initial implementation a61af077 avx512/load_ps: initial implementation 4126bde0 Properly map __mm functions to __simde_mm 2e76b7a6 neon ld2: gcc-12 fixes 604a53de fix wrong size e5e085ff AVX: add native calls for _mm256_insertf128_{pd,ps,si256} ee3bd005 aarch64 + clang-1[345] fix for "implicit conversion changes signedness" a060c461 wasm: load lane memcpy instead of cast to address UBSAN issues git-subtree-dir: lib/simde/simde git-subtree-split: 59f779845beab6a281efcd87076236037be6033b
1 parent 845e09b commit 31aa4ed
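
Several of the squashed changes listed above fix undefined behaviour in portable fallback code, for example 677f2cbe (signed integer multiplication), 22a493c2 (negating INT_MIN in abs) and 87ecd64a (shift counts that are too high in simde_mm_srl_epi{16,32,64}). The block below is a minimal sketch of the general C techniques involved, not the SIMDe implementation itself; every function name in it is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a uint32_t bit pattern as int32_t.
 * memcpy sidesteps the implementation-defined out-of-range conversion. */
static int32_t bits_to_i32(uint32_t bits) {
  int32_t out;
  memcpy(&out, &bits, sizeof out);
  return out;
}

/* Wrapping signed multiply. The product is formed in uint64_t, where two
 * 32-bit operands cannot overflow, and only the low 32 bits are kept. */
static int32_t wrapping_mul_i32(int32_t a, int32_t b) {
  uint64_t wide = (uint64_t) (uint32_t) a * (uint64_t) (uint32_t) b;
  return bits_to_i32((uint32_t) wide);
}

/* Wrapping absolute value. Writing -x for x == INT32_MIN is undefined;
 * negating in unsigned arithmetic maps INT32_MIN to itself instead. */
static int32_t wrapping_abs_i32(int32_t x) {
  uint32_t ux = (uint32_t) x;
  return (x < 0) ? bits_to_i32((uint32_t) (UINT32_C(0) - ux)) : x;
}

/* Logical right shift with an explicit range check. Shifting a 32-bit
 * value by 32 or more is undefined in C, so large counts return 0. */
static uint32_t srl_u32(uint32_t x, uint64_t count) {
  return (count > 31) ? UINT32_C(0) : (x >> (unsigned) count);
}

/* Exercises the edge cases that the naive signed versions get wrong. */
void ub_sketch_selfcheck(void) {
  assert(wrapping_mul_i32(INT32_MAX, 2) == -2);
  assert(wrapping_abs_i32(INT32_MIN) == INT32_MIN);
  assert(srl_u32(UINT32_C(0xFFFFFFFF), 32) == 0);
}
```

The common idea is to move the risky operation into unsigned arithmetic, or to guard it with an explicit range check, so the fallback has defined behaviour for every input. Whether wrapping or saturating is the right result depends on the intrinsic being emulated.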

366 files changed, +63566 -4013 lines changed


arm/neon.h

Lines changed: 113 additions & 2 deletions
@@ -22,6 +22,7 @@
  *
  * Copyright:
  * 2020 Evan Nemerson <[email protected]>
+ * 2023 Yi-Yen Chung <[email protected]> (Copyright owned by Andes Technology)
  */
 
 #if !defined(SIMDE_ARM_NEON_H)
@@ -30,23 +31,32 @@
 #include "neon/types.h"
 
 #include "neon/aba.h"
+#include "neon/abal.h"
+#include "neon/abal_high.h"
 #include "neon/abd.h"
 #include "neon/abdl.h"
+#include "neon/abdl_high.h"
 #include "neon/abs.h"
 #include "neon/add.h"
 #include "neon/addhn.h"
+#include "neon/addhn_high.h"
 #include "neon/addl.h"
 #include "neon/addlv.h"
 #include "neon/addl_high.h"
 #include "neon/addv.h"
 #include "neon/addw.h"
 #include "neon/addw_high.h"
+#include "neon/aes.h"
 #include "neon/and.h"
 #include "neon/bcax.h"
 #include "neon/bic.h"
 #include "neon/bsl.h"
+#include "neon/cadd_rot270.h"
+#include "neon/cadd_rot90.h"
 #include "neon/cage.h"
 #include "neon/cagt.h"
+#include "neon/cale.h"
+#include "neon/calt.h"
 #include "neon/ceq.h"
 #include "neon/ceqz.h"
 #include "neon/cge.h"
@@ -60,13 +70,24 @@
 #include "neon/cltz.h"
 #include "neon/clz.h"
 #include "neon/cmla.h"
-#include "neon/cmla_rot90.h"
+#include "neon/cmla_lane.h"
 #include "neon/cmla_rot180.h"
+#include "neon/cmla_rot180_lane.h"
 #include "neon/cmla_rot270.h"
+#include "neon/cmla_rot270_lane.h"
+#include "neon/cmla_rot90.h"
+#include "neon/cmla_rot90_lane.h"
 #include "neon/cnt.h"
 #include "neon/cvt.h"
+#include "neon/cvt_n.h"
+#include "neon/cvtm.h"
+#include "neon/cvtn.h"
+#include "neon/cvtp.h"
 #include "neon/combine.h"
+#include "neon/copy_lane.h"
+#include "neon/crc32.h"
 #include "neon/create.h"
+#include "neon/div.h"
 #include "neon/dot.h"
 #include "neon/dot_lane.h"
 #include "neon/dup_lane.h"
@@ -76,6 +97,11 @@
 #include "neon/fma.h"
 #include "neon/fma_lane.h"
 #include "neon/fma_n.h"
+#include "neon/fmlal.h"
+#include "neon/fmlsl.h"
+#include "neon/fms.h"
+#include "neon/fms_lane.h"
+#include "neon/fms_n.h"
 #include "neon/get_high.h"
 #include "neon/get_lane.h"
 #include "neon/get_low.h"
@@ -84,30 +110,48 @@
 #include "neon/ld1.h"
 #include "neon/ld1_dup.h"
 #include "neon/ld1_lane.h"
+#include "neon/ld1_x2.h"
+#include "neon/ld1_x3.h"
+#include "neon/ld1_x4.h"
+#include "neon/ld1q_x2.h"
+#include "neon/ld1q_x3.h"
+#include "neon/ld1q_x4.h"
 #include "neon/ld2.h"
+#include "neon/ld2_dup.h"
+#include "neon/ld2_lane.h"
 #include "neon/ld3.h"
+#include "neon/ld3_dup.h"
+#include "neon/ld3_lane.h"
 #include "neon/ld4.h"
+#include "neon/ld4_dup.h"
 #include "neon/ld4_lane.h"
 #include "neon/max.h"
 #include "neon/maxnm.h"
+#include "neon/maxnmv.h"
 #include "neon/maxv.h"
 #include "neon/min.h"
 #include "neon/minnm.h"
+#include "neon/minnmv.h"
 #include "neon/minv.h"
 #include "neon/mla.h"
+#include "neon/mla_lane.h"
 #include "neon/mla_n.h"
 #include "neon/mlal.h"
 #include "neon/mlal_high.h"
+#include "neon/mlal_high_lane.h"
 #include "neon/mlal_high_n.h"
 #include "neon/mlal_lane.h"
 #include "neon/mlal_n.h"
 #include "neon/mls.h"
+#include "neon/mls_lane.h"
 #include "neon/mls_n.h"
 #include "neon/mlsl.h"
 #include "neon/mlsl_high.h"
+#include "neon/mlsl_high_lane.h"
 #include "neon/mlsl_high_n.h"
 #include "neon/mlsl_lane.h"
 #include "neon/mlsl_n.h"
+#include "neon/mmlaq.h"
 #include "neon/movl.h"
 #include "neon/movl_high.h"
 #include "neon/movn.h"
@@ -117,8 +161,13 @@
 #include "neon/mul_n.h"
 #include "neon/mull.h"
 #include "neon/mull_high.h"
+#include "neon/mull_high_lane.h"
+#include "neon/mull_high_n.h"
 #include "neon/mull_lane.h"
 #include "neon/mull_n.h"
+#include "neon/mulx.h"
+#include "neon/mulx_lane.h"
+#include "neon/mulx_n.h"
 #include "neon/mvn.h"
 #include "neon/neg.h"
 #include "neon/orn.h"
@@ -127,59 +176,117 @@
 #include "neon/padd.h"
 #include "neon/paddl.h"
 #include "neon/pmax.h"
+#include "neon/pmaxnm.h"
 #include "neon/pmin.h"
+#include "neon/pminnm.h"
 #include "neon/qabs.h"
 #include "neon/qadd.h"
+#include "neon/qdmlal.h"
+#include "neon/qdmlal_high.h"
+#include "neon/qdmlal_high_lane.h"
+#include "neon/qdmlal_high_n.h"
+#include "neon/qdmlal_lane.h"
+#include "neon/qdmlal_n.h"
+#include "neon/qdmlsl.h"
+#include "neon/qdmlsl_high.h"
+#include "neon/qdmlsl_high_lane.h"
+#include "neon/qdmlsl_high_n.h"
+#include "neon/qdmlsl_lane.h"
+#include "neon/qdmlsl_n.h"
 #include "neon/qdmulh.h"
 #include "neon/qdmulh_lane.h"
 #include "neon/qdmulh_n.h"
 #include "neon/qdmull.h"
+#include "neon/qdmull_high.h"
+#include "neon/qdmull_high_lane.h"
+#include "neon/qdmull_high_n.h"
+#include "neon/qdmull_lane.h"
+#include "neon/qdmull_n.h"
+#include "neon/qrdmlah.h"
+#include "neon/qrdmlah_lane.h"
+#include "neon/qrdmlsh.h"
+#include "neon/qrdmlsh_lane.h"
 #include "neon/qrdmulh.h"
 #include "neon/qrdmulh_lane.h"
 #include "neon/qrdmulh_n.h"
+#include "neon/qrshl.h"
+#include "neon/qrshrn_high_n.h"
 #include "neon/qrshrn_n.h"
+#include "neon/qrshrun_high_n.h"
 #include "neon/qrshrun_n.h"
 #include "neon/qmovn.h"
-#include "neon/qmovun.h"
 #include "neon/qmovn_high.h"
+#include "neon/qmovun.h"
+#include "neon/qmovun_high.h"
 #include "neon/qneg.h"
 #include "neon/qsub.h"
 #include "neon/qshl.h"
+#include "neon/qshl_n.h"
 #include "neon/qshlu_n.h"
+#include "neon/qshrn_high_n.h"
 #include "neon/qshrn_n.h"
+#include "neon/qshrun_high_n.h"
 #include "neon/qshrun_n.h"
 #include "neon/qtbl.h"
 #include "neon/qtbx.h"
+#include "neon/raddhn.h"
+#include "neon/raddhn_high.h"
+#include "neon/rax.h"
 #include "neon/rbit.h"
 #include "neon/recpe.h"
 #include "neon/recps.h"
+#include "neon/recpx.h"
 #include "neon/reinterpret.h"
 #include "neon/rev16.h"
 #include "neon/rev32.h"
 #include "neon/rev64.h"
 #include "neon/rhadd.h"
 #include "neon/rnd.h"
+#include "neon/rnd32x.h"
+#include "neon/rnd32z.h"
+#include "neon/rnd64x.h"
+#include "neon/rnd64z.h"
+#include "neon/rnda.h"
 #include "neon/rndm.h"
 #include "neon/rndi.h"
 #include "neon/rndn.h"
 #include "neon/rndp.h"
+#include "neon/rndx.h"
 #include "neon/rshl.h"
 #include "neon/rshr_n.h"
+#include "neon/rshrn_high_n.h"
 #include "neon/rshrn_n.h"
 #include "neon/rsqrte.h"
 #include "neon/rsqrts.h"
 #include "neon/rsra_n.h"
+#include "neon/rsubhn.h"
+#include "neon/rsubhn_high.h"
 #include "neon/set_lane.h"
+#include "neon/sha1.h"
+#include "neon/sha256.h"
+#include "neon/sha512.h"
 #include "neon/shl.h"
 #include "neon/shl_n.h"
+#include "neon/shll_high_n.h"
 #include "neon/shll_n.h"
 #include "neon/shr_n.h"
+#include "neon/shrn_high_n.h"
 #include "neon/shrn_n.h"
+#include "neon/sli_n.h"
+#include "neon/sm3.h"
+#include "neon/sm4.h"
 #include "neon/sqadd.h"
+#include "neon/sqrt.h"
 #include "neon/sra_n.h"
 #include "neon/sri_n.h"
 #include "neon/st1.h"
 #include "neon/st1_lane.h"
+#include "neon/st1_x2.h"
+#include "neon/st1_x3.h"
+#include "neon/st1_x4.h"
+#include "neon/st1q_x2.h"
+#include "neon/st1q_x3.h"
+#include "neon/st1q_x4.h"
 #include "neon/st2.h"
 #include "neon/st2_lane.h"
 #include "neon/st3.h"
@@ -188,17 +295,21 @@
 #include "neon/st4_lane.h"
 #include "neon/sub.h"
 #include "neon/subhn.h"
+#include "neon/subhn_high.h"
 #include "neon/subl.h"
 #include "neon/subl_high.h"
 #include "neon/subw.h"
 #include "neon/subw_high.h"
+#include "neon/sudot_lane.h"
 #include "neon/tbl.h"
 #include "neon/tbx.h"
 #include "neon/trn.h"
 #include "neon/trn1.h"
 #include "neon/trn2.h"
 #include "neon/tst.h"
 #include "neon/uqadd.h"
+#include "neon/usdot.h"
+#include "neon/usdot_lane.h"
 #include "neon/uzp.h"
 #include "neon/uzp1.h"
 #include "neon/uzp2.h"
