SIMD vectorization of Length.cpp, runs up to 2.8x faster. #1331

svenweb · 2025-11-27T00:09:28Z

Hello,

This PR speeds up Length.cpp by 1.13x to 2.83x by adding SIMD directives that let the compiler emit SIMD vector instructions. The directives are guarded by #ifdef HAVE_OPEN_SIMD, so if the preprocessor macro HAVE_OPEN_SIMD is not defined, Length.cpp compiles and runs as before.
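For readers who have not seen this kind of guard, here is a minimal sketch of what a macro-guarded SIMD reduction can look like. It is not the code from this PR: `PointXY` and `lengthOfLine` are hypothetical stand-ins for the GEOS types and method, and only `HAVE_OPEN_SIMD` and the `#pragma omp simd reduction(+:len)` directive come from the discussion.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D coordinate; not the GEOS CoordinateXY class.
struct PointXY {
    double x;
    double y;
};

// Sum of the segment lengths along a polyline.
double lengthOfLine(const std::vector<PointXY>& pts)
{
    if (pts.size() < 2) {
        return 0.0;
    }
    const std::size_t n = pts.size();
    double len = 0.0;
#ifdef HAVE_OPEN_SIMD
    // Only seen by the compiler when the build defines HAVE_OPEN_SIMD.
    // The reduction clause permits reordering the additions into len.
    #pragma omp simd reduction(+:len)
#endif
    for (std::size_t i = 1; i < n; ++i) {
        const double dx = pts[i].x - pts[i - 1].x;
        const double dy = pts[i].y - pts[i - 1].y;
        len += std::sqrt(dx * dx + dy * dy);
    }
    return len;
}
```

Without the macro the pragma is never seen by the compiler, so the function builds as an ordinary scalar loop.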

On an x86 Linux machine (Intel i9, 32 GiB DRAM), the gain varied with the size of the line (number of points) and with whether the line was stored as a CoordinateSequence or a vector of CoordinateXY.

| Points | Vector Gain | CoordinateSequence Gain |
|---:|---:|---:|
| 10 | -- | -- |
| 100 | -- | -- |
| 1,000 | 1.67x | 1.67x |
| 10,000 | 2.83x | 2.33x |
| 100,000 | 1.83x | 1.96x |
| 1,000,000 | 1.75x | 1.48x |
| 10,000,000 | 1.24x | 1.13x |

The speed and throughput testing script I used:
myLengthTest.cpp

I built and ran ctest on both the x86 Linux machine and an M1 Mac; all tests passed.

Thank you,
Sven


pramsey · 2025-11-27T16:57:57Z

What would cause HAVE_OPEN_SIMD to be set though? Shouldn't there be an accompanying check in cmake or is HAVE_OPEN_SIMD just something intrinsic to some compilers?

pramsey · 2025-11-27T17:05:46Z

Reading on this simd directive, it sounds like in general compilers are already vectorizing pretty automatically. Does your code change (removing the pt0->pt1 assignment) without the simd directive end up vectorized anyways?

gregbaker · 2025-11-27T21:59:06Z

> in general compilers are already vectorizing pretty automatically

This code can't be fully vectorized because the compiler is obliged to do the additions in the order specified to preserve any rounding error to be exactly what you asked for. Effectively it must implement (((l0+l1)+l2)+l3)+l4. The pragma gives it permission to treat the + as commutative and associative, allowing the automatic vectorization to happen.
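The ordering constraint can be seen in a small sketch. The two hypothetical functions below add the same four doubles, once strictly left to right and once the way a two-lane vector reduction might (even and odd elements accumulated separately, then combined). The values are chosen so that rounding makes the results differ, which is why a compiler will not regroup the additions without permission.

```cpp
#include <array>

// Strict left-to-right order: ((v0 + v1) + v2) + v3
double sumSequential(const std::array<double, 4>& v)
{
    return ((v[0] + v[1]) + v[2]) + v[3];
}

// Order a two-lane reduction might use: (v0 + v2) + (v1 + v3)
double sumTwoLanes(const std::array<double, 4>& v)
{
    const double lane0 = v[0] + v[2];
    const double lane1 = v[1] + v[3];
    return lane0 + lane1;
}

// With v = {1e30, 1.0, -1e30, 1.0}:
//   sumSequential(v) returns 1.0, because 1e30 + 1.0 rounds back to 1e30
//   sumTwoLanes(v)   returns 2.0, because the large terms cancel first
```

Both results are valid roundings of the same mathematical sum; they simply come from different groupings.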


svenweb · 2025-11-28T22:33:57Z

Hi @pramsey ,

> Reading on this simd directive, it sounds like in general compilers are already vectorizing pretty automatically. Does your code change (removing the pt0->pt1 assignment) without the simd directive end up vectorized anyways?

Yes. Even without the simd directive or pragma, my code change lets the compiler vectorize the multiplication and square root in the line-length method, with up to a 2x increase in speed and throughput.

Adding the HAVE_OPEN_SIMD guard and #pragma omp simd reduction(+:len) allows the compiler to also vectorize the addition, as @gregbaker described, which increases performance further. This Compiler Explorer example shows the addition being vectorized when the #pragma and simd directive are set.

To realize performance gains without having to set HAVE_OPEN_SIMD, I have updated my commit and removed the HAVE_OPEN_SIMD directive and the #pragma. The only remaining change removes the loop-carried dependency created by the pt0 -> pt1 assignment, which allows the compiler to auto-vectorize most of the loop.
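As a rough illustration of that kind of change (not the actual Length.cpp diff), the sketch below contrasts a loop that carries the previous point from one iteration to the next with one that reads both endpoints by index. `Pt`, `lengthCarried`, and `lengthIndexed` are hypothetical names used only here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D point type for illustration; not the GEOS coordinate classes.
struct Pt {
    double x;
    double y;
};

// Before: p0 is overwritten at the end of every iteration, so each
// iteration reads a value written by the previous one.
double lengthCarried(const std::vector<Pt>& pts)
{
    if (pts.size() < 2) {
        return 0.0;
    }
    double len = 0.0;
    Pt p0 = pts[0];
    for (std::size_t i = 1; i < pts.size(); ++i) {
        const Pt p1 = pts[i];
        const double dx = p1.x - p0.x;
        const double dy = p1.y - p0.y;
        len += std::sqrt(dx * dx + dy * dy);
        p0 = p1;  // carried into the next iteration
    }
    return len;
}

// After: both endpoints are read by index, so the subtraction,
// multiplication, and square root of each segment are independent of
// other iterations. Only the running sum into len stays sequential.
double lengthIndexed(const std::vector<Pt>& pts)
{
    if (pts.size() < 2) {
        return 0.0;
    }
    double len = 0.0;
    for (std::size_t i = 1; i < pts.size(); ++i) {
        const double dx = pts[i].x - pts[i - 1].x;
        const double dy = pts[i].y - pts[i - 1].y;
        len += std::sqrt(dx * dx + dy * dy);
    }
    return len;
}
```

Both versions add the segment lengths in the same order, so they return identical results; the difference is only in how easily a compiler can see that the per-segment work is independent.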

Thank you!
Sven

pramsey · 2025-11-28T23:57:55Z

I'm fine w/ this in principle. Can you explain to me how HAVE_OPEN_SIMD would get set? By the compiler? Does CMake need any special detection?

svenweb · 2025-12-01T06:39:34Z

The HAVE_OPEN_SIMD macro would have to be set by a dedicated detection step in CMake. My experience with CMake is limited, but I think the CMake logic would look like this:

- Check whether the compiler/toolchain supports basic OpenMP with FindOpenMP.
- If supported, set the CMake OpenMP compiler flags.

However, detecting support for #pragma omp simd specifically could require more checks, as different compilers support the SIMD subset of OpenMP unevenly; for example, clang has limited support for vectorization.

I could open a separate issue and continue looking into adding a reliable SIMD capability detection step to the CMake?

Thanks!

pramsey · 2025-12-01T15:35:10Z

Yes, I'll merge this and you can research compiler feature detection.

Enable the SIMD vectorization of the line-length loop when HAVE_OPEN_SIMD is defined.

b398376

Remove SIMD directives and pragma from Length.cpp

b46392e

Removed SIMD pragma directives for length calculation.

pramsey merged commit c2a1d40 into libgeos:main Dec 1, 2025
27 checks passed

svenweb mentioned this pull request Dec 2, 2025

OpenMP SIMD compiler support detection in CMake #1334

Open
