Conversation

@connortsui20
Contributor

@connortsui20 connortsui20 commented Nov 19, 2025

Reorganizes some of the filter implementations and adds AVX512 filter implementation and benchmarks

@connortsui20 connortsui20 requested a review from gatesn November 19, 2025 14:57
@codecov

codecov bot commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 75.83643% with 130 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.45%. Comparing base (c75c8a3) to head (eae0511).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vortex-compute/src/filter/slice/simd_compress.rs | 9.57% | 85 Missing ⚠️ |
| vortex-compute/src/filter/slice/out/by_mask.rs | 55.55% | 28 Missing ⚠️ |
| vortex-compute/src/filter/slice/out/avx512.rs | 91.07% | 5 Missing ⚠️ |
| vortex-compute/src/filter/slice/out/by_bitview.rs | 87.50% | 5 Missing ⚠️ |
| vortex-compute/src/filter/slice/in_place/avx512.rs | 92.30% | 4 Missing ⚠️ |
| vortex-compute/src/filter/slice/out/mod.rs | 98.38% | 2 Missing ⚠️ |
| vortex-compute/src/filter/slice/in_place/mod.rs | 99.08% | 1 Missing ⚠️ |


@a10y
Contributor

a10y commented Nov 19, 2025

Does GitHub give us CI runners with AVX512 support? My recollection from when I did the AVX2 take kernel is that they don't.

@connortsui20 connortsui20 added the "feature" label ("Release label indicating a new feature or request") Nov 19, 2025
@connortsui20
Contributor Author

Not really sure how to deal with this. I think sometimes they have it, sometimes not?

@robert3005
Contributor

CI runs in AWS, so you can get whatever machine you want. Right now we default to m7i/m7a instances, which are Sapphire Rapids/Zen 4 and have AVX512. Benchmarks run on c6i, which is Ice Lake SP and also has AVX512.

@connortsui20 connortsui20 force-pushed the ct/avx512-filter branch 10 times, most recently from 02c00d3 to 4f8baa2 on November 19, 2025 19:16
@0ax1
Contributor

0ax1 commented Nov 20, 2025

Heads up that we can't micro-benchmark AVX512 with CodSpeed, as valgrind/callgrind does not support AVX512. That said, you'll get numbers for AVX2 if it is implemented:

      - name: Build benchmarks (shard 1)
        env:
          RUSTFLAGS: "-C target-feature=+avx2"
      ...

@connortsui20
Contributor Author

Given that we're pivoting to batch, I'm going to put a pin in this.

@gatesn
Contributor

gatesn commented Nov 20, 2025

Nooo I like this!

@gatesn gatesn reopened this Nov 20, 2025
@connortsui20 connortsui20 force-pushed the ct/avx512-filter branch 2 times, most recently from 4ae987c to be41513 on November 24, 2025 15:02
@connortsui20 connortsui20 marked this pull request as ready for review November 24, 2025 15:04
@connortsui20 connortsui20 marked this pull request as draft November 24, 2025 15:05
@connortsui20 connortsui20 force-pushed the ct/avx512-filter branch 2 times, most recently from e2529e9 to 36a28c6 on November 24, 2025 15:10
@connortsui20 connortsui20 marked this pull request as ready for review November 24, 2025 15:10
Contributor Author


The items here were moved to the `out` module in `slice`.

@connortsui20 connortsui20 requested a review from 0ax1 November 24, 2025 15:12
@codspeed-hq

codspeed-hq bot commented Nov 24, 2025

CodSpeed Performance Report

Merging #5399 will not alter performance

Comparing ct/avx512-filter (eae0511) with develop (c75c8a3)

Summary

✅ 1478 untouched
🆕 56 new
⏩ 214 skipped ¹

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| 🆕 in_place_scalar[(1024, 0.0)] | N/A | 4.6 µs | N/A |
| 🆕 in_place_scalar[(1024, 0.1)] | N/A | 6.2 µs | N/A |
| 🆕 in_place_scalar[(1024, 0.25)] | N/A | 7 µs | N/A |
| 🆕 in_place_scalar[(1024, 0.5)] | N/A | 7.6 µs | N/A |
| 🆕 in_place_scalar[(1024, 0.75)] | N/A | 8.2 µs | N/A |
| 🆕 in_place_scalar[(1024, 0.9)] | N/A | 8.5 µs | N/A |
| 🆕 in_place_scalar[(1024, 1.0)] | N/A | 8.8 µs | N/A |
| 🆕 in_place_scalar[(131072, 0.0)] | N/A | 553.9 µs | N/A |
| 🆕 in_place_scalar[(131072, 0.1)] | N/A | 784.8 µs | N/A |
| 🆕 in_place_scalar[(131072, 0.25)] | N/A | 866 µs | N/A |
| 🆕 in_place_scalar[(131072, 0.5)] | N/A | 943.8 µs | N/A |
| 🆕 in_place_scalar[(131072, 0.75)] | N/A | 1 ms | N/A |
| 🆕 in_place_scalar[(131072, 0.9)] | N/A | 1.1 ms | N/A |
| 🆕 in_place_scalar[(131072, 1.0)] | N/A | 1.1 ms | N/A |
| 🆕 in_place_scalar[(16384, 0.0)] | N/A | 69.5 µs | N/A |
| 🆕 in_place_scalar[(16384, 0.1)] | N/A | 97.8 µs | N/A |
| 🆕 in_place_scalar[(16384, 0.25)] | N/A | 108.1 µs | N/A |
| 🆕 in_place_scalar[(16384, 0.5)] | N/A | 117.7 µs | N/A |
| 🆕 in_place_scalar[(16384, 0.75)] | N/A | 126.8 µs | N/A |
| 🆕 in_place_scalar[(16384, 0.9)] | N/A | 132.2 µs | N/A |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Footnotes

  1. 214 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@connortsui20
Contributor Author

@0ax1 is there a way for me to disable the benchmark just for codspeed?

Comment on lines +37 to +40
if (mask[byte_idx] >> bit_idx) & 1 == 1 {
data[write_pos] = data[read_pos];
write_pos += 1;
}
Contributor

Note: I think you can do this faster by always doing a speculative write to data[write_pos], and then just incrementing write_pos by the mask bit on each loop iteration. FSST uses a similar trick in its compression loop.

Maybe save it for a FLUP (follow-up) though, so we can benchmark the difference.
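A minimal sketch of that trick (hypothetical helper names, not the PR's actual code), assuming the same byte-wise mask layout as the snippet under review: every element is written to `data[write_pos]` unconditionally, and the cursor advances by the mask bit, so the loop body has no data-dependent branch.

```rust
/// Branchless in-place filter sketch: speculatively write every element,
/// then advance the write cursor only when the mask bit is set.
/// Safe because write_pos never exceeds read_pos, so at worst we clobber
/// a slot that was not selected. (Hypothetical helper, for illustration.)
fn filter_in_place_branchless(data: &mut [u32], mask: &[u8]) -> usize {
    let mut write_pos = 0usize;
    for read_pos in 0..data.len() {
        let bit = (mask[read_pos / 8] >> (read_pos % 8)) & 1;
        // Speculative write: always store, even when the bit is 0 ...
        data[write_pos] = data[read_pos];
        // ... and keep it (advance the cursor) only when the bit is 1.
        write_pos += bit as usize;
    }
    write_pos
}

fn main() {
    let mut data = vec![10u32, 11, 12, 13, 14, 15, 16, 17];
    // Mask bits are LSB-first: keep indices 0, 2, 3, and 7.
    let mask = [0b1000_1101u8];
    let n = filter_in_place_branchless(&mut data, &mask);
    println!("{:?}", &data[..n]); // prints [10, 12, 13, 17]
}
```

Whether this beats the branchy version depends on the selectivity of the mask (the branch predictor does well at 0.0 and 1.0), which is presumably why benchmarking the difference first makes sense.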

Contributor Author

Yeah, this entire function is super inefficient, but I'm going to wait on optimizing these further until we actually start using it in operators and can get better / more realistic benchmarks.

Contributor

We probably shouldn't merge SIMD code unless it is fully optimized. It's a bit misleading

Contributor Author

I could just remove this then? This isn't SIMD code.

@0ax1
Contributor

0ax1 commented Nov 24, 2025

@0ax1 is there a way for me to disable the benchmark just for codspeed?

You shouldn't need to. We compile for AVX2, so the AVX512 version shouldn't be picked up.

Ah, I see that we're running into failed to execute the benchmark process, exit code: 132 (132 = 128 + SIGILL, i.e. an illegal instruction), which I assume means the implementation is picked based on the runtime capabilities of the CPU?

We discussed excluding the benchmark via: #[cfg(not(codspeed))]
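A sketch of that exclusion (assumptions: CodSpeed's build sets a `codspeed` cfg flag, and every name below is a hypothetical stand-in, not the repo's actual code): the AVX512 benchmark is simply not compiled when the flag is present, so the valgrind-based runner never reaches the unsupported instructions.

```rust
// Sketch: compile the AVX-512 benchmark out of CodSpeed builds.
// Assumes the `codspeed` cfg flag is set by CodSpeed's build, as
// discussed above; the benchmark name and body are placeholders.
#[cfg(not(codspeed))]
fn bench_filter_avx512() -> &'static str {
    // The real benchmark body would dispatch into the AVX-512 kernel.
    "ran avx512 filter benchmark"
}

fn main() {
    // In an ordinary build (no `codspeed` cfg), the benchmark is present.
    #[cfg(not(codspeed))]
    println!("{}", bench_filter_avx512());
}
```

With Cargo, the custom flag would also need to be declared (e.g. via `check-cfg`) to silence the `unexpected_cfgs` lint.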

@connortsui20
Contributor Author

Hmm, it seems I've done something wrong here then: https://github.com/vortex-data/vortex/actions/runs/19644046057/job/56254751182?pr=5399

@connortsui20 connortsui20 force-pushed the ct/avx512-filter branch 3 times, most recently from 4553fd8 to 723122a on November 25, 2025 16:37
Signed-off-by: Connor Tsui <[email protected]>
6 participants