Dr-Emann commented Jun 30, 2025

Adds the optimized count implementation for the fallback to the rebar engines so it can be compared with the existing ones. On my machine it is roughly 3-4x faster on the huge and small benchmarks.
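
The core SWAR idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR; the names count, count_in_word, splat, LO and LO7 are hypothetical. XOR-ing each 8-byte word with the needle broadcast to every byte turns matches into zero bytes. An exact zero-byte test then marks each match with a single high bit that count_ones() can tally. The test needs to be exact: the common subtraction-based has-zero-byte test can also flag bytes above a real match because of borrows.

```rust
/// 0x0101_0101_0101_0101: a one in the low bit of every byte lane.
const LO: u64 = u64::MAX / 255;
/// 0x7F7F_7F7F_7F7F_7F7F: the low seven bits of every byte lane.
const LO7: u64 = LO * 0x7F;

/// Broadcast `b` into all eight byte lanes (0x41 -> 0x4141_4141_4141_4141).
fn splat(b: u8) -> u64 {
    b as u64 * LO
}

/// Number of bytes in `word` equal to `needle`, computed without branches.
fn count_in_word(word: u64, needle: u8) -> u32 {
    // Matching lanes become 0x00; every other lane is nonzero.
    let x = word ^ splat(needle);
    // A lane's high bit is set iff its low seven bits are nonzero.
    // (x & 0x7F) + 0x7F <= 0xFE, so no carry ever crosses into the next lane.
    let t = (x & LO7) + LO7;
    // OR in x to catch lanes whose only set bit is the high bit, then invert:
    // the high bit survives only in lanes that were exactly 0x00.
    let zero_lanes = !(t | x | LO7);
    zero_lanes.count_ones()
}

/// Count occurrences of `needle` in `haystack`, eight bytes at a time.
pub fn count(needle: u8, haystack: &[u8]) -> usize {
    let mut chunks = haystack.chunks_exact(8);
    let mut total = 0;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // Byte order does not matter when we only count matches.
        total += count_in_word(u64::from_ne_bytes(buf), needle) as usize;
    }
    // The trailing 0..=7 bytes are checked one at a time.
    total + chunks.remainder().iter().filter(|&&b| b == needle).count()
}
```

For example, count(b'a', b"banana") returns 3. Summing count_ones() per word keeps the hot loop branch-free. Another option is to accumulate per-lane counters over several words before reducing them.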

This also includes a Kani model-checking harness that verifies some of the SWAR functions (and a step to run cargo kani in CI). I'm not sure whether you'll want to pick it up, but it was nice to have confidence that the functions work for all input values.
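
A harness in this spirit might look like the sketch below. The helper names and the property being checked are assumptions for illustration, not the harness in this PR. Inside a Kani proof, kani::any() stands for every possible value, so one proof covers all 2^64 words and all 256 needles. The #[cfg(kani)] module is compiled only by cargo kani and is stripped from ordinary builds.

```rust
/// Hypothetical per-word SWAR helper under test (not the PR's code).
const LO: u64 = u64::MAX / 255; // 0x0101_0101_0101_0101
const LO7: u64 = LO * 0x7F; // 0x7F7F_7F7F_7F7F_7F7F

fn count_in_word(word: u64, needle: u8) -> u32 {
    // Matching bytes become 0x00; the high bit of each lane then ends up set
    // only in lanes that were exactly zero.
    let x = word ^ (needle as u64 * LO);
    let zero_lanes = !(((x & LO7) + LO7) | x | LO7);
    zero_lanes.count_ones()
}

/// Byte-at-a-time reference implementation the SWAR version must agree with.
fn count_in_word_naive(word: u64, needle: u8) -> u32 {
    word.to_ne_bytes().iter().filter(|&&b| b == needle).count() as u32
}

#[cfg(kani)]
mod verification {
    use super::*;

    #[kani::proof]
    // Bound loop unrolling; the reference loop visits at most 8 bytes.
    #[kani::unwind(9)]
    fn count_in_word_matches_naive() {
        // Symbolic inputs: Kani explores every possible word and needle.
        let word: u64 = kani::any();
        let needle: u8 = kani::any();
        assert_eq!(count_in_word(word, needle), count_in_word_naive(word, needle));
    }
}
```

Running cargo kani in the crate checks the proof. An ordinary cargo test never sees the verification module.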

rebar results

Results from my M1 MacBook Pro on macOS (/tmp/old.csv is before this change, /tmp/new.csv is after):

benchmark                          engine                                 /tmp/old.csv      /tmp/new.csv
---------                          ------                                 ------------      ------------
memchr/sherlock/common/huge1       rust/memchr/memchr/fallback/onlycount  92.33us (3.98x)   23.21us (1.00x)
memchr/sherlock/common/small1      rust/memchr/memchr/fallback/onlycount  125.00ns (3.05x)  41.00ns (1.00x)
memchr/sherlock/common/tiny1       rust/memchr/memchr/fallback/onlycount  1.00ns (1.00x)    1.00ns (1.00x)
memchr/sherlock/never/huge1        rust/memchr/memchr/fallback/onlycount  92.33us (3.98x)   23.21us (1.00x)
memchr/sherlock/never/small1       rust/memchr/memchr/fallback/onlycount  125.00ns (3.05x)  41.00ns (1.00x)
memchr/sherlock/never/tiny1        rust/memchr/memchr/fallback/onlycount  1.00ns (1.00x)    1.00ns (1.00x)
memchr/sherlock/never/empty1       rust/memchr/memchr/fallback/onlycount  1.00ns (1.00x)    1.00ns (1.00x)
memchr/sherlock/rare/huge1         rust/memchr/memchr/fallback/onlycount  92.38us (3.98x)   23.21us (1.00x)
memchr/sherlock/rare/small1        rust/memchr/memchr/fallback/onlycount  125.00ns (3.05x)  41.00ns (1.00x)
memchr/sherlock/rare/tiny1         rust/memchr/memchr/fallback/onlycount  1.00ns (1.00x)    1.00ns (1.00x)
memchr/sherlock/uncommon/huge1     rust/memchr/memchr/fallback/onlycount  92.29us (3.98x)   23.21us (1.00x)
memchr/sherlock/uncommon/small1    rust/memchr/memchr/fallback/onlycount  125.00ns (3.05x)  41.00ns (1.00x)
memchr/sherlock/uncommon/tiny1     rust/memchr/memchr/fallback/onlycount  1.00ns (1.00x)    1.00ns (1.00x)
memchr/sherlock/verycommon/huge1   rust/memchr/memchr/fallback/onlycount  92.33us (3.98x)   23.21us (1.00x)
memchr/sherlock/verycommon/small1  rust/memchr/memchr/fallback/onlycount  125.00ns (3.05x)  41.00ns (1.00x)

The fallback implementation is still slower than the SIMD version (as expected), but the gap is much smaller: about 1.77x on the huge benchmarks, with no difference on the small and tiny ones:

benchmark                          rust/memchr/memchr/fallback/onlycount  rust/memchr/memchr/onlycount
---------                          -------------------------------------  ----------------------------
memchr/sherlock/common/huge1       23.21us (1.77x)                        13.08us (1.00x)
memchr/sherlock/common/small1      41.00ns (1.00x)                        41.00ns (1.00x)
memchr/sherlock/common/tiny1       1.00ns (1.00x)                         1.00ns (1.00x)
memchr/sherlock/never/huge1        23.21us (1.77x)                        13.12us (1.00x)
memchr/sherlock/never/small1       41.00ns (1.00x)                        41.00ns (1.00x)
memchr/sherlock/never/tiny1        1.00ns (1.00x)                         1.00ns (1.00x)
memchr/sherlock/never/empty1       1.00ns (1.00x)                         1.00ns (1.00x)
memchr/sherlock/rare/huge1         23.21us (1.77x)                        13.08us (1.00x)
memchr/sherlock/rare/small1        41.00ns (1.00x)                        41.00ns (1.00x)
memchr/sherlock/rare/tiny1         1.00ns (1.00x)                         1.00ns (1.00x)
memchr/sherlock/uncommon/huge1     23.21us (1.77x)                        13.08us (1.00x)
memchr/sherlock/uncommon/small1    41.00ns (1.00x)                        41.00ns (1.00x)
memchr/sherlock/uncommon/tiny1     1.00ns (1.00x)                         1.00ns (1.00x)
memchr/sherlock/verycommon/huge1   23.21us (1.77x)                        13.08us (1.00x)
memchr/sherlock/verycommon/small1  41.00ns (1.00x)                        41.00ns (1.00x)

Dr-Emann force-pushed the fallback_count_impl branch from f94fd11 to a3b3952 on June 30, 2025 00:53
Dr-Emann mentioned this pull request Aug 4, 2025
Dr-Emann closed this Aug 4, 2025
Dr-Emann reopened this Aug 4, 2025
Dr-Emann force-pushed the fallback_count_impl branch 4 times, most recently from d85385c to 86fe7f9 on August 6, 2025 00:01

Dr-Emann commented Aug 6, 2025

@BurntSushi What are your thoughts on the Kani checks? Do you want them, or should I drop them?

Dr-Emann force-pushed the fallback_count_impl branch from 86fe7f9 to c3f2eb8 on October 7, 2025 04:42
Dr-Emann changed the title from "Implement the count optimization for the fallback implementation with SWAR" to "Implement the count optimization for the fallback implementation with SWAR (3-4x faster)" on Oct 7, 2025