Skip to content

Conversation

@quaternic
Copy link
Contributor

@quaternic quaternic commented Aug 1, 2025

This is kind of a retry at #898. One of the problems there was that it would have added overhead and regressed performance for typical inputs.

Unlike that PR, this doesn't aim for sub-linear scaling; the cost of evaluating fmod(x, y) is still roughly proportional to log2(|x/y|). However, the constant factor is much better. Running the random-benchmarks locally, I got walltime reductions of

fmodf16:  -56.9%
fmodf:    -85.0%
fmod:     -95.4%
fmodf128: -98.7%

Needs #1011 and #1012

@tgross35
Copy link
Contributor

Sorry I haven't reviewed yet; I am excited about this but have been a bit short on time.

Any chance you're willing to commits to separate PRs? From a quick glance the first two look good, but I'll need a bit more time for the last one.

fmodf128: -98.7%

Seriously impressive :)

@quaternic
Copy link
Contributor Author

Yeah, I probably should have split it. I'm not home at this time, but I'll try to do that on Tuesday.

If possible, could you run the walltime benchmarks on something non-x86?

@quaternic quaternic changed the title Optimize fmod with a method using integer multiplication libm: optimize fmod Aug 12, 2025
@rustbot

This comment has been minimized.

Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here LGTM and the perf is awesome:

 icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                       75679|1016969              (-92.5584%) [-13.4379x]
  L1 Hits:                            85615|1019164              (-91.5995%) [-11.9040x]
  LL Hits:                                2|4                    (-50.0000%) [-2.00000x]
  RAM Hits:                              20|11                   (+81.8182%) [+1.81818x]
  Total read+write:                   85637|1019179              (-91.5975%) [-11.9012x]
  Estimated Cycles:                   86325|1019569              (-91.5332%) [-11.8108x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                     1161536|30699762             (-96.2165%) [-26.4303x]
  L1 Hits:                          1534196|30715053             (-95.0051%) [-20.0203x]
  LL Hits:                                2|8                    (-75.0000%) [-4.00000x]
  RAM Hits:                              56|36                   (+55.5556%) [+1.55556x]
  Total read+write:                 1534254|30715097             (-95.0049%) [-20.0196x]
  Estimated Cycles:                 1536166|30716353             (-94.9989%) [-19.9955x]
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                       30967|47936                (-35.3993%) [-1.54797x]
  L1 Hits:                            37383|54346                (-31.2130%) [-1.45376x]
  LL Hits:                                1|3                    (-66.6667%) [-3.00000x]
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                   37396|54365                (-31.2131%) [-1.45377x]
  Estimated Cycles:                   37808|54921                (-31.1593%) [-1.45263x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                       40517|150311               (-73.0446%) [-3.70983x]
  L1 Hits:                            45357|152509               (-70.2595%) [-3.36241x]
  LL Hits:                                1|3                    (-66.6667%) [-3.00000x]
  RAM Hits:                              14|9                    (+55.5556%) [+1.55556x]
  Total read+write:                   45372|152521               (-70.2520%) [-3.36157x]
  Estimated Cycles:                   45852|152839               (-69.9998%) [-3.33331x]
 icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
  Baselines:                      hardfloat|hardfloat (old)
  Instructions:                       75679|1016969              (-92.5584%) [-13.4379x]
  L1 Hits:                            85616|1019164              (-91.5994%) [-11.9039x]
  LL Hits:                                3|4                    (-25.0000%) [-1.33333x]
  RAM Hits:                              18|11                   (+63.6364%) [+1.63636x]
  Total read+write:                   85637|1019179              (-91.5975%) [-11.9012x]
  Estimated Cycles:                   86261|1019569              (-91.5395%) [-11.8196x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
  Baselines:                      hardfloat|hardfloat (old)
  Instructions:                     1161536|30699762             (-96.2165%) [-26.4303x]
  L1 Hits:                          1534188|30715053             (-95.0051%) [-20.0204x]
  LL Hits:                                8|6                    (+33.3333%) [+1.33333x]
  RAM Hits:                              58|38                   (+52.6316%) [+1.52632x]
  Total read+write:                 1534254|30715097             (-95.0049%) [-20.0196x]
  Estimated Cycles:                 1536258|30716413             (-94.9986%) [-19.9943x]
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
  Baselines:                      hardfloat|hardfloat (old)
  Instructions:                       30967|47936                (-35.3993%) [-1.54797x]
  L1 Hits:                            37380|54346                (-31.2185%) [-1.45388x]
  LL Hits:                                2|3                    (-33.3333%) [-1.50000x]
  RAM Hits:                              14|16                   (-12.5000%) [-1.14286x]
  Total read+write:                   37396|54365                (-31.2131%) [-1.45377x]
  Estimated Cycles:                   37880|54921                (-31.0282%) [-1.44987x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
  Baselines:                      hardfloat|hardfloat (old)
  Instructions:                       40517|150311               (-73.0446%) [-3.70983x]
  L1 Hits:                            45358|152509               (-70.2588%) [-3.36234x]
  LL Hits:                                1|3                    (-66.6667%) [-3.00000x]
  RAM Hits:                              13|9                    (+44.4444%) [+1.44444x]
  Total read+write:                   45372|152521               (-70.2520%) [-3.36157x]
  Estimated Cycles:                   45818|152839               (-70.0220%) [-3.33579x]

Thank you for all of the work on this one!

This is kind of a retry at rust-lang#898. One of the
problems there was that it would have added overhead and regressed
performance for typical inputs.

Unlike that PR, this doesn't aim for sub-linear scaling; the cost of
evaluating `fmod(x, y)` is still roughly proportional to `log2(|x/y|)`.
However, the constant factor is much better. Running the
`random`-benchmarks locally, I got walltime reductions of

    fmodf16:  -56.9%
    fmodf:    -85.0%
    fmod:     -95.4%
    fmodf128: -98.7%
@rustbot
Copy link
Collaborator

rustbot commented Dec 5, 2025

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@tgross35
Copy link
Contributor

tgross35 commented Dec 5, 2025

Rebased and put the PR message into the commit since GH won't let me do that without squashing, I'll merge when CI completes.

@tgross35 tgross35 enabled auto-merge (rebase) December 6, 2025 00:06
@tgross35 tgross35 merged commit 23f6f33 into rust-lang:main Dec 6, 2025
38 checks passed
@quaternic quaternic deleted the fmod-reduce-opt branch December 6, 2025 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants