-
Notifications
You must be signed in to change notification settings - Fork 255
libm: optimize fmod
#1002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libm: optimize fmod
#1002
Conversation
e795649 to
d749e82
Compare
|
Sorry I haven't reviewed yet; I am excited about this but have been a bit short on time. Any chance you're willing to commits to separate PRs? From a quick glance the first two look good, but I'll need a bit more time for the last one. Seriously impressive :) |
|
Yeah, I probably should have split it. I'm not home at this time, but I'll try to do that on Tuesday. If possible, could you run the walltime benchmarks on something non-x86? |
d749e82 to
facbc6f
Compare
fmod with a method using integer multiplicationfmod
facbc6f to
0e85407
Compare
0e85407 to
e459fc1
Compare
This comment has been minimized.
This comment has been minimized.
e459fc1 to
afb40ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here LGTM and the perf is awesome:
icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
Baselines: softfloat|softfloat (old)
Instructions: 75679|1016969 (-92.5584%) [-13.4379x]
L1 Hits: 85615|1019164 (-91.5995%) [-11.9040x]
LL Hits: 2|4 (-50.0000%) [-2.00000x]
RAM Hits: 20|11 (+81.8182%) [+1.81818x]
Total read+write: 85637|1019179 (-91.5975%) [-11.9012x]
Estimated Cycles: 86325|1019569 (-91.5332%) [-11.8108x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
Baselines: softfloat|softfloat (old)
Instructions: 1161536|30699762 (-96.2165%) [-26.4303x]
L1 Hits: 1534196|30715053 (-95.0051%) [-20.0203x]
LL Hits: 2|8 (-75.0000%) [-4.00000x]
RAM Hits: 56|36 (+55.5556%) [+1.55556x]
Total read+write: 1534254|30715097 (-95.0049%) [-20.0196x]
Estimated Cycles: 1536166|30716353 (-94.9989%) [-19.9955x]
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
Baselines: softfloat|softfloat (old)
Instructions: 30967|47936 (-35.3993%) [-1.54797x]
L1 Hits: 37383|54346 (-31.2130%) [-1.45376x]
LL Hits: 1|3 (-66.6667%) [-3.00000x]
RAM Hits: 12|16 (-25.0000%) [-1.33333x]
Total read+write: 37396|54365 (-31.2131%) [-1.45377x]
Estimated Cycles: 37808|54921 (-31.1593%) [-1.45263x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
Baselines: softfloat|softfloat (old)
Instructions: 40517|150311 (-73.0446%) [-3.70983x]
L1 Hits: 45357|152509 (-70.2595%) [-3.36241x]
LL Hits: 1|3 (-66.6667%) [-3.00000x]
RAM Hits: 14|9 (+55.5556%) [+1.55556x]
Total read+write: 45372|152521 (-70.2520%) [-3.36157x]
Estimated Cycles: 45852|152839 (-69.9998%) [-3.33331x]
icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
Baselines: hardfloat|hardfloat (old)
Instructions: 75679|1016969 (-92.5584%) [-13.4379x]
L1 Hits: 85616|1019164 (-91.5994%) [-11.9039x]
LL Hits: 3|4 (-25.0000%) [-1.33333x]
RAM Hits: 18|11 (+63.6364%) [+1.63636x]
Total read+write: 85637|1019179 (-91.5975%) [-11.9012x]
Estimated Cycles: 86261|1019569 (-91.5395%) [-11.8196x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
Baselines: hardfloat|hardfloat (old)
Instructions: 1161536|30699762 (-96.2165%) [-26.4303x]
L1 Hits: 1534188|30715053 (-95.0051%) [-20.0204x]
LL Hits: 8|6 (+33.3333%) [+1.33333x]
RAM Hits: 58|38 (+52.6316%) [+1.52632x]
Total read+write: 1534254|30715097 (-95.0049%) [-20.0196x]
Estimated Cycles: 1536258|30716413 (-94.9986%) [-19.9943x]
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
Baselines: hardfloat|hardfloat (old)
Instructions: 30967|47936 (-35.3993%) [-1.54797x]
L1 Hits: 37380|54346 (-31.2185%) [-1.45388x]
LL Hits: 2|3 (-33.3333%) [-1.50000x]
RAM Hits: 14|16 (-12.5000%) [-1.14286x]
Total read+write: 37396|54365 (-31.2131%) [-1.45377x]
Estimated Cycles: 37880|54921 (-31.0282%) [-1.44987x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
Baselines: hardfloat|hardfloat (old)
Instructions: 40517|150311 (-73.0446%) [-3.70983x]
L1 Hits: 45358|152509 (-70.2588%) [-3.36234x]
LL Hits: 1|3 (-66.6667%) [-3.00000x]
RAM Hits: 13|9 (+44.4444%) [+1.44444x]
Total read+write: 45372|152521 (-70.2520%) [-3.36157x]
Estimated Cycles: 45818|152839 (-70.0220%) [-3.33579x]
Thank you for all of the work on this one!
afb40ad to
11ff530
Compare
This is kind of a retry at rust-lang#898. One of the problems there was that it would have added overhead and regressed performance for typical inputs. Unlike that PR, this doesn't aim for sub-linear scaling; the cost of evaluating `fmod(x, y)` is still roughly proportional to `log2(|x/y|)`. However, the constant factor is much better. Running the `random`-benchmarks locally, I got walltime reductions of fmodf16: -56.9% fmodf: -85.0% fmod: -95.4% fmodf128: -98.7%
11ff530 to
d94ea4d
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
Rebased and put the PR message into the commit since GH won't let me do that without squashing, I'll merge when CI completes. |
This is kind of a retry at #898. One of the problems there was that it would have added overhead and regressed performance for typical inputs.
Unlike that PR, this doesn't aim for sub-linear scaling; the cost of evaluating
fmod(x, y)is still roughly proportional tolog2(|x/y|). However, the constant factor is much better. Running therandom-benchmarks locally, I got walltime reductions ofNeeds #1011 and #1012