Smaller stack usage for SHA-1, SHA-256 and SHA-512. #709
base: develop
sjaeckel
left a comment
That looks interesting. Thanks for the next PR :)
Looking at it, it seems like we'd be trading space for time, meaning that execution should be slower with the patch applied.
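To illustrate the kind of space/time trade meant here, a minimal sketch of the SHA-256 message schedule in two forms: a full 64-word array (256 bytes of stack) and a rolling 16-word window (64 bytes) that recomputes each word in place. Whether the patch does exactly this is an assumption; the function names `schedule_full` and `schedule_next` are hypothetical and not part of libtomcrypt.

```c
#include <stdint.h>

/* Rotate right; n must be in 1..31. */
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* SHA-256 small sigma functions (FIPS 180-4). */
static uint32_t sig0(uint32_t x) { return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3); }
static uint32_t sig1(uint32_t x) { return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10); }

/* Hypothetical helper: expand the whole schedule up front (64 words). */
void schedule_full(const uint32_t block[16], uint32_t out[64])
{
    unsigned i;
    for (i = 0; i < 16; i++) out[i] = block[i];
    for (i = 16; i < 64; i++)
        out[i] = sig1(out[i - 2]) + out[i - 7] + sig0(out[i - 15]) + out[i - 16];
}

/* Hypothetical helper: rolling 16-word window.
 * w[] must be preloaded with the message block, and the function must be
 * called for t = 0..63 in order. Slot (t & 15) holds W[t-16] before the
 * update and W[t] after it. */
uint32_t schedule_next(uint32_t w[16], unsigned t)
{
    if (t >= 16) {
        w[t & 15] = sig1(w[(t - 2) & 15]) + w[(t - 7) & 15]
                  + sig0(w[(t - 15) & 15]) + w[t & 15];
    }
    return w[t & 15];
}
```

The rolling variant adds index masking in every round, which is why one would expect it to be slower, although cache and register effects can push the result either way.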
So I modified the timing demo a bit to show something relevant, and the before looks as follows:
sha512 : Process at 39
sha512-256 : Process at 39
sha384 : Process at 39
sha512-224 : Process at 39
sha1 : Process at 61
sha256 : Process at 122
sha224 : Process at 122
vs. after this patch is applied:
sha512 : Process at 39
sha384 : Process at 40
sha512-256 : Process at 40
sha512-224 : Process at 40
sha1 : Process at 68
sha224 : Process at 106
sha256 : Process at 106
sha1 really got worse, sha512-based stayed more or less the same (maybe a little bit slower), but sha256-based got significantly better performance!?
Not sure what to do with sha1, maybe enable this patch via a new LTC_SMALL_STACK option?
The other two I'd simply take unconditionally.
What do you think?
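A rough sketch of how such a compile-time switch for SHA-1 could look. `LTC_SMALL_STACK` is only the option name proposed above; `SHA1_W_WORDS` and `sha1_w` are hypothetical names for illustration and do not exist in libtomcrypt.

```c
#include <stdint.h>

/* Proposed option: with LTC_SMALL_STACK the schedule lives in a rolling
 * 16-word window (64 bytes), otherwise in the full 80-word array (320 bytes). */
#ifdef LTC_SMALL_STACK
#define SHA1_W_WORDS 16u
#else
#define SHA1_W_WORDS 80u
#endif

/* Rotate left; n must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* Hypothetical helper: return SHA-1 schedule word W[t].
 * w[0..15] must be preloaded with the message block, and the function must
 * be called for t = 0..79 in order. With 80 words the modulo is a no-op;
 * with 16 words it turns the array into a circular buffer. */
uint32_t sha1_w(uint32_t w[SHA1_W_WORDS], unsigned t)
{
    if (t >= 16) {
        w[t % SHA1_W_WORDS] = rotl32(w[(t - 3) % SHA1_W_WORDS]
                                   ^ w[(t - 8) % SHA1_W_WORDS]
                                   ^ w[(t - 14) % SHA1_W_WORDS]
                                   ^ w[(t - 16) % SHA1_W_WORDS], 1);
    }
    return w[t % SHA1_W_WORDS];
}
```

The same round code can then be built both ways and compared with the timing demo before deciding on the default.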
I think my next PR will be about x86 (and amd64) specific intrinsics, making SHA-1, SHA-256 and SHA-512 much, much faster.
* Add the option to only run for a subset of algos.
* Improve `hash` to show something meaningful.

Signed-off-by: Steffen Jaeckel <[email protected]>
That's the timing demo in
That depends on how you build the library. I usually simply run If you use CMake (and build in a folder inside the ltc folder) it'd be
Those previous tests were done with the standard config. With Before the patch: After the patch: So it seems like your patch improves the performance in the default case ( In the case FYI:
OK, that sounds nice. You're also thinking about adding
My performance measurements are different. Maybe it depends on processor cache size, branch prediction buffer size and many other things. Before: After: My command line was: Another measurement, this time with Before: After: Here I was able to improve SHA-1 But it is still slower than
For sure, since you most likely have a different CPU. But the differences of the algorithm classes themselves are comparable and my statement from above:
is thereby validated.
Absolutely.
FYI: lower value = faster, the number shown is "the number of CPU cycles per iteration" -> i.e. by having it changed from 110 to 121 you made it 10% slower :-D
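For reference, the arithmetic behind that percentage as a small sketch. The helper name `slowdown_percent` is made up for illustration; it only restates the "cycles per iteration" reading described above.

```c
/* Hypothetical helper: relative change in cycles per iteration, in percent.
 * Positive means slower (more cycles), negative means faster. */
double slowdown_percent(double cycles_before, double cycles_after)
{
    return (cycles_after - cycles_before) / cycles_before * 100.0;
}
```

Applied to the figures in this thread, 110 to 121 gives about +10%, sha1 going from 61 to 68 gives about +11.5%, and sha256 going from 122 to 106 gives about -13.1%.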
No need to run all these, especially not To speed your local development cycle up I'd suggest you to run And I run
Checklist