
Commit c24e95d

bump version, update readme
1 parent 2d0e3fa commit c24e95d


2 files changed: 121 additions & 61 deletions


readme.md

Lines changed: 120 additions & 60 deletions
@@ -1,60 +1,12 @@
 Blake2Fast
 ==========
 
-These [RFC 7693](https://tools.ietf.org/html/rfc7693)-compliant BLAKE2 implementations have been tuned for high speed and low memory usage. The .NET Core 2.1 and 3.0 builds support the new X86 SIMD Intrinsics for even greater speed. `Span<byte>` is used throughout for lower memory overhead compared to `byte[]` based APIs.
+These [RFC 7693](https://tools.ietf.org/html/rfc7693)-compliant BLAKE2 implementations have been tuned for high speed and low memory usage. The .NET Core 2.1 and 3.0 builds use the new x86 SIMD Intrinsics for even greater speed. `Span<byte>` is used throughout for lower memory overhead compared to `byte[]` based APIs.
 
-Sample benchmark results comparing with built-in .NET algorithms, 10MiB input, .NET Core x64 and x86 runtimes:
+On .NET Core 2.1, Blake2Fast uses an SSE4.1 SIMD-accelerated implementation for both BLAKE2b and BLAKE2s.
 
-``` ini
-
-BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
-Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
-.NET Core SDK=3.0.100-preview-010184
-[Host] : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
-netcoreapp1.1 : .NET Core 1.1.8 (CoreCLR 4.6.26328.01, CoreFX 4.6.24705.01), 64bit RyuJIT
-netcoreapp2.1 : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
-netcoreapp3.0 : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
-
-Jit=RyuJit Toolchain=Default
-
-```
-| Method | Job | Platform | Mean | Error | StdDev | Allocated |
-|------------ |-------------- |--------- |----------:|----------:|----------:|----------:|
-| Blake2bFast | netcoreapp1.1 | X64 | 10.753 ms | 0.0502 ms | 0.0445 ms | 0 B |
-| Blake2sFast | netcoreapp1.1 | X64 | 17.066 ms | 0.1528 ms | 0.1355 ms | 0 B |
-| | | | | | | |
-| Blake2bFast | netcoreapp2.1 | X64 | 10.230 ms | 0.0708 ms | 0.0662 ms | 0 B |
-| Blake2sFast | netcoreapp2.1 | X64 | 13.678 ms | 0.0258 ms | 0.0216 ms | 0 B |
-| | | | | | | |
-| Blake2bFast | netcoreapp3.0 | X64 | 8.792 ms | 0.0305 ms | 0.0254 ms | 0 B |
-| Blake2sFast | netcoreapp3.0 | X64 | 13.687 ms | 0.0463 ms | 0.0433 ms | 0 B |
-| MD5 | netcoreapp3.0 | X64 | 17.894 ms | 0.0632 ms | 0.0561 ms | 0 B |
-| SHA256 | netcoreapp3.0 | X64 | 38.607 ms | 0.2877 ms | 0.2691 ms | 0 B |
-| SHA512 | netcoreapp3.0 | X64 | 23.498 ms | 0.1493 ms | 0.1397 ms | 304 B |
-
-| Method | Job | Platform | Mean | Error | StdDev | Allocated |
-|------------ |-------------- |--------- |----------:|----------:|----------:|----------:|
-| Blake2bFast | netcoreapp1.1 | X86 | 68.925 ms | 0.1575 ms | 0.1315 ms | 0 B |
-| Blake2sFast | netcoreapp1.1 | X86 | 67.513 ms | 0.5069 ms | 0.4742 ms | 0 B |
-| | | | | | | |
-| Blake2bFast | netcoreapp2.1 | X86 | 14.208 ms | 0.0876 ms | 0.0819 ms | 0 B |
-| Blake2sFast | netcoreapp2.1 | X86 | 13.628 ms | 0.0399 ms | 0.0333 ms | 0 B |
-| | | | | | | |
-| Blake2bFast | netcoreapp3.0 | X86 | 8.965 ms | 0.0483 ms | 0.0452 ms | 0 B |
-| Blake2sFast | netcoreapp3.0 | X86 | 13.636 ms | 0.0474 ms | 0.0443 ms | 0 B |
-| MD5 | netcoreapp3.0 | X86 | 16.966 ms | 0.1235 ms | 0.1155 ms | 0 B |
-| SHA256 | netcoreapp3.0 | X86 | 44.138 ms | 0.1181 ms | 0.0986 ms | 0 B |
-| SHA512 | netcoreapp3.0 | X86 | 37.384 ms | 0.3196 ms | 0.2989 ms | 0 B |
-
-Duplicate results have been removed from the above tables for the sake of brevity.
-
-Note that the built-in cryptographic hash algorithms in .NET forward to platform-native libraries for their implementations. On Windows, this means the implementations are provided by [Windows CNG](https://docs.microsoft.com/en-us/windows/desktop/seccng/cng-portal). Their performance is therefore identical across all .NET Core versions.
-
-On .NET Framework and .NET Core 1.1, only scalar implementations are available for both BLAKE2 algorithms. The scalar implementations outperform the built-in .NET algorithms on x64 platforms, but they are significantly slower on x86.
-
-On .NET Core 2.1, Blake2Fast uses an SSE4.1 SIMD-accelerated implementation for both BLAKE2b and BLAKE2s. On .NET Core 3.0, an AVX2 implementation of BLAKE2b is available (with SSE4.1 fallback for older processors), while BLAKE2s uses the same SSE4.1 implementation. These are faster than the .NET built-in algorithms on either processor architecture.
+On .NET Core 3.0, a faster AVX2 implementation of BLAKE2b is available (with SSE4.1 fallback for older processors), while BLAKE2s uses the same SSE4.1 implementation.
 
-You can find more detailed comparisons between Blake2Fast and other .NET BLAKE2 implementations starting [here](https://photosauce.net/blog/post/fast-hashing-with-blake2-part-1-nuget-is-a-minefield). The short version is that Blake2Fast is the fastest and lowest-memory version of RFC-compliant BLAKE2 available for .NET.
 
 Installation
 ------------
@@ -95,14 +47,15 @@ BLAKE2 hashes can be incrementally updated if you do not have the data available
 ```C#
 async Task<byte[]> ComputeHashAsync(Stream data)
 {
-    var incHash = Blake2b.CreateIncrementalHasher();
-    var buffer = new byte[4096];
-    int bytesRead;
+    var hasher = Blake2b.CreateIncrementalHasher();
+    var buffer = ArrayPool<byte>.Shared.Rent(4096);
 
+    int bytesRead;
     while ((bytesRead = await data.ReadAsync(buffer, 0, buffer.Length)) > 0)
-        incHash.Update(new Span<byte>(buffer, 0, bytesRead));
+        hasher.Update(new Span<byte>(buffer, 0, bytesRead));
 
-    return incHash.Finish();
+    ArrayPool<byte>.Shared.Return(buffer);
+    return hasher.Finish();
 }
 ```
 
@@ -132,10 +85,10 @@ For interoperating with code that uses `System.Security.Cryptography` primitives
 `HashAlgorithm` is less efficient than the above methods, so use it only when necessary for compatibility.
 
 ```C#
-byte[] WriteDataAndCalculateHash(byte[] data)
+byte[] WriteDataAndCalculateHash(byte[] data, string outFile)
 {
     using (var hashAlg = Blake2b.CreateHashAlgorithm())
-    using (var fileStream = new FileStream(@"c:\data\output.bin", FileMode.Create))
+    using (var fileStream = new FileStream(outFile, FileMode.Create))
     using (var cryptoStream = new CryptoStream(fileStream, hashAlg, CryptoStreamMode.Write))
     {
         cryptoStream.Write(data, 0, data.Length);
@@ -148,8 +101,115 @@ byte[] WriteDataAndCalculateHash(byte[] data)
 SIMD Intrinsics Warning
 -----------------------
 
-The X86 SIMD Intrinsics used in the .NET Core 2.1 build are not officially supported by Microsoft. Although the specific SSE Intrinsics used by Blake2Fast have been well-tested, the JIT support for the X86 Intrinsics in general is experimental in .NET Core 2.1.
+**This warning applies only to .NET Core 2.1**; the older build targets use only the scalar code, and SIMD intrinsics are fully supported on .NET Core 3.0.
+
+The x86 SIMD Intrinsics used in the .NET Core 2.1 build are not officially supported by Microsoft. Although the specific SSE Intrinsics used by Blake2Fast have been well-tested, the JIT support for the x86 Intrinsics in general is experimental in .NET Core 2.1.
 
 If you are uncomfortable using unsupported functionality, you can make a custom build of Blake2Fast by removing the `USE_INTRINSICS` define constant in the [project file](src/Blake2Fast/Blake2Fast.csproj).
 
-This warning applies only to .NET Core 2.1; the older build targets use only the scalar code, and SIMD intrinsics will be fully supported on .NET Core 3.0+.
+
+Benchmarks
+----------
+
+Sample results from the [Blake2.Bench](tests/Blake2.Bench) project. Benchmarks were run on the .NET Core 3.0-preview7 x64 runtime with the following configuration:
+
+``` ini
+
+BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
+Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
+.NET Core SDK=3.0.100-preview7-012821
+[Host] : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
+ShortRun : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
+
+Job=ShortRun IterationCount=3 LaunchCount=1 WarmupCount=3
+
+```
+
+### Blake2Fast vs .NET in-box algorithms (MD5 and SHA2)
+
+```
+| Method | Data Length | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
+|----------- |------------:|----------------:|---------------:|---------------:|-------:|------:|------:|----------:|
+| BLAKE2-256 | 3 | 111.6 ns | 5.079 ns | 0.2784 ns | 0.0134 | - | - | 56 B |
+| BLAKE2-512 | 3 | 138.5 ns | 8.805 ns | 0.4826 ns | 0.0210 | - | - | 88 B |
+| MD5 | 3 | 544.1 ns | 48.366 ns | 2.6511 ns | 0.0496 | - | - | 208 B |
+| SHA-256 | 3 | 711.8 ns | 8.934 ns | 0.4897 ns | 0.0572 | - | - | 240 B |
+| SHA-512 | 3 | 734.7 ns | 35.255 ns | 1.9324 ns | 0.0725 | - | - | 304 B |
+| | | | | | | | | |
+| BLAKE2-256 | 3268 | 4,174.0 ns | 139.581 ns | 7.6509 ns | 0.0076 | - | - | 56 B |
+| BLAKE2-512 | 3268 | 2,693.9 ns | 1.073 ns | 0.0588 ns | 0.0191 | - | - | 88 B |
+| MD5 | 3268 | 5,840.9 ns | 187.058 ns | 10.2533 ns | 0.0458 | - | - | 208 B |
+| SHA-256 | 3268 | 12,563.8 ns | 271.360 ns | 14.8742 ns | 0.0458 | - | - | 240 B |
+| SHA-512 | 3268 | 7,532.9 ns | 98.917 ns | 5.4220 ns | 0.0687 | - | - | 304 B |
+| | | | | | | | | |
+| BLAKE2-256 | 3145728 | 3,909,347.1 ns | 120,876.614 ns | 6,625.6551 ns | - | - | - | 56 B |
+| BLAKE2-512 | 3145728 | 2,497,492.3 ns | 50,301.798 ns | 2,757.2113 ns | - | - | - | 88 B |
+| MD5 | 3145728 | 5,085,250.3 ns | 95,827.863 ns | 5,252.6485 ns | - | - | - | 208 B |
+| SHA-256 | 3145728 | 10,936,735.2 ns | 674,402.898 ns | 36,966.2985 ns | - | - | - | 240 B |
+| SHA-512 | 3145728 | 6,620,802.9 ns | 32,556.339 ns | 1,784.5228 ns | - | - | - | 304 B |
+```
+
+Note that the built-in cryptographic hash algorithms in .NET Core forward to platform-native libraries for their implementations. On Windows, this means the implementations are provided by [Windows CNG](https://docs.microsoft.com/en-us/windows/desktop/seccng/cng-portal). Performance may differ on Linux.
+
+On .NET Framework, only scalar (not SIMD) implementations are available for both BLAKE2 algorithms. The scalar implementations outperform the built-in .NET algorithms in 64-bit applications, but they are slower for large input data in 32-bit applications. The SIMD implementations available in .NET Core are faster than the built-in algorithms on either processor architecture.
+
+### Blake2Fast vs other BLAKE2b implementations available on NuGet
+
+```
+| Method | Data Length | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
+|-------------------- |------------:|----------------:|------------------:|------------------:|----------:|----------:|----------:|------------:|
+| *Blake2Fast.Blake2b | 3 | 141.0 ns | 11.192 ns | 0.6135 ns | 0.0076 | - | - | 32 B |
+| Blake2Sharp(1) | 3 | 380.5 ns | 30.801 ns | 1.6883 ns | 0.2065 | - | - | 864 B |
+| ByteTerrace(2) | 3 | 455.3 ns | 4.572 ns | 0.2506 ns | 0.1087 | - | - | 456 B |
+| S.D.HashFunction(3) | 3 | 1,819.3 ns | 45.298 ns | 2.4829 ns | 0.4158 | - | - | 1744 B |
+| Konscious(4) | 3 | 1,282.5 ns | 58.913 ns | 3.2292 ns | 0.2289 | - | - | 960 B |
+| Isopoh(5) | 3 | 4,920,916.8 ns | 54,306,897.991 ns | 2,976,744.3293 ns | 1753.6621 | 1740.2344 | 1740.2344 | 527448084 B |
+| Blake2Core(6) | 3 | 1,394.8 ns | 44.357 ns | 2.4314 ns | 0.2060 | - | - | 864 B |
+| NSec(7) | 3 | 189.6 ns | 4.810 ns | 0.2636 ns | 0.0267 | - | - | 112 B |
+| | | | | | | | | |
+| *Blake2Fast.Blake2b | 3268 | 2,686.8 ns | 16.774 ns | 0.9195 ns | 0.0076 | - | - | 32 B |
+| Blake2Sharp(1) | 3268 | 4,338.0 ns | 173.013 ns | 9.4834 ns | 0.2060 | - | - | 864 B |
+| ByteTerrace(2) | 3268 | 4,090.6 ns | 158.552 ns | 8.6908 ns | 0.1068 | - | - | 456 B |
+| S.D.HashFunction(3) | 3268 | 29,381.7 ns | 261.868 ns | 14.3539 ns | 2.2278 | - | - | 9344 B |
+| Konscious(4) | 3268 | 16,620.0 ns | 1,402.499 ns | 76.8757 ns | 0.2136 | - | - | 960 B |
+| Isopoh(5) | 3268 | 3,392,905.6 ns | 24,844,814.249 ns | 1,361,828.1041 ns | 2203.3691 | 2186.0352 | 2186.0352 | 670057939 B |
+| Blake2Core(6) | 3268 | 20,614.0 ns | 200.856 ns | 11.0096 ns | 0.1831 | - | - | 864 B |
+| NSec(7) | 3268 | 2,819.8 ns | 79.142 ns | 4.3380 ns | 0.0267 | - | - | 112 B |
+| | | | | | | | | |
+| *Blake2Fast.Blake2b | 3145728 | 2,503,472.0 ns | 71,056.282 ns | 3,894.8346 ns | - | - | - | 32 B |
+| Blake2Sharp(1) | 3145728 | 3,954,441.7 ns | 169,463.338 ns | 9,288.8574 ns | - | - | - | 864 B |
+| ByteTerrace(2) | 3145728 | 3,639,843.0 ns | 92,425.183 ns | 5,066.1361 ns | - | - | - | 456 B |
+| S.D.HashFunction(3) | 3145728 | 27,317,234.4 ns | 711,323.445 ns | 38,990.0383 ns | 1781.2500 | - | - | 7472544 B |
+| Konscious(4) | 3145728 | 15,110,314.6 ns | 305,461.330 ns | 16,743.3662 ns | - | - | - | 960 B |
+| Isopoh(5) | 3145728 | 3,968,873.4 ns | 86,677.772 ns | 4,751.1011 ns | - | - | - | 984 B |
+| Blake2Core(6) | 3145728 | 18,638,068.8 ns | 1,356,570.399 ns | 74,358.2011 ns | - | - | - | 864 B |
+| NSec(7) | 3145728 | 2,561,597.0 ns | 25,735.378 ns | 1,410.6429 ns | - | - | - | 112 B |
+```
+
+(1) `Blake2Sharp` is the reference C# BLAKE2b implementation from the [official BLAKE2 repo](https://github.com/BLAKE2/BLAKE2). This version is not published to NuGet, so the source is included in the benchmark project directly.
+(2) `ByteTerrace.Maths.Cryptography.Blake2` version 0.0.4. This package also includes a BLAKE2s implementation, but it crashed on the 3268 byte and 3MiB inputs, so it is included only in the BLAKE2b benchmark.
+(3) `System.Data.HashFunction.Blake2` version 2.0.0. BLAKE2b only.
+(4) `Konscious.Security.Cryptography.Blake2` version 1.0.9. BLAKE2b only.
+(5) `Isopoh.Cryptography.Blake2b` version 1.1.2.
+(6) `Blake2Core` version 1.0.0. This package contains the reference Blake2Sharp code compiled as a debug (unoptimized) build. BenchmarkDotNet reports an error for unoptimized assemblies by default, so the settings were overridden to allow this library to run.
+(7) `NSec.Cryptography` version 19.5.0. This implementation of BLAKE2 is not RFC-compliant in that it does not allow digest sizes less than 16 bytes. This library forwards to a referenced native library (libsodium), which contains an AVX2 implementation of BLAKE2b.
+
+### Blake2Fast vs other BLAKE2s implementations available on NuGet
+
+```
+| Method | Data Length | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
+|-------------------- |------------:|---------------:|---------------:|---------------:|-------:|------:|------:|----------:|
+| *Blake2Fast.Blake2s | 3 | 108.9 ns | 3.378 ns | 0.1852 ns | 0.0076 | - | - | 32 B |
+| Blake2s-net(1) | 3 | 255.3 ns | 10.771 ns | 0.5904 ns | 0.1278 | - | - | 536 B |
+| | | | | | | | | |
+| *Blake2Fast.Blake2s | 3268 | 4,169.5 ns | 176.109 ns | 9.6531 ns | 0.0076 | - | - | 32 B |
+| Blake2s-net(1) | 3268 | 5,964.5 ns | 165.185 ns | 9.0544 ns | 0.1221 | - | - | 536 B |
+| | | | | | | | | |
+| *Blake2Fast.Blake2s | 3145728 | 3,906,812.0 ns | 73,568.528 ns | 4,032.5393 ns | - | - | - | 32 B |
+| Blake2s-net(1) | 3145728 | 5,469,015.9 ns | 194,030.194 ns | 10,635.4497 ns | - | - | - | 536 B |
+```
+
+(1) `blake2s-net` version 0.1.0. This is a conversion of the reference Blake2Sharp code to support BLAKE2s. It is the only other properly working BLAKE2s implementation I could find on NuGet.
+
+You can find more detailed comparisons between Blake2Fast and other .NET BLAKE2 implementations starting [here](https://photosauce.net/blog/post/fast-hashing-with-blake2-part-1-nuget-is-a-minefield). The short version is that Blake2Fast is the fastest and lowest-memory version of RFC-compliant BLAKE2 available for .NET.
+

src/Blake2Fast/Blake2Fast.csproj

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 <Project Sdk="Microsoft.NET.Sdk">
 
   <PropertyGroup>
-    <VersionPrefix>0.3.0</VersionPrefix>
+    <VersionPrefix>1.0.0</VersionPrefix>
     <TargetFrameworks>netstandard1.1;netstandard1.3;netstandard2.0;netcoreapp2.1;netcoreapp3.0;net45</TargetFrameworks>
   </PropertyGroup>
 