@ycherkes ycherkes commented Aug 10, 2024

I created a fork, Raffinert.FuzzySharp. Anyone who would like to try these changes can use the Raffinert.FuzzySharp NuGet package until this PR is merged.

Focus on performance and allocations; see the benchmarks below.
Local (non-English) languages are supported more naturally: the "a-zA-Z" regexps were removed, and all regexps were replaced with string manipulation (fixes PR!7).
Extra performance improvement reusing Dmitry Sushchevsky's approach; see PR!42.
Implemented a new Process.ExtractAll method; see Issue!46.
Removed support for the outdated/vulnerable platforms netcoreapp2.0, netcoreapp2.1, and netstandard1.6.
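
To illustrate why dropping the "a-zA-Z" regexps matters for non-English input, here is a minimal sketch of regex-free preprocessing. It is not the code from this PR: `PreprocessSketch` and `Normalize` are hypothetical names, and the exact normalization rules (lowercasing, replacing other characters with spaces) are assumptions made for the example. It relies only on `char.IsLetterOrDigit`, which is Unicode-aware, so Cyrillic or accented letters survive where an ASCII-only character class would strip them.

```csharp
using System;
using System.Text;

public static class PreprocessSketch
{
    // Hypothetical helper (not part of FuzzySharp): lowercases letters and digits
    // and replaces every other character with a space, without using Regex.
    public static string Normalize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // char.IsLetterOrDigit is Unicode-aware, unlike an [a-zA-Z0-9] class.
            sb.Append(char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ');
        }

        // Trim the padding left by leading/trailing punctuation.
        return sb.ToString().Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("Hello, World!"));
        Console.WriteLine(Normalize("Привет, Мир!"));
    }
}
```

A single pass over the characters also avoids the allocations a regex replacement would make, which fits the allocation goal stated above.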

Before:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 8.0.303
[Host] : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Ratio1 | 269.77 ns | 3.168 ns | 2.809 ns | 0.0505 | - | 320 B |
| Ratio2 | 44.11 ns | 0.880 ns | 0.903 ns | 0.0318 | - | 200 B |
| PartialRatio | 944.78 ns | 18.218 ns | 24.321 ns | 0.5360 | 0.0019 | 3368 B |
| TokenSortRatio | 1,508.79 ns | 29.616 ns | 30.414 ns | 0.3529 | - | 2216 B |
| PartialTokenSortRatio | 1,567.89 ns | 30.935 ns | 42.344 ns | 0.4025 | - | 2536 B |
| TokenSetRatio | 2,148.36 ns | 42.070 ns | 57.587 ns | 0.6905 | - | 4352 B |
| PartialTokenSetRatio | 2,498.55 ns | 49.063 ns | 73.435 ns | 0.9308 | - | 5840 B |
| WeightedRatio | 11,749.26 ns | 168.137 ns | 149.049 ns | 2.1362 | - | 13482 B |
| TokenInitialismRatio1 | 454.24 ns | 8.549 ns | 9.845 ns | 0.1440 | - | 904 B |
| TokenInitialismRatio2 | 372.71 ns | 7.457 ns | 12.459 ns | 0.1173 | - | 736 B |
| TokenInitialismRatio3 | 858.16 ns | 13.069 ns | 10.913 ns | 0.2470 | - | 1552 B |
| PartialTokenInitialismRatio | 977.08 ns | 6.698 ns | 5.938 ns | 0.3414 | - | 2144 B |
| TokenAbbreviationRatio | 1,353.48 ns | 8.884 ns | 7.875 ns | 0.4749 | - | 2984 B |
| PartialTokenAbbreviationRatio | 1,612.71 ns | 32.207 ns | 30.127 ns | 0.6199 | - | 3896 B |

After:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Ratio1 | 246.82 ns | 3.234 ns | 3.025 ns | 0.0162 | - | 104 B |
| Ratio2 | 16.02 ns | 0.344 ns | 0.436 ns | - | - | - |
| PartialRatio | 849.31 ns | 13.055 ns | 11.573 ns | 0.3786 | 0.0010 | 2376 B |
| TokenSortRatio | 651.52 ns | 12.939 ns | 25.237 ns | 0.0896 | - | 568 B |
| PartialTokenSortRatio | 676.59 ns | 13.339 ns | 13.698 ns | 0.1154 | - | 728 B |
| TokenSetRatio | 856.94 ns | 16.701 ns | 27.440 ns | 0.3490 | - | 2200 B |
| PartialTokenSetRatio | 1,127.99 ns | 21.898 ns | 30.698 ns | 0.5112 | - | 3208 B |
| WeightedRatio | 7,065.62 ns | 141.109 ns | 131.994 ns | 0.8011 | - | 5072 B |
| TokenInitialismRatio1 | 145.67 ns | 2.918 ns | 3.243 ns | 0.0625 | - | 392 B |
| TokenInitialismRatio2 | 127.54 ns | 2.549 ns | 4.531 ns | 0.0548 | - | 344 B |
| TokenInitialismRatio3 | 260.34 ns | 5.023 ns | 7.203 ns | 0.1106 | - | 696 B |
| PartialTokenInitialismRatio | 352.64 ns | 6.229 ns | 6.923 ns | 0.1845 | - | 1160 B |
| TokenAbbreviationRatio | 624.04 ns | 12.404 ns | 17.789 ns | 0.2508 | - | 1576 B |
| PartialTokenAbbreviationRatio | 794.29 ns | 13.812 ns | 12.244 ns | 0.3366 | - | 2112 B |

ycherkes and others added 20 commits May 26, 2024 14:29
Updated .gitignore to include .vshistory/.
Upgraded NUnit3TestAdapter to 4.6.0.
Refactored TestScoringEmptyString method in RegressionTests.cs.
Simplified string formatting in ExtractedResult.cs.
Updated target framework to NET8.0 in FuzzySharp.csproj.
Simplified conditional logic in Levenshtein.cs.
Refactored StringPreprocessorFactory.cs for better string trimming and switch expression.
Used null-coalescing assignment in Process.cs methods.
Made several scorer classes sealed.
Removed unnecessary using directive in TokenAbbreviationScorerBase.cs.
Optimized scoring logic in TokenAbbreviationScorerBase.cs.
Changed several strategy classes to static.
Optimized score calculation in PartialRatioStrategy.cs.
Simplified Heap constructor and added null checks.
Removed redundant ToList calls and optimized permutation logic in Permutation.cs.
Simplified Cycles method in Permutation.cs.
target tests to netcore3.1,net8.0,netframework4.7.2
replace instance lambda with static
copied from JakeBayer#42
- Implemented new Process.ExtractAll method, see [Issue!46](JakeBayer#46).
- Added Fastenshtein to benchmarks