Improve throughput performance at compression level 9 #1280
This patch specifically addresses an issue observed at compression level 9; it has no adverse impact at other levels.
The following are the performance numbers, in terms of throughput (bytes processed per second), for various inputs on a latest-generation Xeon server.
Runtime 10 seconds
Compression level 9
Background
Children Self Shared Object Command
The detailed stack trace shows the following:
Children Self Command Shared Object Symbol
This gave a clear indication that most of the cycles were spent handling page faults. Collecting "perf stat" showed the following stats:
$ perf stat -- ./bench -q 9 -c 1 index.html
Tested file index.html; size: 29329
Threads: 1, alg: brotli, quality 9
Total times compressed: 1716; compressed size: 7476
Compression speed:4.80 MiB
Performance counter stats for './bench -q 9 -c 1 index.html':
With the suggested change, page faults dropped considerably, improving performance.
Tested file index.html; size: 29329
Threads: 1, alg: brotli, quality 9
Total times compressed: 16109; compressed size: 7476
Compression speed:45.06 MiB
Performance counter stats for './bench -q 9 -c 1 index.html':
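
As a sanity check on the two runs above, the reported speeds can be recomputed from the iteration counts. This is a minimal sketch that assumes the benchmark reports input bytes times iterations divided by runtime, and that the 10-second runtime applies to both runs; the function name `throughput_mib_per_s` is made up for illustration.

```python
# Figures copied from the two benchmark runs quoted above.
FILE_SIZE_BYTES = 29329  # "Tested file index.html; size: 29329"
RUNTIME_SECONDS = 10     # "Runtime 10 seconds" (assumed to apply to both runs)


def throughput_mib_per_s(times_compressed, file_size=FILE_SIZE_BYTES,
                         runtime=RUNTIME_SECONDS):
    """Input bytes processed per second, expressed in MiB/s."""
    return times_compressed * file_size / runtime / (1024 * 1024)


before = throughput_mib_per_s(1716)   # run without the change
after = throughput_mib_per_s(16109)   # run with the change

print(f"before:  {before:.2f} MiB/s")      # 4.80, matches the reported speed
print(f"after:   {after:.2f} MiB/s")       # 45.06, matches the reported speed
print(f"speedup: {after / before:.2f}x")   # 9.39x
```

Under those assumptions the recomputed values agree with the reported 4.80 and 45.06, which suggests the "MiB" figures are MiB per second and that the change yields roughly a 9.4x throughput gain on this input.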
The majority of cycles are now spent in the application rather than in the kernel managing memory (mmap/munmap).
Children Self Shared Object
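
To illustrate why repeated mmap/munmap cycles are expensive, the sketch below compares the minor page faults incurred by mapping a fresh buffer on every round against reusing a single mapping. It is not the patch itself and does not use brotli; the buffer size, round count, and function names are hypothetical, and it assumes a Linux/Unix system where `resource.getrusage` reports `ru_minflt`.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE
BUF_SIZE = 4 * 1024 * 1024  # hypothetical scratch-buffer size
ROUNDS = 50                 # hypothetical number of "compress" calls


def minor_faults():
    """Minor page faults charged to this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch(buf):
    """Write one byte per page so the kernel must back every page."""
    for offset in range(0, len(buf), PAGE):
        buf[offset] = 1


def fresh_mapping_each_round():
    """Map, touch, and unmap a new anonymous buffer on every round."""
    start = minor_faults()
    for _ in range(ROUNDS):
        buf = mmap.mmap(-1, BUF_SIZE,
                        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
        touch(buf)
        buf.close()  # munmap: the pages go back to the kernel
    return minor_faults() - start


def reuse_one_mapping():
    """Map once, then touch the same buffer on every round."""
    start = minor_faults()
    buf = mmap.mmap(-1, BUF_SIZE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    for _ in range(ROUNDS):
        touch(buf)
    buf.close()
    return minor_faults() - start


fresh = fresh_mapping_each_round()
reused = reuse_one_mapping()
print(f"fresh mapping each round: {fresh} minor faults")
print(f"one reused mapping:       {reused} minor faults")
```

On a typical Linux system the first variant faults on every page in every round, while the second faults only on the first pass; exact counts depend on the kernel and its transparent huge page settings.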
Environment:
OS: Ubuntu 24.04.2 LTS
Kernel: 6.8.0-58-generic
GCC: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Glibc: ldd (Ubuntu GLIBC 2.39-0ubuntu8.4) 2.39