Description of the bug:
Certain actions, possibly those that trigger a high number of remote cache CAS hits, lead to excessively high Bazel server memory use when using --experimental_remote_cache_compression. A build using --experimental_remote_cache_compression may result in 3-5x higher JVM heap use compared to not using that flag.
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I've minimized a repro here: https://github.com/jfirebaugh/bazel_remote_cache_compression
Check out that repository, add a .bazelrc with appropriate remote cache configuration (an example sketch follows the commands below), and then compare the output of:
bazel clean && bazel shutdown && bazel build --memory_profile=memprof :binary && grep 'Build artifacts:heap:used' memprof
bazel clean && bazel shutdown && bazel build --experimental_remote_cache_compression --memory_profile=memprof :binary && grep 'Build artifacts:heap:used' memprof
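For reference, a minimal sketch of the kind of .bazelrc configuration meant above; the endpoint is a placeholder, so substitute whatever remote cache you normally use:

```
# Placeholder endpoint: point this at your own remote CAS/cache.
build --remote_cache=grpcs://remote-cache.example.com
```

The two commands above then differ only in whether --experimental_remote_cache_compression is added.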
Which operating system are you running Bazel on?
macOS
What is the output of bazel info release?
release 6.2.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
git@github.com:jfirebaugh/bazel_remote_cache_compression.git
master
fatal: ambiguous argument 'master': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fb76727c785f013541bb150120f2fecf14e79c58
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
High memory use with --experimental_remote_cache_compression was reported by another user on Bazel Slack: https://bazelbuild.slack.com/archives/CA31HN1T3/p1646921767090359?thread_ts=1646911448.469939&cid=CA31HN1T3
Any other information, logs, or outputs that you want to share?
I have done some initial investigation using JFR memory profiling, and it looks like one possible cause is the following:
| Stack Trace | Count | Percentage |
| --- | ---: | ---: |
| `void java.nio.HeapByteBuffer.<init>(int, int, MemorySegmentProxy)` | 504 | 34.4 % |
| `ByteBuffer java.nio.ByteBuffer.allocate(int)` | 504 | 34.4 % |
| `ByteBuffer com.github.luben.zstd.NoPool.get(int)` | 471 | 32.2 % |
| `void com.github.luben.zstd.ZstdInputStreamNoFinalizer.<init>(InputStream, BufferPool)` | 471 | 32.2 % |
| `void com.github.luben.zstd.ZstdInputStreamNoFinalizer.<init>(InputStream)` | 471 | 32.2 % |
| `void com.google.devtools.build.lib.remote.zstd.ZstdDecompressingOutputStream.<init>(OutputStream)` | 471 | 32.2 % |
| `ListenableFuture com.google.devtools.build.lib.remote.GrpcCacheClient.requestRead(RemoteActionExecutionContext, RemoteRetrier$ProgressiveBackoff, Digest, CountingOutputStream, Supplier, Channel)` | 471 | 32.2 % |
requestRead allocates a ZstdDecompressingOutputStream, which allocates a ZstdInputStreamNoFinalizer, which uses NoPool to allocate its input buffer via ByteBuffer.allocate. It appears the size allocated here is the input buffer size requested by zstd-jni, which, if I've calculated correctly, is 131 kB, and I assume this allocation happens once for every in-flight cache read.
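To make that allocation path concrete, here is a small standalone sketch against zstd-jni (not Bazel code; the class name and test data are mine). It exercises the two ZstdInputStreamNoFinalizer constructors from the profile: the single-argument constructor, which is the one reached from ZstdDecompressingOutputStream, falls back to NoPool and allocates a fresh heap ByteBuffer per stream (zstd's recommended input size, 128 KiB plus a 3-byte block header, i.e. the ~131 kB above, assuming I've read the zstd-jni source correctly), while the two-argument constructor with a shared RecyclingBufferPool reuses buffers across streams:

```java
import com.github.luben.zstd.RecyclingBufferPool;
import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdInputStreamNoFinalizer;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ZstdBufferPoolSketch {
    public static void main(String[] args) throws IOException {
        // A tiny valid zstd frame, just so the streams have something to decompress.
        byte[] frame = Zstd.compress("hello remote cache".getBytes(StandardCharsets.UTF_8));

        // 1) The path the stack trace shows: the single-argument constructor falls
        //    back to NoPool, so every stream (i.e. every in-flight cache read)
        //    allocates its own ~131 kB HeapByteBuffer via ByteBuffer.allocate.
        try (ZstdInputStreamNoFinalizer in =
                 new ZstdInputStreamNoFinalizer(new ByteArrayInputStream(frame))) {
            in.readAllBytes();
        }

        // 2) The pooled variant: RecyclingBufferPool hands back recycled buffers,
        //    so repeated streams reuse memory instead of allocating a new buffer
        //    each time. (Only an API sketch; whether Bazel can safely share a pool
        //    across concurrent cache reads is a separate question.)
        try (ZstdInputStreamNoFinalizer in =
                 new ZstdInputStreamNoFinalizer(new ByteArrayInputStream(frame),
                                                RecyclingBufferPool.INSTANCE)) {
            in.readAllBytes();
        }
    }
}
```

For scale, if each in-flight read really does get its own buffer, the ~500 allocations sampled above alone amount to roughly 500 × 131 kB ≈ 65 MB of short-lived heap buffers.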