
--experimental_remote_cache_compression causes 3-5x higher Bazel server heap usage #18997

@jfirebaugh


Description of the bug:

Certain actions, possibly those that trigger a high number of remote cache CAS hits, lead to excessively high Bazel server memory use when --experimental_remote_cache_compression is enabled. A build with that flag can use 3-5x as much JVM heap as the same build without it.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I've minimized a repro here: https://github.com/jfirebaugh/bazel_remote_cache_compression

Check out that repository, add a .bazelrc with appropriate remote cache configuration, and then compare the output of:

  1. bazel clean && bazel shutdown && bazel build --memory_profile=memprof :binary && grep 'Build artifacts:heap:used' memprof
  2. bazel clean && bazel shutdown && bazel build --experimental_remote_cache_compression --memory_profile=memprof :binary && grep 'Build artifacts:heap:used' memprof
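For reference, the kind of .bazelrc the repro expects might look like the sketch below; the cache endpoint is a placeholder, so substitute whatever remote cache you actually use:

```
# Hypothetical remote cache configuration; replace the endpoint with your own.
build --remote_cache=grpcs://your-cache.example.com
build --remote_upload_local_results
```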

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

release 6.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

[email protected]:jfirebaugh/bazel_remote_cache_compression.git
master
fatal: ambiguous argument 'master': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fb76727c785f013541bb150120f2fecf14e79c58

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

High memory use with --experimental_remote_cache_compression was reported by another user on Bazel slack: https://bazelbuild.slack.com/archives/CA31HN1T3/p1646921767090359?thread_ts=1646911448.469939&cid=CA31HN1T3

Any other information, logs, or outputs that you want to share?

I have done some initial investigation using JFR memory profiling, and it looks like one possible cause is the following:

Stack Trace	Count	Percentage
void java.nio.HeapByteBuffer.<init>(int, int, MemorySegmentProxy)	504	34.4 %
  ByteBuffer java.nio.ByteBuffer.allocate(int)	504	34.4 %
    ByteBuffer com.github.luben.zstd.NoPool.get(int)	471	32.2 %
      void com.github.luben.zstd.ZstdInputStreamNoFinalizer.<init>(InputStream, BufferPool)	471	32.2 %
        void com.github.luben.zstd.ZstdInputStreamNoFinalizer.<init>(InputStream)	471	32.2 %
          void com.google.devtools.build.lib.remote.zstd.ZstdDecompressingOutputStream.<init>(OutputStream)	471	32.2 %
            ListenableFuture com.google.devtools.build.lib.remote.GrpcCacheClient.requestRead(RemoteActionExecutionContext, RemoteRetrier$ProgressiveBackoff, Digest, CountingOutputStream, Supplier, Channel)	471	32.2 %

requestRead allocates ZstdDecompressingOutputStream, which allocates ZstdInputStreamNoFinalizer, which uses NoPool to allocate via ByteBuffer.allocate. It appears the size allocated here is:

https://github.com/luben/zstd-jni/blob/e07f8970be0c72ce02dcdf1877daa034208915d0/src/main/native/decompress/zstd_decompress.c#L1668

If I've calculated correctly, that works out to 131 kB, and I assume one such buffer is allocated for every in-flight cache read.
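As a sanity check on the 131 kB figure: zstd's recommended streaming input buffer size is ZSTD_BLOCKSIZE_MAX (128 KiB) plus a 3-byte block header, i.e. 131075 bytes, which matches. The sketch below does that arithmetic and a back-of-envelope aggregate; the in-flight read count is a made-up illustration, not a measured value:

```java
public class ZstdBufferEstimate {
    // zstd's recommended streaming input buffer: ZSTD_BLOCKSIZE_MAX + block header.
    static final int ZSTD_BLOCKSIZE_MAX = 128 * 1024; // 131072 bytes
    static final int ZSTD_BLOCK_HEADER_SIZE = 3;

    // Heap allocated per decompressing stream (per in-flight cache read).
    static long perStreamBytes() {
        return ZSTD_BLOCKSIZE_MAX + ZSTD_BLOCK_HEADER_SIZE; // 131075 ≈ 131 kB
    }

    public static void main(String[] args) {
        long inFlightReads = 500; // hypothetical concurrency, for illustration only
        System.out.println(perStreamBytes());                 // 131075
        System.out.println(perStreamBytes() * inFlightReads); // 65537500 ≈ 65.5 MB
    }
}
```

Note also that the ZstdInputStreamNoFinalizer(InputStream, BufferPool) constructor visible in the stack trace suggests zstd-jni can be handed a pooling BufferPool (e.g. its RecyclingBufferPool) instead of NoPool, though I have not tested whether that would help here.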

Metadata

Assignees

No one assigned

Labels

  - P2: We'll consider working on this in future. (Assignee optional)
  - help wanted: Someone outside the Bazel team could own this
  - team-Remote-Exec: Issues and PRs for the Execution (Remote) team
  - type: bug
