Conversation


@vlntb vlntb commented Nov 11, 2025

High Level Overview of Change

  • Introduces a MallocTrim helper in libxrpl to centralise calls to ::malloc_trim(0) on Linux/glibc and optionally record RSS before/after for debugging.
  • Wires mallocTrim into production paths:
    • Application::doSweep: call after sweeps, when we’ve just freed a meaningful amount of heap.
    • Online delete (SHAMapStore): call after clearCaches / prior-ledger cleanup, when online delete drops large chunks of in-memory SHAMap / ledger state.
    • NetworkOPs sync completion: call when transitioning to FULL operating mode after initial sync, when sync-related temporary allocations have been freed.
  • On non-Linux or non-glibc builds, the helper reports supported = false and is effectively a no-op.
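Based on the description above and the field names visible in the code snippets quoted later in this review (rssBeforeKB, trimResult), a minimal sketch of such a helper might look like the following; the exact API in the PR may differ, and the struct layout here is illustrative:

```cpp
#include <optional>

#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Illustrative report shape; field names follow the snippets quoted in
// this review (rssBeforeKB, trimResult), the rest is an assumption.
struct TrimReport
{
    bool supported = false;           // false on non-Linux/non-glibc builds
    int trimResult = 0;               // ::malloc_trim return: 1 = memory released
    std::optional<long> rssBeforeKB;  // VmRSS before the trim, if readable
    std::optional<long> rssAfterKB;   // VmRSS after the trim, if readable
};

// Centralised entry point: trims on glibc, effectively a no-op elsewhere.
inline TrimReport
mallocTrim()
{
    TrimReport report;
#if defined(__GLIBC__)
    report.supported = true;
    // RSS sampling (e.g. from /proc) would go here; omitted for brevity.
    report.trimResult = ::malloc_trim(0);
#endif
    return report;
}
```

On non-glibc builds the preprocessor guard leaves the defaults in place, which matches the "supported = false and effectively a no-op" behaviour described above.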

Context of Change

  • Long-running nodes on Linux/glibc showed steady Resident growth under heavier ledger-state scenarios.
  • 24h baseline without malloc_trim:
    • Resident: ~14.0 GB → ~32.9 GB (+18.6 GB, ~0.76 GB/h).
    • Referenced: low teens → ~32.7 GB (~+29 GB, ~1.2 GB/h).
  • 24h run with malloc_trim from Application::doSweep + online delete:
    • Resident: ~18.3 GB → ~19.0 GB (+0.7 GB, ~0.03 GB/h).
    • Referenced: ~18.0 GB → ~18.8 GB (+0.8 GB, ~0.03 GB/h).
  • Net effect over 24h:
    • ~42% lower Resident/Referenced at the end of the run (~32.9 GB → ~19.0 GB).
    • Growth rate drops from ~0.76 GB/h → ~0.03 GB/h (Resident) and ~1.2 GB/h → ~0.03 GB/h (Referenced), i.e. ~25–30× slower accumulation.
  • Over a longer ~40h trimmed run, we see ~147 GB returned to the OS (~3.7–3.8 GB/h) while steady-state Resident lives in an 18–20 GB band and the Resident–Referenced gap stays small (~0.2 GB). This suggests:
    • malloc_trim is doing real work against fragmentation and short-lived churn.
    • The remaining slow drift is driven by genuine long-lived working set, which will need separate follow-up (caches, data-structure sizing, protocol state), but the production hooks in this PR already give a clear and measurable win.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

@vlntb vlntb marked this pull request as ready for review November 13, 2025 14:12
@vlntb vlntb requested a review from a team as a code owner November 13, 2025 14:12
codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.6%. Comparing base (ad37461) to head (645fdda).

Files with missing lines Patch % Lines
src/libxrpl/basics/MallocTrim.cpp 97.8% 1 Missing ⚠️
src/xrpld/app/main/Application.cpp 0.0% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           develop   #6022   +/-   ##
=======================================
  Coverage     78.6%   78.6%           
=======================================
  Files          818     820    +2     
  Lines        68976   69032   +56     
  Branches      8267    8234   -33     
=======================================
+ Hits         54190   54259   +69     
+ Misses       14786   14773   -13     
Files with missing lines Coverage Δ
include/xrpl/basics/MallocTrim.h 100.0% <100.0%> (ø)
src/xrpld/app/misc/NetworkOPs.cpp 69.9% <100.0%> (+<0.1%) ⬆️
src/xrpld/app/misc/SHAMapStoreImp.cpp 76.1% <100.0%> (+0.7%) ⬆️
src/libxrpl/basics/MallocTrim.cpp 97.8% <97.8%> (ø)
src/xrpld/app/main/Application.cpp 68.5% <0.0%> (-0.1%) ⬇️

... and 5 files with indirect coverage changes



auto const statusBefore = readFile(statusPath);
report.rssBeforeKB = detail::parseVmRSSkB(statusBefore);

report.trimResult = ::malloc_trim(0);

@pratikmankawde pratikmankawde Nov 21, 2025

I would suggest that instead of calling malloc_trim(0), which tries to trim the heap down to the main heap's top boundary (leaving only a minimum amount, 1 page, which is usually 4 or 8 KB, link for other readers), we call malloc_trim(m), where m is the minimum amount of memory we expect to need again soon. It could range from a few hundred MBs to a few GBs. The description of this PR mentions a rate of about ~1.2 GB/h (on the higher side), so I would suggest we start with m = ~2.4 GB. The reports we are accumulating in this class can then help us fine-tune it.

Why?
Memory allocation is an expensive operation in itself. In this case (after calling malloc_trim(0)), any new allocation would also require heap extension, which is exceptionally expensive (on the order of 1000x slower, involving a user-to-kernel context switch and the allocation of new pages by the OS). We can avoid that by keeping a few GBs in reserve.

The trim after NetworkOps -> SyncComplete could be one of the suboptimal calls if we do any heavy operations after SyncComplete that require memory allocations exceeding 1 page.
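A sketch of that suggestion, with the reviewer's starter value of ~2.4 GB as the pad (the function name and constant are made up for this example, not part of the PR):

```cpp
#include <cstddef>

#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Hypothetical pad (~2.4 GB): keep this much free heap in reserve so
// that allocation bursts right after a trim do not immediately force
// heap extension (sbrk/mmap plus kernel page faults).
constexpr std::size_t trimPadBytes = std::size_t{2400} * 1024 * 1024;

inline int
trimWithReserve()
{
#if defined(__GLIBC__)
    // malloc_trim(pad) leaves up to `pad` bytes of free space at the
    // top of the main heap instead of returning it all to the OS.
    // Returns 1 if some memory was released, 0 otherwise.
    return ::malloc_trim(trimPadBytes);
#else
    return 0;  // unsupported platform: nothing trimmed
#endif
}
```

The accumulated trim reports could then be compared across different pad values to tune the constant, as the comment proposes.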

This note in the manual suggests that attempting to use the trim padding is wasted effort:

Only the main heap (using sbrk(2)) honors the pad argument; thread heaps do not.

A malloc_trim(m) call will check all the arenas (main and thread) to see if chunks of memory can be released. It uses the padding arg to decide how much to keep (up to m) at the top of the main heap.

For thread heaps (sub-heaps), malloc_trim can't free the sub-heap or any part of it if there's even a small block in use (hence it ignores the padding there). If a sub-heap region becomes completely empty after the last call to free(), the allocator returns that memory to the OS anyway, so padding doesn't make much sense for thread heaps; they are self-managed, and we don't need to optimise for them. If there's fragmentation in the sub-heaps, it will remain until an entire sub-heap is empty (at which point it is released).
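As a small illustration of the main-heap path being discussed (the function name and allocation sizes are made up for the example), churning the main-thread heap and then trimming is the case where the pad argument actually applies:

```cpp
#include <cstdlib>
#include <vector>

#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Allocate and free a burst of blocks on the main thread, then trim.
// Per the discussion above, the pad argument only constrains how much
// of the *main* heap's top is kept; fully empty thread sub-heaps are
// returned to the OS by the allocator on its own.
inline int
churnAndTrim()
{
    std::vector<void*> blocks;
    for (int i = 0; i < 2048; ++i)
        blocks.push_back(std::malloc(32 * 1024));  // 2048 x 32 KB = 64 MB
    for (void* b : blocks)
        std::free(b);
#if defined(__GLIBC__)
    return ::malloc_trim(0);  // 1 = memory released, 0 = nothing to release
#else
    return 0;
#endif
}
```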

@bthomee bthomee requested a review from lmaisons November 21, 2025 20:11
@lmaisons lmaisons left a comment

The points where we trim look reasonable. Unless there's some obvious caveat I'm not aware of, I think the RSS reporting should use /proc/self and the statm file so as to minimize parsing / clerical errors.


std::string const tagStr = tag.value_or("default");
std::string const statusPath =
"/proc/" + std::to_string(cachedPid) + "/status";

Could we not just use /proc/self/...? Also, it seems we would need to do fewer parsing contortions if we read from /proc/self/statm instead of /proc/self/status
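A sketch of that suggestion (the helper name is mine, not the PR's): /proc/self/statm holds a single line of space-separated page counts, so RSS is just the second field scaled by the page size, with no text parsing of VmRSS lines.

```cpp
#include <fstream>
#include <optional>
#include <unistd.h>

// Read resident set size in KB from /proc/self/statm (Linux only).
// The file's fields are page counts: size resident shared text lib data dt.
// Returns nullopt if the file is unavailable (e.g. non-Linux platforms).
inline std::optional<long>
rssKBFromStatm()
{
    std::ifstream statm("/proc/self/statm");
    long sizePages = 0;
    long rssPages = 0;
    if (!(statm >> sizePages >> rssPages))
        return std::nullopt;
    long const pageKB = ::sysconf(_SC_PAGESIZE) / 1024;
    return rssPages * pageKB;
}
```

Using /proc/self also avoids caching the pid, since the kernel resolves the calling process directly.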

@bthomee bthomee requested a review from g-ripple November 24, 2025 18:55