70 changes: 70 additions & 0 deletions include/xrpl/basics/MallocTrim.h
@@ -0,0 +1,70 @@
#ifndef XRPL_BASICS_MALLOCTRIM_H_INCLUDED
#define XRPL_BASICS_MALLOCTRIM_H_INCLUDED

#include <xrpl/beast/utility/Journal.h>

#include <optional>
#include <string>

namespace ripple {

// -----------------------------------------------------------------------------
// Allocator interaction note:
// - This facility invokes glibc's malloc_trim(0) on Linux/glibc to request that
// ptmalloc return free heap pages to the OS.
// - If an alternative allocator (e.g. jemalloc or tcmalloc) is linked or
// preloaded (LD_PRELOAD), calling glibc's malloc_trim typically has no effect
// on the *active* heap. The call is harmless but may not reclaim memory
// because those allocators manage their own arenas.
// - Only glibc sbrk/arena space is eligible for trimming; large mmap-backed
// allocations are usually returned to the OS on free regardless of trimming.
// - Call at known reclamation points (e.g., after cache sweeps / online delete)
// and consider rate limiting to avoid churn.
// -----------------------------------------------------------------------------

struct MallocTrimReport
{
    bool supported{false};
    int trimResult{-1};
    long rssBeforeKB{-1};
    long rssAfterKB{-1};

    // Change in RSS across the trim: negative when RSS shrank.
    // Returns 0 if either reading is missing.
    [[nodiscard]] long
    deltaKB() const noexcept
    {
        if (rssBeforeKB < 0 || rssAfterKB < 0)
            return 0;
        return rssAfterKB - rssBeforeKB;
    }
};

/**
* @brief Attempt to return freed memory to the operating system.
*
* On Linux with glibc malloc, this issues ::malloc_trim(0), which may release
* free space from ptmalloc arenas back to the kernel. On other platforms this
* function is a no-op and the report marks trimming as unsupported. If a
* different allocator is in use, the call is still made but typically has no
* effect.
*
* @param tag Optional identifier for logging/debugging purposes.
* @param journal Journal for diagnostic logging.
* @return Report containing the trim result and, when debug logging is
* enabled, RSS readings taken before and after the trim (-1 otherwise).
*
* @note If an alternative allocator (jemalloc/tcmalloc) is linked or preloaded,
* calling glibc's malloc_trim may have no effect on the active heap. The
* call is harmless but typically does not reclaim memory under those
* allocators.
*
* @note Only memory served from glibc's sbrk/arena heaps is eligible for trim.
* Large allocations satisfied via mmap are usually returned on free
* independently of trimming.
*
* @note Intended for use after operations that free significant memory (e.g.,
* cache sweeps, ledger cleanup, online delete). Consider rate limiting.
*/
MallocTrimReport
mallocTrim(std::optional<std::string> const& tag, beast::Journal journal);

} // namespace ripple

#endif
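
The notes in this header recommend rate limiting calls to the trim facility. One way that could look is sketched below. `TrimRateLimiter`, `tryAcquire`, and the interval are hypothetical names that are not part of this PR, and the sketch is not thread-safe.

```cpp
#include <chrono>
#include <optional>

// Hypothetical helper (not part of this PR): permits at most one trim per
// `minInterval`. The caller may pass in the current time, which keeps the
// logic deterministic under test.
class TrimRateLimiter
{
public:
    using Clock = std::chrono::steady_clock;

    explicit TrimRateLimiter(Clock::duration minInterval)
        : minInterval_(minInterval)
    {
    }

    // Returns true if a trim may run at `now`, and records that time.
    // Returns false if the previous trim was less than `minInterval` ago.
    bool
    tryAcquire(Clock::time_point now = Clock::now())
    {
        if (last_ && now - *last_ < minInterval_)
            return false;
        last_ = now;
        return true;
    }

private:
    Clock::duration minInterval_;
    std::optional<Clock::time_point> last_;
};
```

A caller would check `tryAcquire()` before invoking `mallocTrim` and skip the trim when it returns false. A shared instance would need a mutex or an atomic timestamp.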
121 changes: 121 additions & 0 deletions src/libxrpl/basics/MallocTrim.cpp
@@ -0,0 +1,121 @@
#include <xrpl/basics/Log.h>
#include <xrpl/basics/MallocTrim.h>

#include <boost/predef.h>

#include <cctype>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

#if defined(__GLIBC__) && BOOST_OS_LINUX
#include <malloc.h>
#include <unistd.h>

namespace {
pid_t const cachedPid = ::getpid();
} // namespace
#endif

namespace ripple {

namespace detail {

#if defined(__GLIBC__) && BOOST_OS_LINUX

long
parseVmRSSkB(std::string const& status)
{
    std::istringstream iss(status);
    std::string line;

    while (std::getline(iss, line))
    {
        // Allow leading spaces/tabs before the key.
        auto const firstNonWs = line.find_first_not_of(" \t");
        if (firstNonWs == std::string::npos)
            continue;

        constexpr char key[] = "VmRSS:";
        constexpr auto keyLen = sizeof(key) - 1;

        // Require the line (after leading whitespace) to start with "VmRSS:".
        // Check if we have enough characters and the substring matches.
        if (firstNonWs + keyLen > line.size() ||
            line.substr(firstNonWs, keyLen) != key)
            continue;

        // Move past "VmRSS:" and any following whitespace.
        auto pos = firstNonWs + keyLen;
        while (pos < line.size() &&
               std::isspace(static_cast<unsigned char>(line[pos])))
        {
            ++pos;
        }

        long value = -1;
        if (std::sscanf(line.c_str() + pos, "%ld", &value) == 1)
            return value;

        // Found the key but couldn't parse a number.
        return -1;
    }

    // No VmRSS line found.
    return -1;
}

#endif // __GLIBC__ && BOOST_OS_LINUX

} // namespace detail

MallocTrimReport
mallocTrim(
    [[maybe_unused]] std::optional<std::string> const& tag,
    beast::Journal journal)
{
    MallocTrimReport report;

#if !(defined(__GLIBC__) && BOOST_OS_LINUX)
    JLOG(journal.debug()) << "malloc_trim not supported on this platform";
#else

    report.supported = true;

    if (journal.debug())
    {
        auto readFile = [](std::string const& path) -> std::string {
            std::ifstream ifs(path);
            if (!ifs.is_open())
                return {};

            return std::string(
                std::istreambuf_iterator<char>(ifs),
                std::istreambuf_iterator<char>());
        };

        std::string const tagStr = tag.value_or("default");
        std::string const statusPath =
            "/proc/" + std::to_string(cachedPid) + "/status";
Collaborator


Could we not just use /proc/self/...? Also, it seems we would need fewer parsing contortions if we read from /proc/self/statm instead of /proc/self/status.
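
To illustrate the /proc/self/statm suggestion, here is a minimal sketch that is not the PR's code. It assumes Linux, where the second field of statm is the resident set size in pages. `parseStatmResidentPages` and `readRssKB` are hypothetical names, and the value may differ slightly from the VmRSS line in /proc/self/status.

```cpp
#include <fstream>
#include <sstream>
#include <string>

#include <unistd.h>

// Parses the contents of /proc/self/statm and returns the second field
// (resident set size, in pages), or -1 if two numbers cannot be read.
long
parseStatmResidentPages(std::string const& statm)
{
    std::istringstream iss(statm);
    long sizePages = -1;
    long residentPages = -1;
    if (iss >> sizePages >> residentPages)
        return residentPages;
    return -1;
}

// Reads the current RSS in kB, or returns -1 on failure.
// Assumes the page size is a multiple of 1024 bytes.
long
readRssKB()
{
    std::ifstream ifs("/proc/self/statm");
    std::string line;
    if (!ifs || !std::getline(ifs, line))
        return -1;

    long const pages = parseStatmResidentPages(line);
    long const pageSize = ::sysconf(_SC_PAGESIZE);
    if (pages < 0 || pageSize <= 0)
        return -1;
    return pages * (pageSize / 1024);
}
```

With this approach there is no key to search for and no PID to cache, since the path is fixed and the line contains only whitespace-separated integers.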


        auto const statusBefore = readFile(statusPath);
        report.rssBeforeKB = detail::parseVmRSSkB(statusBefore);

        report.trimResult = ::malloc_trim(0);
Collaborator

@pratikmankawde commented on Nov 21, 2025


I would suggest that, instead of calling malloc_trim(0), which tries to trim the main heap all the way down to its top boundary (leaving only a minimum amount, one page, usually 4 or 8 KB), we call malloc_trim(m), where m is the minimum amount of memory we expect to need again soon. It could range from a few hundred MB to a few GB. The description of this PR mentions a rate of about 1.2 GB/h (on the higher side), so I would suggest starting with m of roughly 2.4 GB. The reports we accumulate in this class can then help us fine-tune it.

Why?
Memory allocation is an expensive operation in itself. After calling malloc_trim(0), any new allocation would also require extending the heap. That is exceptionally expensive (on the order of 1000 times), since it involves a switch from user space to kernel space and then allocation of new pages by the OS. We can avoid that by keeping a few GB in reserve.

The trim after NetworkOps -> SyncComplete could be one of the suboptimal calls if we do any heavy operations after SyncComplete that require allocations exceeding one page.

Collaborator


This note in the manual suggests that attempting to use the trim padding is wasted effort:

Only the main heap (using sbrk(2)) honors the pad argument; thread heaps do not.

Collaborator


A malloc_trim(m) call checks all the arenas (main and thread) to see whether chunks of memory can be released. It uses the pad argument to decide how much to keep (up to m) at the top of the main heap.

For thread heaps (sub-heaps), malloc_trim cannot free the sub-heap, or part of it, if even a small block is in use, so it ignores the padding. If a sub-heap becomes completely empty after the last call to free(), the allocator returns the memory to the OS anyway. So padding doesn't make much sense for thread heaps. They are self-managed, and we don't need to optimise for them. Any fragmentation in a sub-heap remains until the entire sub-heap is empty, at which point it is released.
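
For readers following the pad discussion, this is a minimal sketch of what calling malloc_trim with a non-zero pad could look like. `trimKeepingPad` is a hypothetical wrapper, not part of this PR. As quoted from the manual above, only the main heap honors the pad.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Hypothetical wrapper (not part of this PR): asks glibc to release free
// heap memory while leaving `padBytes` untrimmed at the top of the main heap.
// Returns glibc's result (1 if memory was released, 0 if not), or -1 when
// malloc_trim is unavailable on this platform.
int
trimKeepingPad(std::size_t padBytes)
{
#if defined(__GLIBC__)
    return ::malloc_trim(padBytes);
#else
    (void)padBytes;
    return -1;
#endif
}
```

The starting value proposed in the review would be about 2.4 GB, for example `std::size_t{2400} * 1024 * 1024` on a 64-bit build. Whether that reserve pays off is something the RSS readings in the report could help answer.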


        auto const statusAfter = readFile(statusPath);
        report.rssAfterKB = detail::parseVmRSSkB(statusAfter);

        JLOG(journal.debug())
            << "malloc_trim tag=" << tagStr << " result=" << report.trimResult
            << " rss_before=" << report.rssBeforeKB << "kB"
            << " rss_after=" << report.rssAfterKB << "kB"
            << " delta=" << report.deltaKB() << "kB";
    }
    else
    {
        report.trimResult = ::malloc_trim(0);
    }
#endif

    return report;
}

} // namespace ripple