lge
Member

@lge lge commented Oct 16, 2025

For force_umount=safe, we can significantly speed up get_pids() by limiting the search for significant symlinks
to the process and skipping the tasks (or threads).

We are interested in /proc/<pid>/{cwd,root,exe} and /proc/<pid>/fd/<fd> as well as memory mappings.
All of these are per process, not per thread. We can save ourselves a lot of time and effort by not scanning
/proc/<pid>/task/<tid>/*, where we would only find identical information.

Even on a mostly idle system with just a few "heavily threaded" processes,
this can speed up the scanning by a factor of 10.

With "modern" Linux kernels, we can also drop the "grep" in maps, because we already found the symlinks in map_files/.
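
A minimal sketch of the per-process scan described above, not the resource agent's actual get_pids(). The function name `list_pids_using` and its optional second argument (an alternative procfs root, handy for testing) are assumptions made for illustration. `-maxdepth` and `-lname` need a find that supports them, such as GNU find.

```shell
# Hypothetical helper, for illustration only.
# $1: directory of interest (glob characters in it would be treated as a pattern by -lname)
# $2: procfs root, defaults to /proc (assumed parameter, useful for testing)
list_pids_using() {
    dir=$1
    proc=${2:-/proc}
    # Relative to each <pid> starting point:
    #   depth 1: cwd, root, exe
    #   depth 2: fd/<fd>, map_files/<range>
    #   depth 3: task/<tid>/cwd and friends, excluded by -maxdepth 2
    find "$proc"/[0-9]* -maxdepth 2 -type l \
        \( -lname "$dir" -o -lname "$dir/*" \) 2>/dev/null |
    while IFS= read -r link; do
        rest=${link#"$proc"/}          # strip the procfs root
        printf '%s\n' "${rest%%/*}"    # keep only the <pid> component
    done | sort -un
}
```

Called as `list_pids_using /mnt/data`, this would print one pid per line for every process with a matching symlink, without descending into any task/<tid>/ directory.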

lge added 2 commits October 16, 2025 10:58
With force_umount=safe, we "manually" scan the /proc/ file system.

We look for symlinks pointing into the path we are interested in.
Specifically, we are interested in
  /proc/<pid>/{root,exe,cwd}
  /proc/<pid>/fd/<fd>
We also look for relevant memory mappings in /proc/<pid>/maps

All these are per process, not per "task" or "thread".
See procfs(5) and pthreads(7).
Still, we currently also scan /proc/<pid>/task/<tid>/
for all the same things.

On a large system with many heavily threaded processes,
this can significantly slow down the scan
without yielding any new information.

Adding -maxdepth to the find command line avoids this useless work,
potentially reducing the scanning time by orders of magnitude
on systems with many heavily threaded processes.

We could also write a dedicated helper in C to do the very same thing,
with the option to "short circuit" and proceed with the next pid
as soon as the first "match" is found for the currently inspected pid.

That could further reduce the scanning time
by about an additional factor of 10.
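
The "short circuit" idea can be illustrated in shell as well. This is only a sketch of the logic, not the C helper the commit message has in mind: the function name `pids_with_first_match` and the optional procfs-root argument are hypothetical, `-quit` requires a find that supports it (GNU find does), and because it forks one find per pid it should not be expected to deliver the speedup estimated above.

```shell
# Hypothetical sketch of "stop at the first match per pid".
# $1: directory of interest
# $2: procfs root, defaults to /proc (assumed parameter, useful for testing)
pids_with_first_match() {
    dir=$1
    proc=${2:-/proc}
    for p in "$proc"/[0-9]*; do
        [ -d "$p" ] || continue
        # -print -quit makes find exit after the first matching symlink,
        # so the remaining links of this pid are never inspected.
        hit=$(find "$p" -maxdepth 2 -type l \
                \( -lname "$dir" -o -lname "$dir/*" \) -print -quit 2>/dev/null)
        [ -n "$hit" ] && printf '%s\n' "${p##*/}"
    done
    return 0
}
```

Each pid is reported at most once, so no `sort -u` is needed to remove duplicates.
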
If we have /proc/<pid>/map_files/* symlinks,
we don't need to additionally grep /proc/<pid>/maps.

Also, don't first collect the output of commands into variables
just to pipe them to sort -u later;
pipe the output of the commands through sort -u directly.
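
The difference between the two styles can be sketched as follows. `scan_links` and `scan_maps` are hypothetical stand-ins that print fixed pids; they are not functions from the resource agent.

```shell
# Hypothetical stand-ins for the real scanning commands.
scan_links() { printf '%s\n' 300 100 100; }
scan_maps()  { printf '%s\n' 200 100; }

# Old style: capture each output in a variable, then pipe the variables.
collect_then_sort() {
    links=$(scan_links)
    maps=$(scan_maps)
    printf '%s\n%s\n' "$links" "$maps" | sort -u
}

# New style: group the commands and stream their output straight into sort -u.
pipe_directly() {
    { scan_links; scan_maps; } | sort -u
}
```

Both variants print the same deduplicated list here; the second one avoids buffering the complete output in shell variables first.
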
@lge lge requested a review from oalbrigt October 16, 2025 09:44
@oalbrigt oalbrigt changed the title speed up get pids Filesystem: speed up get pids Oct 16, 2025
Contributor

@oalbrigt oalbrigt left a comment

Looks good to me.
