bcachefs: don't hold srcu lock across blocking operations by matthiasgoergens · Pull Request #1069 · koverstreet/bcachefs

matthiasgoergens · 2026-02-27T17:13:39Z

Audit and fix 25 sites where bch2_trans_unlock() is used before a
potentially long-blocking operation, leaving the SRCU read lock held.
Under memory pressure this creates a reclaim deadlock: the shrinker
needs an SRCU grace period to free old btree nodes, but the grace period
can't complete because we're holding the read lock.

The fix at each site is to use bch2_trans_unlock_long() instead. Also
introduces a drop_locks_long_do() macro — the SRCU-dropping variant of
drop_locks_do() — and converts the callers that pass blocking
operations (mutex_lock, wait_event, journal reservation, etc.).

Also cleans up bch2_fsck_ask_yn(): the deferred unlock_long_at
workaround is no longer needed since we drop SRCU immediately before
waiting for user input.

Triggered by a full system deadlock on a 128 GB machine with bcachefs as
root on tiered SSD+HDD storage, where the reconcile thread held SRCU for
13+ seconds during GFP_KERNEL bio allocation in
bch2_data_update_init().

15 files changed, 25 sites fixed, net zero lines changed.

Investigation notes, full audit, GitHub issue survey, and a QEMU-based
reproducer are on the stacked branch.

Related: #934
Related: #936
Related: #882
Related: #636
Related: #605
Related: #826
Related: #811
Related: #1016

Copilot

Pull request overview

This pull request addresses a critical deadlock issue in bcachefs where SRCU read locks were held across long-blocking operations, preventing memory reclaim under pressure. Under low memory conditions, the shrinker needs SRCU grace periods to complete before freeing old btree nodes, but these grace periods cannot complete while read locks are held, creating a deadlock.

Changes:

Introduces drop_locks_long_do() macro as an SRCU-dropping variant of drop_locks_do()
Systematically replaces bch2_trans_unlock() with bch2_trans_unlock_long() at 25 sites before blocking operations
Simplifies bch2_fsck_ask_yn() by removing deferred unlock logic since SRCU is now dropped immediately

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
fs/bcachefs/btree/iter.h	Adds `drop_locks_long_do()` macro for SRCU-aware lock dropping
fs/bcachefs/vfs/fs.c	Drops SRCU before waiting on freeing inode (up to 10 seconds)
fs/bcachefs/vfs/buffered.c	Drops SRCU before bio allocation with GFP_KERNEL
fs/bcachefs/init/error.c	Drops SRCU before waiting for user input; simplifies timeout logic
fs/bcachefs/debug/tests.c	Drops SRCU before journal flush
fs/bcachefs/data/write.c	Drops SRCU before nocow bucket locking
fs/bcachefs/data/update.c	Drops SRCU before closure_sync (2 sites) and GFP_KERNEL allocation
fs/bcachefs/data/migrate.c	Drops SRCU before btree interior updates flush
fs/bcachefs/data/ec/io.c	Drops SRCU before stripe buffer allocation
fs/bcachefs/btree/read.c	Drops SRCU before btree node read IO
fs/bcachefs/btree/locking.c	Drops SRCU in mutex lock helper
fs/bcachefs/btree/interior.c	Drops SRCU before journal wait, gc.lock, btree interior updates (5 sites)
fs/bcachefs/btree/commit.c	Drops SRCU before journal reservation, GFP_KERNEL allocation, journal reclaim wait
fs/bcachefs/btree/cache.c	Drops SRCU before GFP_KERNEL allocations and btree node read (4 sites)
fs/bcachefs/alloc/foreground.c	Drops SRCU before mutex lock

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Audit and fix 25 sites where bch2_trans_unlock() (which only drops btree locks) is used before a potentially long-blocking operation, leaving the SRCU read lock held. If that blocking operation triggers page reclaim, and the shrinker needs to free old btree nodes, it must wait for the SRCU grace period -- which can't complete because we're holding the read lock. This creates a reclaim deadlock under memory pressure. The fix at each site is to use bch2_trans_unlock_long() (which also drops the SRCU read lock) instead of bch2_trans_unlock(). Also introduce a drop_locks_long_do() macro -- the SRCU-dropping variant of drop_locks_do() -- and convert the callers that pass blocking operations (mutex_lock, wait_event, journal reservation, etc.). Also clean up bch2_fsck_ask_yn(): the deferred unlock_long workaround (wait 2s then upgrade to unlock_long) is no longer needed since we now drop SRCU immediately before waiting for user input. Sites fixed: btree/cache.c 4 GFP_KERNEL allocs, sync I/O, wait_on_bit_io btree/commit.c 3 GFP_KERNEL alloc, journal reclaim wait, journal res btree/interior.c 5 mutex, closure_sync, wait_event, down_read btree/locking.c 1 mutex_lock btree/read.c 1 sync disk I/O alloc/foreground.c 1 mutex_lock data/update.c 3 GFP_KERNEL bio alloc, closure_sync (x2) data/ec/io.c 1 kvmalloc + multi-device I/O data/migrate.c 1 closure_wait_event for in-flight writes data/write.c 1 nocow lock + GFP_KERNEL vfs/buffered.c 1 filemap_alloc_folio + GFP_KERNEL vfs/fs.c 1 __wait_on_freeing_inode init/error.c 1 user input wait (+ cleanup) debug/tests.c 1 journal flush Reported on a 128 GB desktop (bcachefs root, tiered SSD+HDD): the reconcile thread held SRCU for 13+ seconds during GFP_KERNEL bio allocation in bch2_data_update_init(), blocking memory reclaim and causing a full system deadlock. Related: koverstreet#934 Related: koverstreet#636 Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com>

matthiasgoergens · 2026-03-08T11:51:52Z

I'm using the fixes from this branch together with the swap file support from the other PR on m desktop now. It's working well, and performance is great. Large background writes don't block my foreground activity anymore.

koverstreet · 2026-03-09T18:57:54Z

The approach of blanket-replacing bch2_trans_unlock() with bch2_trans_unlock_long() at all 25 sites is too aggressive. unlock_long() drops the SRCU read lock, which forces a full transaction restart on relock. In the common case, the blocking operation completes quickly and you don't need to drop SRCU at all.

What we want instead is a two-phase approach:

bch2_trans_unlock() first (drops btree locks, keeps SRCU)
Do the blocking operation
Only if we're still blocked after a couple seconds, escalate to bch2_trans_unlock_long() to let SRCU grace periods complete

This way the fast path doesn't pay the transaction restart cost, but the pathological case (blocked for 13+ seconds during a writeback storm) still resolves the SRCU deadlock.

Ironically, bch2_fsck_ask_yn() already had something like this with its unlock_long_at = jiffies + HZ * 2 pattern — the PR removes it and calls it a cleanup, but it was actually the right idea, just with a clunky implementation. A drop_locks_do variant that starts with unlock() and escalates to unlock_long() after a timeout would be the clean version of that pattern.

— ProofOfConcept

koverstreet · 2026-03-09T22:39:42Z

I'm also watching the pull requests too - can give me a shout if anything's unclear

Copilot AI review requested due to automatic review settings February 27, 2026 17:13

Copilot started reviewing on behalf of matthiasgoergens February 27, 2026 17:15 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes

matthiasgoergens force-pushed the fix/drop-srcu-pr branch from bcd1641 to e7a3376 Compare February 28, 2026 14:22

matthiasgoergens mentioned this pull request Mar 1, 2026

bcachefs: Add support for shrinking filesystems #1070

Open

koverstreet mentioned this pull request Mar 9, 2026

bcachefs: swap file support via SWP_FS_OPS #1072

Draft

koverstreet force-pushed the master branch from 11271f2 to d251df1 Compare March 12, 2026 18:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bcachefs: don't hold srcu lock across blocking operations#1069

bcachefs: don't hold srcu lock across blocking operations#1069
matthiasgoergens wants to merge 1 commit intokoverstreet:masterfrom
matthiasgoergens:fix/drop-srcu-pr

matthiasgoergens commented Feb 27, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

matthiasgoergens commented Mar 8, 2026

Uh oh!

koverstreet commented Mar 9, 2026

Uh oh!

koverstreet commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

matthiasgoergens commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

matthiasgoergens commented Mar 8, 2026

Uh oh!

koverstreet commented Mar 9, 2026

Uh oh!

koverstreet commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

matthiasgoergens commented Feb 27, 2026 •

edited

Loading