
[#2601][FOLLOWUP] fix(spark): Release segmentPermits first to avoid deadlock #2737

Merged
zuston merged 1 commit into apache:master from wForget:hotfix
Mar 5, 2026


@wForget
Member

@wForget wForget commented Mar 5, 2026

What changes were proposed in this pull request?

Release segmentPermits first to avoid a deadlock.

Why are the changes needed?

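#2601 caused ShuffleReadClientImplTest to hang; see https://github.com/apache/uniffle/actions/runs/22700481546/job/65816264462?pr=2736

In the thread dump below, "main" is parked in CompletableFuture.get (reached from DecompressionWorker.get) while "decompressionWorker-0" is parked in Semaphore.acquire (reached from the lambda in DecompressionWorker.add), so neither thread can make progress.

Thread dump: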

"main" #1 prio=5 os_prio=31 tid=0x0000000152014000 nid=0x1a03 waiting on condition [0x000000016b5a9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076c844f78> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3334)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.uniffle.client.response.DecompressedShuffleBlock.getByteBuffer(DecompressedShuffleBlock.java:62)
	at org.apache.uniffle.client.response.DecompressedShuffleBlock.getUncompressLength(DecompressedShuffleBlock.java:53)
	at org.apache.uniffle.client.impl.DecompressionWorker.get(DecompressionWorker.java:165)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:329)
	at org.apache.uniffle.client.TestUtils.validateResult(TestUtils.java:55)
	at org.apache.uniffle.client.impl.ShuffleReadClientImplTest.readTest7(ShuffleReadClientImplTest.java:357)
"decompressionWorker-0" #340 daemon prio=5 os_prio=31 tid=0x000000011b921000 nid=0x1481f waiting on condition [0x0000000329772000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076abc0630> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.apache.uniffle.client.impl.DecompressionWorker.lambda$add$0(DecompressionWorker.java:102)
	at org.apache.uniffle.client.impl.DecompressionWorker$$Lambda$490/0x00000008006a0028.get(Unknown Source)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run$$$capture(CompletableFuture.java:1604)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
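
The dump suggests a circular wait: the reader blocks on a future whose worker task is itself blocked on a semaphore permit. The following is a minimal, self-contained sketch of that pattern and of the "release first" ordering named in the PR title. It is not Uniffle's code: the class `PermitOrderingSketch`, its methods, the single-permit semaphore, and the timeouts are all hypothetical, and only the name `segmentPermits` is borrowed from the PR. It assumes the reader is the party that returns a permit once it is done with a block.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the DecompressionWorker implementation.
final class PermitOrderingSketch {

  // Reads two blocks with a single permit available.
  // Returns true if the second block became ready, false if waiting for it timed out.
  static boolean readTwoBlocks(boolean releaseFirst) throws Exception {
    Semaphore segmentPermits = new Semaphore(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      CompletableFuture<Integer> first = submit(pool, segmentPermits, 1);
      CompletableFuture<Integer> second = submit(pool, segmentPermits, 2);

      // The first task took the only permit; the reader now owns it.
      first.get(1, TimeUnit.SECONDS);

      if (releaseFirst) {
        // Return the permit BEFORE waiting on the next future,
        // so the worker blocked in acquire() can proceed.
        segmentPermits.release();
      }
      try {
        // Without the release above, the worker is parked in acquire()
        // and this wait can never complete (a timeout stands in for the hang).
        second.get(500, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        return false;
      }
      segmentPermits.release(); // hand back the permit taken for the second block
      return true;
    } finally {
      pool.shutdownNow(); // interrupts a worker still parked in acquire()
    }
  }

  // Each task must obtain a permit before it produces its block.
  private static CompletableFuture<Integer> submit(
      ExecutorService pool, Semaphore permits, int blockId) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        permits.acquire();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
      return blockId;
    }, pool);
  }

  public static void main(String[] args) throws Exception {
    System.out.println("release first: " + readTwoBlocks(true));
    System.out.println("release after: " + readTwoBlocks(false));
  }
}
```

In this sketch the ordering of `release()` relative to the blocking `get` is the only difference between the two runs; the bounded `get` is used purely so the stuck case can be observed without hanging the program.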

Does this PR introduce any user-facing change?

No.

How was this patch tested?

@wForget
Member Author

wForget commented Mar 5, 2026

cc @zuston

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 50.82%. Comparing base (b324cc3) to head (d9dfc71).

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #2737       +/-   ##
=============================================
+ Coverage          0   50.82%   +50.82%     
- Complexity        0     3327     +3327     
=============================================
  Files             0      537      +537     
  Lines             0    25843    +25843     
  Branches          0     2357     +2357     
=============================================
+ Hits              0    13134    +13134     
- Misses            0    11861    +11861     
- Partials          0      848      +848     


@github-actions

github-actions bot commented Mar 5, 2026

Test Results

3 151 files (+2 401), 3 151 suites (+2 401), 6h 38m 25s ⏱️ (+1h 58m 33s)
1 246 tests (+1 018): 1 244 ✅ (+1 017), 1 💤 (±0), 0 ❌ (±0), 1 🔥 (+1)
15 725 runs (+12 975): 15 709 ✅ (+12 974), 15 💤 (±0), 0 ❌ (±0), 1 🔥 (+1)

For more details on these errors, see this check.

Results for commit d9dfc71. ± Comparison against base commit b324cc3.

Member

@zuston zuston left a comment


lgtm.

@zuston zuston merged commit 2f0b954 into apache:master Mar 5, 2026
39 of 41 checks passed