
Conversation

@pamolloy (Collaborator) commented Feb 6, 2026

A customer reported a bug and proposed a fix for a kernel crash. The stack trace below was generated by simulating that crash.

I generated the patch in this PR based on the above. It will need more thorough review and testing. The comments are also fairly verbose and should probably be dropped before merging.

[  151.146581] Unable to handle kernel paging request at virtual address dead000000000108
[  151.154506] Mem abort info:
[  151.157169]   ESR = 0x96000044
[  151.160190]   EC = 0x25: DABT (current EL), IL = 32 bits
[  151.165497]   SET = 0, FnV = 0
[  151.168509]   EA = 0, S1PTW = 0
[  151.171639]   FSC = 0x04: level 0 translation fault
[  151.176498] Data abort info:
[  151.179357]   ISV = 0, ISS = 0x00000044
[  151.183185]   CM = 0, WnR = 1
[  151.186119] [dead000000000108] address between user and kernel address ranges
[  151.193281] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[  151.198789] Modules linked in: mod(OE) crct10dif_ce(E) spi_cadence_quadspi(E) fuse(E) xl4ptpextts(OE) overlay(E) adi_rpmsg(E) alap_ipc(OE)
[  151.211206] CPU: 0 PID: 2059 Comm: sim_handler Tainted: G           OE     5.15.78-yocto-standard #1
[  151.220315] Hardware name: MARS L1 US (DT)
[  151.224392] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  151.231337] pc : handler_fn+0x9c/0xcc [mod]
[  151.235501] Physical pc : 0x000000007aa4116c
[  151.239755] lr : handler_fn+0x98/0xcc [mod]
[  151.243921] sp : ffff80000d39be20
[  151.247219] Physical sp : 0x000000008759be20
[  151.251481] pmr_save: 000000e0
[  151.254511] x29: ffff80000d39be20 x28: 0000000000000000 x27: 0000000000000000
[  151.261631] x26: ffff000017f4fec0 x25: ffff800008cd17d8 x24: dead000000000122
[  151.268748] x23: dead000000000100 x22: ffff800000842078 x21: ffff00001b522e00
[  151.275866] x20: ffff800000843000 x19: ffff8000008433c0 x18: fffffffffffed110
[  151.282984] x17: ffff800015544000 x16: ffff800008004000 x15: 0000000000000030
[  151.290101] x14: 00000000000002cd x13: ffff80000d39bae0 x12: ffff800008c10b88
[  151.297219] x11: fffffffffffed110 x10: ffff800008c10b88 x9 : ffff800008bb8b88
[  151.304337] x8 : 00000000ffffefff x7 : ffff800008c10b88 x6 : 0000000000000000
[  151.311454] x5 : ffff00001dfa1710 x4 : 0000000000000000 x3 : 0000000000000000
[  151.318571] x2 : 0000000000000000 x1 : dead000000000100 x0 : dead000000000122
[  151.325690] Call trace:
[  151.328118]  handler_fn+0x9c/0xcc [mod]
[  151.331937]  kthread+0x144/0x150
[  151.335148]  ret_from_fork+0x10/0x20
[  151.338713] Code: b0000000 9103c000 95fdd7b7 a94002a1 (f9000420) 
[  151.344790] ---[ end trace d1f7840f61b087da ]---
[  151.361967] Kernel panic - not syncing: Oops: Fatal exception
[  151.367575] Kernel Offset: disabled
[  151.371013] CPU features: 0xc0000101,23380e02
[  151.375353] Memory Limit: none
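(Annotation, not part of the report: the fault address dead000000000108 together with x0/x1/x23/x24 looks like list poison plus a small offset, i.e. a write through a list_head that has already been deleted and poisoned. As far as I recall, the constants come from include/linux/poison.h, with arm64 setting CONFIG_ILLEGAL_POINTER_VALUE to 0xdead000000000000; worth double-checking against our tree:)

/* include/linux/poison.h (mainline; reproduced from memory) */
#ifdef CONFIG_ILLEGAL_POINTER_VALUE
# define POISON_POINTER_DELTA _AC(CONFIG_ILLEGAL_POINTER_VALUE, UL)
#else
# define POISON_POINTER_DELTA 0
#endif

/* Non-NULL pointers that fault if a stale list entry is dereferenced */
#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x122 + POISON_POINTER_DELTA)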

A simpler fix would be the following, which identifies the problem more clearly:

  diff --git a/drivers/dma/adi-dma.c b/drivers/dma/adi-dma.c                                             
  index b631b5e0aaa2b..xxxxxxxxxxxxx 100644              
  --- a/drivers/dma/adi-dma.c                                                                            
  +++ b/drivers/dma/adi-dma.c                            
  @@ -896,10 +896,11 @@ static irqreturn_t adi_dma_thread_handler(int irq, void *id)

        spin_lock_irqsave(&channel->lock, flags);

        if (channel->current_desc && channel->current_desc->cyclic) {
  +             struct dmaengine_result result = channel->current_desc->result;
                dmaengine_desc_get_callback(&channel->current_desc->tx, &cb);

                spin_unlock_irqrestore(&channel->lock, flags);
  -             dmaengine_desc_callback_invoke(&cb, &channel->current_desc->result);
  +             dmaengine_desc_callback_invoke(&cb, &result);
                return IRQ_HANDLED;
        }
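For reviewers, with that change applied the cyclic branch of adi_dma_thread_handler() would read roughly as follows (comments mine):

	if (channel->current_desc && channel->current_desc->cyclic) {
		/* Snapshot the result while the lock still protects current_desc */
		struct dmaengine_result result = channel->current_desc->result;

		dmaengine_desc_get_callback(&channel->current_desc->tx, &cb);

		spin_unlock_irqrestore(&channel->lock, flags);
		/*
		 * Invoke the callback on the local copy only; once the lock is
		 * dropped, terminate_all() may free current_desc at any time.
		 */
		dmaengine_desc_callback_invoke(&cb, &result);
		return IRQ_HANDLED;
	}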

pamolloy requested review from a team, jiez, mhennerich, nunojsa and ozan956 on February 6, 2026 08:46
pamolloy added this to ADSP on Feb 6, 2026
pamolloy added the bug label on Feb 6, 2026
@nunojsa (Collaborator) commented Feb 6, 2026

A simpler fix would be the following, which identifies the problem more clearly:

Agreed! Claude overdid it IMO. I would go with your proposed fix as something temporary... Looking at the driver it seems like a big refactor is needed anyways.

@pamolloy (Collaborator, Author) commented Feb 6, 2026

A simpler fix would be the following, which identifies the problem more clearly:

Agreed! Claude overdid it IMO. I would go with your proposed fix as something temporary... Looking at the driver it seems like a big refactor is needed anyways.

Yah, I dug a little deeper, especially since it removed the following comment:

	// Cyclic interrupts do not use the pending list to avoid complications
	// during dma termination

And after some prompting it concluded:

  1. Cyclic descriptors are added to cb_pending AND remain in current_desc
  2. Thread handler moves them to a local stack list and unlocks
  3. While processing the local list without locks, terminate_all() can free the descriptor via current_desc
  4. Both pointers reference the same memory - freeing one invalidates the other

  This is why the original comment warned "Cyclic interrupts do not use the pending list to avoid
  complications during dma termination."

  The minimal fix (copying result before unlocking) avoids this entirely. Alternatively, we need proper
  synchronization in adi_dma_synchronize() to wait for thread handlers before freeing.

Before I saw that comment, I was confused about why the cyclic and non-cyclic approaches were different.
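If we did want the heavier option, I think the standard primitive would be synchronize_irq(), which waits for both the hard and threaded handlers of an IRQ to finish. A rough sketch, purely illustrative — the irq field and the to_adi_dma_channel() helper are guesses at this driver's naming:

static void adi_dma_synchronize(struct dma_chan *chan)
{
	/* to_adi_dma_channel() and channel->irq are hypothetical names */
	struct adi_dma_channel *channel = to_adi_dma_channel(chan);

	/*
	 * synchronize_irq() waits for in-flight hard and threaded handlers
	 * to complete, so a descriptor freed by terminate_all() cannot still
	 * be in use by adi_dma_thread_handler() once this returns.
	 */
	synchronize_irq(channel->irq);
}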

@nunojsa (Collaborator) commented Feb 6, 2026

Alternatively, we need proper synchronization in adi_dma_synchronize() to wait for thread handlers before freeing.

Probably overcomplicating again 😄. That driver looks like a typical out-of-tree driver, so we'll have to refactor it a bit. So, IMO, I would go with your fix. Yeah, we have a copy in there, but it's a small struct and I would not expect overhead at all.
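For reference, the struct being copied is tiny; in mainline include/linux/dmaengine.h it is just an enum plus a u32 (reproduced from memory, so worth double-checking against our tree):

/* include/linux/dmaengine.h (mainline; from memory) */
enum dmaengine_tx_result {
	DMA_TRANS_NOERROR = 0,		/* transfer completed successfully */
	DMA_TRANS_READ_FAILED,		/* source read failed */
	DMA_TRANS_WRITE_FAILED,		/* destination write failed */
	DMA_TRANS_ABORTED,		/* transfer never submitted / aborted */
};

struct dmaengine_result {
	enum dmaengine_tx_result result;
	u32 residue;
};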

A race condition exists in the threaded interrupt handler when processing
cyclic DMA descriptors. The handler accesses channel->current_desc->result
after releasing the channel lock, creating a window where another CPU
executing adi_dma_terminate_all() can free the descriptor.

Race sequence:
  CPU 0 (thread handler)          CPU 1 (terminate_all)
  ----------------------          ---------------------
  Lock channel->lock
  Check current_desc->cyclic
  Get callback pointer
  Unlock channel->lock
                                  Lock channel->lock
                                  Free current_desc
                                  Unlock channel->lock
  Access current_desc->result  <- Use-after-free

This results in accessing freed memory containing list poison values
(0xdead000000000100), leading to kernel crashes.

Fix this by copying the result structure to a local variable while
still holding the lock, then using the copy after unlocking. This
follows the same safe pattern used in the non-cyclic path, which
already uses a local descriptor pointer.

The fix is minimal and avoids the complications warned about in the
original code comment regarding cyclic descriptors and the pending
list during termination.

Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Philip Molloy <philip.molloy@analog.com>
@pamolloy (Collaborator, Author) commented Feb 6, 2026

Before I force-push, here is the more complicated patch for future reference:

commit 4bd572081a1a1ffa8267b4c91608bfe318f8c698
Author: Philip Molloy <philip@philipmolloy.com>
Date:   Fri Feb 6 09:38:30 2026 +0100

    dmaengine: adi-dma: fix use-after-free in cyclic transfers
    
    A race condition exists in the threaded interrupt handler when processing
    cyclic DMA descriptors. The handler accesses channel->current_desc after
    releasing the channel lock, creating a window where another CPU executing
    adi_dma_terminate_all() can free the descriptor.
    
    Race sequence:
      CPU 0 (thread handler)          CPU 1 (terminate)
      ----------------------          -----------------
      Lock channel->lock
      Check current_desc->cyclic
      Get callback pointer
      Unlock channel->lock
                                      Lock channel->lock
                                      Free current_desc
                                      Set current_desc = NULL
                                      Unlock channel->lock
      Access current_desc->result  <- Use-after-free
    
    This results in accessing freed memory containing list poison values
    (0xdead000000000100), leading to kernel crashes.
    
    The original code had two separate paths: cyclic descriptors were
    handled via current_desc with special-case code, while non-cyclic
    descriptors used the cb_pending list. This created the vulnerability
    and added unnecessary complexity.
    
    Fix this by unifying both paths to use the cb_pending list:
    
    1. Modify __adi_dma_handler() to add both cyclic and non-cyclic
       descriptors to cb_pending. Cyclic descriptors remain in current_desc
       to continue running, but their callbacks are now queued safely.
    
    2. Rewrite adi_dma_thread_handler() to use a local list:
       - Lock once and move all cb_pending entries to a local list
       - Unlock immediately
       - Process all callbacks without holding locks
       - Handle cyclic vs non-cyclic differences (cookie completion
         and descriptor freeing) uniformly
    
    This approach eliminates the use-after-free by ensuring descriptor
    access happens through locally-held pointers rather than dereferencing
    through shared channel state after unlocking. It also improves
    performance by minimizing lock hold time and eliminating lock cycling.
    
    Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
    Signed-off-by: Philip Molloy <philip@philipmolloy.com>

diff --git a/drivers/dma/adi-dma.c b/drivers/dma/adi-dma.c
index b631b5e0aaa2b..a3a3e9e0a27a2 100644
--- a/drivers/dma/adi-dma.c
+++ b/drivers/dma/adi-dma.c
@@ -835,13 +835,15 @@ static irqreturn_t __adi_dma_handler(struct adi_dma_channel *channel,
 	desc->result.result = result;
 	desc->result.residue = 0;
 
-	// Cyclic interrupts do not use the pending list to avoid complications
-	// during dma termination
+	// Add all descriptors to cb_pending for unified processing in thread handler
+	list_add_tail(&desc->cb_node, &channel->cb_pending);
+
+	// Non-cyclic descriptors are removed from current_desc after completion
 	if (!desc->cyclic) {
-		list_add_tail(&desc->cb_node, &channel->cb_pending);
 		channel->current_desc = NULL;
 		__issue_pending(channel);
 	}
+	// Cyclic descriptors remain in current_desc and continue running
 
 done:
 	spin_unlock(&channel->lock);
@@ -892,33 +894,36 @@ static irqreturn_t adi_dma_thread_handler(int irq, void *id)
 	struct adi_dma_descriptor *desc;
 	struct dmaengine_desc_callback cb;
 	unsigned long flags;
+	LIST_HEAD(local_list);
 
 	spin_lock_irqsave(&channel->lock, flags);
 
-	if (channel->current_desc && channel->current_desc->cyclic) {
-		dmaengine_desc_get_callback(&channel->current_desc->tx, &cb);
+	// Move all pending callbacks to local list for processing without lock
+	if (!list_empty(&channel->cb_pending))
+		list_splice_tail_init(&channel->cb_pending, &local_list);
 
-		spin_unlock_irqrestore(&channel->lock, flags);
-		dmaengine_desc_callback_invoke(&cb, &channel->current_desc->result);
-		return IRQ_HANDLED;
-	}
+	spin_unlock_irqrestore(&channel->lock, flags);
 
-	while (!list_empty(&channel->cb_pending)) {
-		desc = list_first_entry(&channel->cb_pending, struct adi_dma_descriptor,
+	// Process all callbacks without holding the channel lock
+	while (!list_empty(&local_list)) {
+		desc = list_first_entry(&local_list, struct adi_dma_descriptor,
 			cb_node);
 		list_del(&desc->cb_node);
 
-		dma_cookie_complete(&desc->tx);
-		dmaengine_desc_get_callback(&desc->tx, &cb);
+		// Only complete cookie for non-cyclic descriptors
+		// Cyclic descriptors keep their cookie for status queries
+		if (!desc->cyclic)
+			dma_cookie_complete(&desc->tx);
 
-		spin_unlock_irqrestore(&channel->lock, flags);
+		dmaengine_desc_get_callback(&desc->tx, &cb);
 		dmaengine_desc_callback_invoke(&cb, &desc->result);
 
-		desc->tx.desc_free(&desc->tx);
-		spin_lock_irqsave(&channel->lock, flags);
+		// Only free non-cyclic descriptors
+		// Cyclic descriptors remain allocated and in current_desc
+		if (!desc->cyclic)
+			desc->tx.desc_free(&desc->tx);
 	}
 
-	spin_unlock_irqrestore(&channel->lock, flags);
 	return IRQ_HANDLED;
 }

pamolloy force-pushed the adsp-6.12.0/philip/adi-dma-locking-bug branch from 4bd5720 to fd506fb on February 6, 2026 10:42
pamolloy marked this pull request as ready for review on February 6, 2026 10:42