[PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

From: Tim JH Chen(陳仁鴻)

Date: Wed May 13 2026 - 04:44:45 EST


Date: Wed, 13 May 2026 09:21:40 +0800
Subject: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM
 suspend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When system suspend is triggered while the DPMAIF TX kthread
(t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur,
leading to a CPU soft lockup.

The root cause is two-fold:

1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop() which only stops
   the TX work-queue items (by clearing txq->que_started and waiting on
   txq->tx_processing). It does NOT signal the kthread and does NOT
   update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.

2. The kthread's state guard ("if ... state != DPMAIF_STATE_PWRON")
   is only checked at the top of each loop iteration. If the thread
   has already passed this guard, it proceeds unconditionally to call
   pm_runtime_resume_and_get(), which tries to acquire the PM spinlock
   also held (or contended) by the system PM suspend path.

The result is a spinlock deadlock observed as:

  watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
  RIP: _raw_spin_unlock_irqrestore
  Call Trace:
    __pm_runtime_resume+0x5b/0x80
    t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]

Reproducing the lockup requires ASPM L1 to be enabled on the endpoint
(which extends the time pm_runtime_resume_and_get() holds the PM lock
during L1.2 link retraining) and typically takes hundreds of repeated
suspend/resume cycles.

Fix by three coordinated changes:

- In t7xx_dpmaif_suspend(): immediately set state to DPMAIF_STATE_PWROFF
  after stopping the TX queue, then call wake_up() so any sleeping thread
  re-evaluates the wait_event condition and stops.

- In t7xx_dpmaif_resume(): restore state to DPMAIF_STATE_PWRON before
  re-enabling the TX queues, symmetric with the suspend change.
  Without this the kthread would never wake up after resume.

- In t7xx_dpmaif_tx_hw_push_thread(): add a second state check
  immediately before pm_runtime_resume_and_get() to close the TOCTOU
  window between the wait_event guard and the pm call.
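
The effect of the second state check can be illustrated with a small
userspace model. This is only a sketch, not driver code: every model_*
name is hypothetical, the counter stands in for the call to
pm_runtime_resume_and_get(), and the hook simulates a suspend that
lands between the top-of-loop guard and the PM call. The model is
single-threaded, so it shows the ordering only; in a truly concurrent
run a state change could still arrive after the second check.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all model_* names are hypothetical. */
enum { MODEL_STATE_PWROFF = 0, MODEL_STATE_PWRON = 1 };

struct model_ctrl {
	atomic_int state;    /* stands in for dpmaif_ctrl->state */
	int pm_resume_calls; /* counts stand-in PM resume calls */
};

/* Called between the loop guard and the PM call to simulate a racing event. */
typedef void (*model_window_hook)(struct model_ctrl *);

static void model_init(struct model_ctrl *c)
{
	atomic_init(&c->state, MODEL_STATE_PWRON);
	c->pm_resume_calls = 0;
}

/* Models the suspend-side change: mark the interface powered off. */
static void model_suspend(struct model_ctrl *c)
{
	atomic_store(&c->state, MODEL_STATE_PWROFF);
}

/* Models the resume-side change: mark the interface powered on again. */
static void model_resume(struct model_ctrl *c)
{
	atomic_store(&c->state, MODEL_STATE_PWRON);
}

/*
 * One iteration of the modeled TX loop. Returns true if the stand-in
 * PM call was reached. With recheck == false this mirrors the old
 * flow (guard only at the top); with recheck == true it mirrors the
 * added second check right before the PM call.
 */
static bool model_tx_iteration(struct model_ctrl *c, bool recheck,
			       model_window_hook hook)
{
	/* Guard at the top of the loop iteration. */
	if (atomic_load(&c->state) != MODEL_STATE_PWRON)
		return false;

	/* A suspend may land here, after the guard has been passed. */
	if (hook != NULL)
		hook(c);

	/* Second check immediately before the PM call. */
	if (recheck && atomic_load(&c->state) != MODEL_STATE_PWRON)
		return false;

	c->pm_resume_calls++; /* stand-in for pm_runtime_resume_and_get() */
	return true;
}
```

Calling model_tx_iteration(&c, false, model_suspend) reaches the
stand-in PM call even though the state was switched off inside the
window, whereas the same call with recheck set to true returns false
without touching the counter.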

Tested: no soft lockup observed over 500+ suspend/resume cycles with
SIM registered and ASPM L1 enabled (previously triggered in < 300).

Fixes: 05f7e89ab ("Linux 6.19")
Signed-off-by: Tim JH Chen <tim.jh.chen@xxxxxxxxxx>
---
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c    | 3 +++
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6..315a77e24 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -412,6 +412,8 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
        struct dpmaif_ctrl *dpmaif_ctrl = param;

        t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+       dpmaif_ctrl->state = DPMAIF_STATE_PWROFF;
+       wake_up(&dpmaif_ctrl->tx_wq);
        t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
        t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
        t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,6 +453,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
        if (!dpmaif_ctrl)
                return 0;

+       dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
        t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
        t7xx_dpmaif_enable_irq(dpmaif_ctrl);
        t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf..d5a5befec 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -460,6 +460,9 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
                                break;
                }

+               if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
+                       continue;
+
                ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
                if (ret < 0 && ret != -EACCES)
                        return ret;
--
2.25.1
