Re: [PATCH v4] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
From: Shuai Zhang
Date: Sun Mar 29 2026 - 22:11:11 EST
Hi Luiz,
Thanks for the suggestion.
On 3/28/2026 1:51 AM, Luiz Augusto von Dentz wrote:
Hi Shuai,
On Fri, Mar 27, 2026 at 4:33 AM Shuai Zhang
<shuai.zhang@xxxxxxxxxxxxxxxx> wrote:
From: Shuai Zhang <quic_shuaz@xxxxxxxxxxx>
https://sashiko.dev/#/patchset/20260327083258.1398450-1-shuai.zhang%40oss.qualcomm.com
When the Bluetooth controller encounters a coredump, it triggers
the Subsystem Restart (SSR) mechanism. The controller first
reports the coredump data, and once the data upload is complete,
it sends a hw_error event. The host relies on this event to
proceed with subsequent recovery actions.
If the host has not finished processing the coredump data when the
hw_error event is received, it sets a timer to wait until either the
data processing is complete or the timeout expires before handling
the event.
The current implementation lacks a wakeup trigger. As a result,
even if the coredump data has already been processed, the host
continues to wait until the timer expires, causing unnecessary
delays in handling the hw_error event.
To fix this issue, add a `wake_up_bit()` call after the host finishes
processing the coredump data. This ensures that the waiting thread is
promptly notified and can proceed to handle the hw_error event without
waiting for the timeout.
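The effect of the missing wakeup can be illustrated with a small userspace analogy. This is only a sketch, not the driver code: it uses a pthread condition variable in place of the kernel's wait-on-bit machinery, and `struct dump_state`, `collector()` and `wait_for_dump()` are hypothetical names introduced for illustration. The waiter sleeps with a timeout and only rechecks the flag when it is woken, so clearing the flag without a wakeup leaves it asleep until the timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the QCA_MEMDUMP_COLLECTION flag. */
struct dump_state {
	pthread_mutex_t lock;
	pthread_cond_t done;
	bool collecting;     /* true while the "dump" is being processed */
	bool wake_on_clear;  /* true models clear + wake, false models a bare clear */
};

/* Models the dump worker: finish after ~50 ms, then clear the flag. */
static void *collector(void *arg)
{
	struct dump_state *s = arg;
	struct timespec work = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	nanosleep(&work, NULL);

	pthread_mutex_lock(&s->lock);
	s->collecting = false;
	if (s->wake_on_clear)
		pthread_cond_signal(&s->done); /* notify the waiter promptly */
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Models the hw_error path: wait for the flag to clear or for the timeout.
 * Returns how long the waiter was blocked, in milliseconds. */
static long wait_for_dump(bool wake_on_clear, long timeout_ms)
{
	struct dump_state s = { .collecting = true, .wake_on_clear = wake_on_clear };
	struct timespec start, end, deadline;
	pthread_t tid;

	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.done, NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	/* Take the lock before starting the worker so the wakeup cannot be
	 * sent before the waiter is actually waiting. */
	pthread_mutex_lock(&s.lock);
	pthread_create(&tid, NULL, collector, &s);
	while (s.collecting) {
		if (pthread_cond_timedwait(&s.done, &s.lock, &deadline) != 0)
			break; /* timed out without being woken */
	}
	pthread_mutex_unlock(&s.lock);

	clock_gettime(CLOCK_MONOTONIC, &end);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&s.done);
	pthread_mutex_destroy(&s.lock);

	return (end.tv_sec - start.tv_sec) * 1000 +
	       (end.tv_nsec - start.tv_nsec) / 1000000;
}
```

Calling `wait_for_dump(true, 500)` should return after roughly the 50 ms of simulated work, while `wait_for_dump(false, 500)` blocks for about the full 500 ms even though the flag was cleared long before. The exact figures depend on scheduling.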
Test case:
- Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
- Use `btmon` to capture HCI logs.
- Observe the time interval between receiving the hw_error event
and the execution of the power-off sequence in the HCI log.
Signed-off-by: Shuai Zhang <quic_shuaz@xxxxxxxxxxx>
Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
---
Changes v4:
- add Acked-by signoff
- Link to v3
https://lore.kernel.org/all/20251107033924.3707495-1-quic_shuaz@xxxxxxxxxxx/
Changes v3:
- add Fixes tag
- Link to v2
https://lore.kernel.org/all/20251106140103.1406081-1-quic_shuaz@xxxxxxxxxxx/
Changes v2:
- Split timeout conversion into a separate patch.
- Clarified commit messages and added test case description.
- Link to v1
https://lore.kernel.org/all/20251104112601.2670019-1-quic_shuaz@xxxxxxxxxxx/
---
drivers/bluetooth/hci_qca.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index c17a462ae..228a754a9 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1108,7 +1108,7 @@ static void qca_controller_memdump(struct work_struct *work)
qca->qca_memdump = NULL;
qca->memdump_state = QCA_MEMDUMP_COLLECTED;
cancel_delayed_work(&qca->ctrl_memdump_timeout);
- clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
+ clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
clear_bit(QCA_IBS_DISABLED, &qca->flags);
mutex_unlock(&qca->hci_memdump_lock);
return;
@@ -1186,7 +1186,7 @@ static void qca_controller_memdump(struct work_struct *work)
kfree(qca->qca_memdump);
qca->qca_memdump = NULL;
qca->memdump_state = QCA_MEMDUMP_COLLECTED;
- clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
+ clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
}
mutex_unlock(&qca->hci_memdump_lock);
--
2.34.1
Not saying the feedback is actually valid, but if other parts of the
code still use clear_bit(QCA_MEMDUMP_COLLECTION, ...), then perhaps
they should be updated as well?
Only these two locations incorrectly use clear_bit instead of clear_and_wake_up_bit.
All other uses of QCA_MEMDUMP_COLLECTION only involve set_bit and test_bit.
Thanks,
Shuai