Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue

From: Luiz Augusto von Dentz

Date: Mon May 18 2026 - 16:48:31 EST


Hi Heitor,

On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
<halves@xxxxxxxxxx> wrote:
>
> Hi Luiz,
>
> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> > Hi Heitor,
> >
> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> > <halves@xxxxxxxxxx> wrote:
> >>
> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> >> workqueue while it's being drained. This can happen during device reset or
> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >> >>
> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
> >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >> >>
> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> >> queued for transmission.
> >> >>
> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> >> Reported-by: syzbot+97721dd81f792e838ba0@xxxxxxxxxxxxxxxxxxxxxxxxx
> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@xxxxxxxxxx>
> >> >> ---
> >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >> >> 1 file changed, 18 insertions(+)
> >> >>
> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> >> --- a/net/bluetooth/hci_core.c
> >> >> +++ b/net/bluetooth/hci_core.c
> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >> >>
> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >> >>
> >> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
> >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> >> + kfree_skb(skb);
> >> >> + return;
> >> >> + }
> >> >> +
> >> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
> >> >>
> >> >> queue_work(hdev->workqueue, &hdev->tx_work);
> >> >>
> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
> >>
> >> I see, I missed the RCU guards for the device flags. Sorry about that,
> >> I'll add them to v2.
> >> Thanks for the catch!
> >
> > Actually this whole thing might be because we try to clean the
> > workqueue without actually closing the hdev, so I suspect that if we
> > just remove all the code from hci_dev_do_reset with
> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
> > how things work over the MGMT interface:
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index c46c1236ebfa..6782bbc9b6a7 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
> >
> > hci_req_sync_lock(hdev);
> >
> > - /* Drop queues */
> > - skb_queue_purge(&hdev->rx_q);
> > - skb_queue_purge(&hdev->cmd_q);
> > -
> > - /* Cancel these to avoid queueing non-chained pending work */
> > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > - /* Wait for
> > - *
> > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> > - *
> > - * inside RCU section to see the flag or complete scheduling.
> > - */
> > - synchronize_rcu();
> > - /* Explicitly cancel works in case scheduled after setting the flag. */
> > - cancel_delayed_work(&hdev->cmd_timer);
> > - cancel_delayed_work(&hdev->ncmd_timer);
> > -
> > - /* Avoid potential lockdep warnings from the *_flush() calls by
> > - * ensuring the workqueue is empty up front.
> > - */
> > - drain_workqueue(hdev->workqueue);
> > -
> > - hci_dev_lock(hdev);
> > - hci_inquiry_cache_flush(hdev);
> > - hci_conn_hash_flush(hdev);
> > - hci_dev_unlock(hdev);
> > -
> > - if (hdev->flush)
> > - hdev->flush(hdev);
> > -
> > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > -
> > - atomic_set(&hdev->cmd_cnt, 1);
> > - hdev->acl_cnt = 0;
> > - hdev->sco_cnt = 0;
> > - hdev->le_cnt = 0;
> > - hdev->iso_cnt = 0;
> > + ret = hci_dev_close_sync(hdev);
> > + if (ret)
> > + goto done;
> >
> > - ret = hci_reset_sync(hdev);
> > + ret = hci_dev_open_sync(hdev);
> >
> > +done:
> > hci_req_sync_unlock(hdev);
> > return ret;
> > }
> >
> > Seem to work here and as a side effect we get notified over the MGMT
> > when a user uses hciconfig hci0 reset:
> >
> > # tools/hciconfig hci0 reset
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> > has been disabled
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> > has been enabled
> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>
> Thank you for the review, and for the suggestion! There seems to be a
> number of similar syzbot reports for the hdev workqueue, so maybe the
> close+open approach is a simpler solution indeed. I've originally
> considered implementing a wrapper/helper for workqueue submission in
> hci_core.c, but if we can eliminate the race condition altogether that'd
> be even better.
>
> My only concern is that there seem to be slight differences between
> what hci_dev_do_reset() does now and what we'd get with
> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
> suitable for the stable kernels? And if not, do you think the checks
> from v1 would be appropriate, in that case?

I think, the HCI_Reset is a heavy hammer because it's a global reset.
Getting all the pieces to cooperate has been causing all sort of
issues, and in any case I think what hci_dev_do_reset does is simply
wrong or outdated; for instance it doesn't seem to affect LE, such as
cancelling scanning or advertising. This is likely because interfaces
like MGMT don't use it and require a power cycle to reset.

--
Luiz Augusto von Dentz