Re: bnxt_en: suspicious RCU usage in bnxt_fw_reset_task() qdisc path in 7.1-rc6
From: Stanislav Fomichev
Date: Tue Jun 02 2026 - 13:53:04 EST
On 06/02, Breno Leitao wrote:
> I am hitting a "suspicious RCU usage" lockdep splat from bnxt_en
> during a firmware reset on v7.1-rc6 (e43ffb69e043) with PROVE_RCU
> and PROVE_LOCKING enabled.
>
> The firmware reset path re-opens the device from bnxt_fw_reset_task()
> holding only the netdev instance lock (&dev->lock). bnxt_open() ->
> __bnxt_open_nic() -> bnxt_set_real_num_queues() ->
> netif_set_real_num_tx_queues() then walks the qdisc tree in
> dev_qdisc_change_real_num_tx(), which still dereferences dev->qdisc
> with rtnl_dereference() and therefore expects rtnl_lock to be held:
>
>   void dev_qdisc_change_real_num_tx(struct net_device *dev,
>                                     unsigned int new_real_tx)
>   {
>           struct Qdisc *qdisc = rtnl_dereference(dev->qdisc);
>           ...
>   }
>
> Since only the instance lock is held in this path, lockdep complains.
>
> Splat
> -----
>
> =============================
> WARNING: suspicious RCU usage
> 7.1.0 #1 Tainted: G E
> -----------------------------
> net/sched/sch_generic.c:1416 suspicious rcu_dereference_protected() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> 3 locks held by kworker/u208:1/13:
> #0: ((wq_completion)bnxt_pf_wq){+.+.}-{0:0}, at: process_scheduled_works+0x8f7/0x13b0
> #1: ((work_completion)(&(&bp->fw_reset_task)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x917/0x13b0
> #2: (&dev->lock){+.+.}-{4:4}, at: bnxt_fw_reset_task+0x7ed/0x1e80
> stack backtrace:
> CPU: 38 UID: 0 PID: 13 Comm: kworker/u208:1 Tainted: G E
> Hardware name: Wiwynn Delta Lake MP/Delta Lake-Class1, BIOS Y3DL405 11/21/2025
> Workqueue: bnxt_pf_wq bnxt_fw_reset_task
> <TASK>
> dump_stack_lvl+0x69/0xa0
> lockdep_rcu_suspicious+0x13f/0x1d0
> dev_qdisc_change_real_num_tx+0x54/0xe0
> netif_set_real_num_tx_queues+0x4ed/0xa80
> __bnxt_open_nic+0x9cb/0x3490
> ? bnxt_hwrm_if_change+0x4fd/0x620
> bnxt_open+0x1cb/0x370
> bnxt_fw_reset_task+0x80d/0x1e80
> process_scheduled_works+0x9c1/0x13b0
> worker_thread+0x90d/0xd20
> kthread+0x320/0x3f0
> ret_from_fork+0x2b6/0xb00
> ret_from_fork_asm+0x11/0x20
> </TASK>
>
> Bisect / analysis
> -----------------
>
> This looks like a regression from:
>
> 850d9248d2ea ("Revert "bnxt_en: bring back rtnl_lock() in the
> bnxt_open() path"")
>
> That revert dropped rtnl_lock() from the bnxt_open() paths and left
> bnxt_fw_reset_task() holding only the instance lock around bnxt_open().
>
> I can send a patch restoring rtnl_lock() around the bnxt_open() call in
> bnxt_fw_reset_task() (re-taking rtnl before the instance lock, matching the
> pre-revert code), but I wanted to report it first to make sure I am in the
> right direction.
>
> Please let me know how you would like to proceed.

I hope (but need to verify) that, at this point, most of the paths that
assign dev->qdisc also run under the ops lock. If so, an alternative might
be to use netdev_ops_lock_dereference() from Jakub's recent posting:
https://lore.kernel.org/netdev/20260528231637.251822-1-kuba@xxxxxxxxxx/t/#m3de56e1d53f86b2c1b12ca29d1617dd6635e78cb
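
To make the difference between the two checks concrete, here is a rough
userspace model, not kernel code. Everything prefixed with model_ is
hypothetical and exists only for this illustration. In particular,
model_ops_lock_deref() only assumes that a relaxed helper would accept the
instance lock in addition to rtnl; the real semantics of
netdev_ops_lock_dereference() are whatever the linked posting defines. The
relaxed check is also only sound if the writers of dev->qdisc hold the same
lock, which is the part that still needs verifying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these types or helpers are kernel APIs. */
struct model_qdisc {
	int id;
};

struct model_netdev {
	struct model_qdisc *qdisc;
	bool rtnl_held;       /* stands in for lockdep_rtnl_is_held() */
	bool instance_held;   /* stands in for lockdep_is_held(&dev->lock) */
	unsigned int splats;  /* count of "suspicious RCU usage" reports */
};

/*
 * Models the shape of rcu_dereference_protected(): when the lockdep
 * condition is false a report is recorded, but the pointer is still
 * returned to the caller.
 */
static struct model_qdisc *model_deref_protected(struct model_netdev *dev,
						 bool cond)
{
	assert(dev != NULL);
	if (!cond)
		dev->splats++;
	return dev->qdisc;
}

/* Models rtnl_dereference(dev->qdisc): only rtnl satisfies the check. */
static struct model_qdisc *model_rtnl_deref(struct model_netdev *dev)
{
	return model_deref_protected(dev, dev->rtnl_held);
}

/*
 * ASSUMPTION: a relaxed check that accepts either rtnl or the instance
 * lock. This is a guess at the idea, not the definition from the posting.
 */
static struct model_qdisc *model_ops_lock_deref(struct model_netdev *dev)
{
	return model_deref_protected(dev,
				     dev->rtnl_held || dev->instance_held);
}
```

With instance_held set and rtnl_held clear, which is the state
bnxt_fw_reset_task() is in per the splat, model_rtnl_deref() records a
report while model_ops_lock_deref() does not.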