Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT

From: Daniel Vacek

Date: Wed Mar 18 2026 - 06:31:11 EST


On Thu, 26 Feb 2026 18:29:27 +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-18 13:50:14 [+0100], Bert Karwatzki wrote:
> > Am Mittwoch, dem 18.02.2026 um 08:30 +0100 schrieb Sebastian Andrzej Siewior:
> > > On 2026-02-17 20:10:09 [+0100], Bert Karwatzki wrote:
> > > >
> > > > I tried to research the original commit which introduced the xmit_lock_owner check, but
> > > > it has been present since Linux 2.3.6 (released 1999-06-10), when __dev_queue_xmit() was still called dev_queue_xmit(),
> > > > so I can't tell the original idea behind that check (perhaps recursion detection ...), and I'm
> > > > not completely sure whether it can be omitted (and just let dev_xmit_recursion() do the recursion checking).
> > >
> > > Okay. Thank you. I'll add it to my list.
> > >
> > I've thought about it again and I now think the xmit_lock_owner check IS necessary to
> > avoid deadlocks on recursion in the non-RT case.
> >
> > My idea to use get_current()->tgid as the lock owner also does not work, as interrupts are still enabled
> > and __dev_queue_xmit() can be called from interrupt context. So in a situation where an interrupt occurs
> > after the lock has been taken and the interrupt handler calls __dev_queue_xmit() on the same CPU, a deadlock
> > would occur.
>
> The warning happens because taskA on cpuX goes through
> HARD_TX_LOCK(), gets preempted, and then taskB on cpuX also wants to
> send a packet. The second one throws the warning.
>
> We could ignore this check because a deadlock will throw a warning and
> "halt" the task that runs into the deadlock.
> But then we could be smart about this in the same way !RT is and
> pro-actively check for the simple deadlock. The lock owner of
> netdev_queue::_xmit_lock is recorded and can be checked against current.
>
> The snippet below should work. I need to see tomorrow whether this is
> still a good idea.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6ff4256700e60..de342ceb17201 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> /* Other cpus might concurrently change txq->xmit_lock_owner
> * to -1 or to their cpu id, but not to our id.
> */
> - if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> + if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {

Isn't this changing the behavior for the !RT case? Previously, if the thread evaluating
this condition was the same one that had already locked the queue (and hence the same CPU),
the condition was skipped, which is no longer the case with this change.

How about something like this instead?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d99b0fbc1942..27d090b03493 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -705,7 +705,7 @@ struct netdev_queue {
struct dql dql;
#endif
spinlock_t _xmit_lock ____cacheline_aligned_in_smp;
- int xmit_lock_owner;
+ struct task_struct *xmit_lock_owner;
/*
* Time (in jiffies) of last Tx
*/
@@ -4709,7 +4709,7 @@ static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
{
spin_lock(&txq->_xmit_lock);
/* Pairs with READ_ONCE() in __dev_queue_xmit() */
- WRITE_ONCE(txq->xmit_lock_owner, cpu);
+ WRITE_ONCE(txq->xmit_lock_owner, current);
}

static inline bool __netif_tx_acquire(struct netdev_queue *txq)
@@ -4727,7 +4727,7 @@ static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
{
spin_lock_bh(&txq->_xmit_lock);
/* Pairs with READ_ONCE() in __dev_queue_xmit() */
- WRITE_ONCE(txq->xmit_lock_owner, smp_processor_id());
+ WRITE_ONCE(txq->xmit_lock_owner, current);
}

static inline bool __netif_tx_trylock(struct netdev_queue *txq)
@@ -4736,7 +4736,7 @@ static inline bool __netif_tx_trylock(struct netdev_queue *txq)

if (likely(ok)) {
/* Pairs with READ_ONCE() in __dev_queue_xmit() */
- WRITE_ONCE(txq->xmit_lock_owner, smp_processor_id());
+ WRITE_ONCE(txq->xmit_lock_owner, current);
}
return ok;
}
@@ -4744,14 +4744,14 @@ static inline bool __netif_tx_trylock(struct netdev_queue *txq)
static inline void __netif_tx_unlock(struct netdev_queue *txq)
{
/* Pairs with READ_ONCE() in __dev_queue_xmit() */
- WRITE_ONCE(txq->xmit_lock_owner, -1);
+ WRITE_ONCE(txq->xmit_lock_owner, NULL);
spin_unlock(&txq->_xmit_lock);
}

static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
{
/* Pairs with READ_ONCE() in __dev_queue_xmit() */
- WRITE_ONCE(txq->xmit_lock_owner, -1);
+ WRITE_ONCE(txq->xmit_lock_owner, NULL);
spin_unlock_bh(&txq->_xmit_lock);
}

diff --git a/net/core/dev.c b/net/core/dev.c
index ccef685023c2..f62bffb8edf6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4817,7 +4817,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
/* Other cpus might concurrently change txq->xmit_lock_owner
* to -1 or to their cpu id, but not to our id.
*/
- if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
+ if (READ_ONCE(txq->xmit_lock_owner) != current) {
if (dev_xmit_recursion())
goto recursion_alert;

@@ -11178,7 +11178,7 @@ static void netdev_init_one_queue(struct net_device *dev,
/* Initialize queue lock */
spin_lock_init(&queue->_xmit_lock);
netdev_set_xmit_lockdep_class(&queue->_xmit_lock, dev->type);
- queue->xmit_lock_owner = -1;
+ queue->xmit_lock_owner = NULL;
netdev_queue_numa_node_write(queue, NUMA_NO_NODE);
queue->dev = dev;
#ifdef CONFIG_BQL

Daniel

> if (dev_xmit_recursion())
> goto recursion_alert;
>
>
> > Bert Karwatzki
>
> Sebastian