[PATCH net-next v3 1/5] netconsole: do not schedule skb pool refill from NMI
From: Breno Leitao
Date: Thu Jun 04 2026 - 12:21:35 EST

When alloc_skb() fails in find_skb(), the fallback path dequeues an skb
from np->skb_pool and unconditionally calls schedule_work() to top the
pool back up. schedule_work() ends up taking the workqueue pool locks,
which are not NMI-safe.

netconsole_write() is registered as the nbcon write_atomic callback and
is explicitly marked CON_NBCON_ATOMIC_UNSAFE, meaning it is invoked from
emergency/panic contexts including NMIs. If the NMI interrupts a thread
already holding the workqueue pool lock, calling schedule_work()
self-deadlocks and the panic message that was being printed is lost.

Introduce netcons_skb_pop() to fold the pool dequeue and the refill
request into a single helper. The helper skips schedule_work() when
called from NMI context; the pool is best-effort, so the refill is simply
deferred to the next non-NMI find_skb() call that exhausts alloc_skb()
and hits the fallback again. This keeps the fast path untouched and the
locking rules around the fallback pool documented in one place.

Note this only removes the schedule_work() hazard from the NMI path. The
allocation itself is still not fully NMI-safe: the alloc_skb(GFP_ATOMIC)
attempted first may take slab locks, and the skb_dequeue() fallback takes
np->skb_pool.lock, so either can deadlock if the NMI interrupts a holder
of those locks. Closing those windows requires an NMI-safe (lockless) skb
pool and is left to a follow-up; this patch addresses the schedule_work()
deadlock, which is both the most likely and the easiest to trigger.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
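For readers who want to see the control flow outside the kernel, the
following is a minimal userspace sketch of the behaviour described above:
pop from a small pool, and request a refill only when not in NMI context.
It is a model under stated assumptions, not the kernel code. Every name
in it (model_pool, model_in_nmi, model_schedule_refill, model_pool_pop,
model_demo) is hypothetical; model_in_nmi stands in for in_nmi() and
model_schedule_refill() stands in for schedule_work(), which here only
counts requests instead of queueing work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
#define POOL_MAX 4

struct model_pool {
	int bufs[POOL_MAX];		/* stand-ins for pre-allocated skbs */
	size_t len;			/* buffers currently in the pool */
	unsigned int refill_requests;	/* how often a refill was requested */
};

/* Hypothetical stand-in for in_nmi(); the caller toggles it by hand. */
static bool model_in_nmi;

/* Hypothetical stand-in for schedule_work(): it only records the request. */
static void model_schedule_refill(struct model_pool *p)
{
	p->refill_requests++;
}

/*
 * Same shape as the helper in the patch: dequeue first, then request a
 * refill unless the caller is in (modelled) NMI context.
 */
static int *model_pool_pop(struct model_pool *p)
{
	int *buf;

	assert(p->len <= POOL_MAX);
	buf = p->len ? &p->bufs[--p->len] : NULL;

	if (!model_in_nmi)
		model_schedule_refill(p);

	return buf;
}

/*
 * One pop from modelled NMI context followed by one pop from normal
 * context. Returns the number of refill requests recorded afterwards.
 */
static unsigned int model_demo(void)
{
	struct model_pool pool = { .len = POOL_MAX };

	model_in_nmi = true;
	(void)model_pool_pop(&pool);	/* refill skipped */

	model_in_nmi = false;
	(void)model_pool_pop(&pool);	/* refill requested */

	return pool.refill_requests;
}
```

In this model the pop from NMI context still hands out a buffer; only
the refill request is deferred to the next non-NMI pop, which matches
the best-effort behaviour described in the commit message. The model
does not attempt to reproduce the remaining skb_pool.lock or slab lock
hazards noted above.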
drivers/net/netconsole.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 8ecc2c71c699..918e4a9f4456 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -1654,6 +1654,23 @@ static struct notifier_block netconsole_netdev_notifier = {
 	.notifier_call = netconsole_netdev_event,
 };
 
+/* Pop a pre-allocated skb from the pool and request a refill.
+ *
+ * The refill is requested via schedule_work(), which takes the workqueue
+ * pool locks and is therefore not NMI-safe. Skip the refill when called
+ * from NMI context; the next non-NMI caller will top the pool back up.
+ */
+static struct sk_buff *netcons_skb_pop(struct netpoll *np)
+{
+	struct sk_buff *skb;
+
+	skb = skb_dequeue(&np->skb_pool);
+	if (!in_nmi())
+		schedule_work(&np->refill_wq);
+
+	return skb;
+}
+
 static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
 {
 	int count = 0;
@@ -1663,10 +1680,8 @@ static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
 repeat:
 
 	skb = alloc_skb(len, GFP_ATOMIC);
-	if (!skb) {
-		skb = skb_dequeue(&np->skb_pool);
-		schedule_work(&np->refill_wq);
-	}
+	if (!skb)
+		skb = netcons_skb_pop(np);
 
 	if (!skb) {
 		if (++count < 10) {
--
2.53.0-Meta