Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll

From: Dragos Tatulea

Date: Wed Apr 29 2026 - 04:14:27 EST


On Tue, Apr 28, 2026 at 05:31:54PM -0700, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 20:04:13 -0400 Martin Karsten wrote:
> > On 2026-04-28 19:40, Jakub Kicinski wrote:
> > > On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> > >> Under certain conditions a queue can be left with interrupts
> > >> disabled and with the napi rescheduling timer permanently stopped.
> > >> This behaviour is triggered by the napi busy poll path when
> > >> gro-flush-timeout and defer-hard-irqs are set. Here's a sequence of
> > >> operations:
> > >>
> > >> 1. Busy poll starts; NAPI_STATE_SCHED is set to prevent the timer
> > >> from rescheduling napi.
> > >>
> > >> 2. During napi poll, the driver leaves interrupts disabled because it
> > >> is in poll mode (napi_complete_done() returns false because
> > >> napi->state has NAPIF_STATE_IN_BUSY_POLL set).
> > >
> > > Why does the driver have IRQs disabled in busy poll?
> >
> > The problem occurs in IRQ deferral mode, when both gro-flush-timeout and
> > defer-hard-irqs are nonzero and NIC interrupts are disabled.
>
> Okay.
>
> > >> 3. At the end of the busy poll (busy_poll_stop()):
> > >> 3.1 The napi timer is armed and skip_schedule is set (due to the config).
> > >> 3.2 napi->poll() is called:
> > >> - the driver's poll() processes exactly budget packets
> > >> and exits early => napi is not scheduled.
> > >> (Interrupts are still disabled at this point.)
> > >> 3.3 Since napi poll processed budget packets, __busy_poll_stop()
> > >> is called with skip_schedule set => napi is not scheduled here
> > >> either.
> > >
> > > with skip_schedule it calls:
> > >
> > > clear_bit(NAPI_STATE_SCHED, &napi->state);
> > >
> > >> 4. If the napi timer from 3.1 fires, because napi poll is slow or for
> > >> some other reason, the timer runs with no effect (because
> > >> NAPI_STATE_SCHED is set).
> > >
> > > And here you claim STATE_SCHED is still set?
> >
> > Labelling this as step 4 might be misleading, sorry! The concern is
> > that a short enough timer (compared to the duration of the driver poll)
> > can fire before the NAPI_STATE_SCHED bit is cleared at the end
> > of step 3.3.
>
> Ah. Just say that :D Two pages of buggy text; y'all would have been
> better off using this one paragraph as the commit message.
> Please don't use AI for generating commit messages if that's the cause.
> It really is spectacularly shit at it.

I take the blame for this. Funnily enough, the text was mostly written
without AI... I just wanted to present the interactions in a more
explanatory way.

Do you prefer the short version from Martin or an improved version of
the long explanation?

Thanks,
Dragos