Re: [PATCH v2] sched/rt: Have RT_PUSH_IPI be default off for non PREEMPT_RT

From: Valentin Schneider

Date: Mon May 18 2026 - 04:47:48 EST


On 15/05/26 10:37, Steven Rostedt wrote:
> A live lock occurred on a workload that was doing heavy networking traffic
> on a large machine where the softirqs would run 500us out of 750us. And it
> would also be waking up RT tasks, causing the RT pull logic to be
> constantly executed.
>
> When a softirq triggered on a CPU with RT tasks queued but not running
> yet, and the other CPUs would see this CPU as being overloaded, they would
> send an IPI over to it. The CPU would notice that the waiting RT tasks are
> of higher priority than the currently running task and simply schedule
> that CPU instead. But because the softirq was executing, before it could
> schedule, it would receive another IPI to do the same. The amount of IPIs
> would slow down the currently running softirq so much that before it could
> return back to task context, it would execute another softirq never
> allowing the CPU to schedule. This live locked that CPU.
>

I got a bit confused here, please correct me if I didn't get this right:

Per handle_softirqs(), we can't restart the softirq handling loop if
need_resched() is true (which would be the case here, per what we'd have
done in push_rt_task(@pull=true)).
At first I thought this meant softirqs couldn't be the issue; however, that
only holds within the scope of a single handle_softirqs() invocation.

AIUI here we're being hammered by IRQs, and thus end up in this sort of pattern:

  <IRQ>
    __irq_exit_rcu()
      invoke_softirq()
        handle_softirqs()
          // handle NET_RX_SOFTIRQ
          // need_resched() is true so don't loop
  </IRQ>

// Barely any progress made here towards actually executing __schedule()

  <IRQ>
    __irq_exit_rcu()
      invoke_softirq()
        handle_softirqs()
          // handle NET_RX_SOFTIRQ and wake up some tasks
          // need_resched() is true so don't loop
  </IRQ>

& repeat ad nauseam. Did I get this right?
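
To put rough numbers on the pattern above, here is a toy userspace model (not
kernel code). It assumes one softirq pass per IRQ since need_resched() is
already set, takes the 500us-out-of-750us figure from the changelog, and
treats the per-pass IPI overhead and the task-context time needed to reach
__schedule() as made-up parameters. The function name and all of its
arguments are hypothetical.

```c
#include <assert.h>

/*
 * Toy model, not kernel code. Times are in microseconds.
 *
 * Returns the number of IRQ periods elapsed before task context has
 * accumulated needed_us of run time (i.e. could reach __schedule()),
 * or -1 if that never happens within max_periods.
 */
static int toy_periods_until_schedule(int period_us, int softirq_us,
				      int ipi_overhead_us, int needed_us,
				      int max_periods)
{
	int progress_us = 0;

	assert(period_us > 0 && softirq_us >= 0 && ipi_overhead_us >= 0);
	assert(needed_us > 0 && max_periods > 0);

	for (int i = 1; i <= max_periods; i++) {
		/*
		 * need_resched() is set, so each IRQ runs a single softirq
		 * pass; that pass is stretched by the IPIs landing on it
		 * (assumed to be a fixed cost per period here).
		 */
		int irq_side_us = softirq_us + ipi_overhead_us;
		int task_side_us = period_us - irq_side_us;

		/* Whatever is left of the period goes to task context. */
		if (task_side_us > 0)
			progress_us += task_side_us;

		if (progress_us >= needed_us)
			return i;
	}

	/* Task context never got enough time: the CPU looks live locked. */
	return -1;
}
```

With toy_periods_until_schedule(750, 500, 0, 100, 1000) the task gets to
schedule within the first period, whereas an assumed 250us of IPI overhead per
period leaves no task-context time at all and the model never schedules. If
that matches what was observed, then the IPIs only need to eat the ~250us of
slack for the CPU to get stuck.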