Re: [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer

From: Andrea Righi

Date: Fri Mar 27 2026 - 16:40:58 EST


On Fri, Mar 27, 2026 at 05:04:23PM +0530, K Prateek Nayak wrote:
> Hello Andrea,
>
> On 3/27/2026 3:14 PM, Andrea Righi wrote:
> > Hi Vincent,
> >
> > On Fri, Mar 27, 2026 at 09:45:56AM +0100, Vincent Guittot wrote:
> >> On Thu, 26 Mar 2026 at 16:12, Andrea Righi <arighi@xxxxxxxxxx> wrote:
> >>>
> >>> When choosing which idle housekeeping CPU runs the idle load balancer,
> >>> prefer one on a fully idle core if SMT is active, so balance can migrate
> >>> work onto a CPU that still offers full effective capacity. Fall back to
> >>> any idle candidate if none qualify.
> >>
> >> This one isn't straightforward for me. The ilb cpu will check all
> >> other idle CPUs 1st and finish with itself so unless the next CPU in
> >> the idle_cpus_mask is a sibling, this should not make a difference
> >>
> >> Did you see any perf diff ?
> >
> > I actually see a benefit, in particular, with the first patch applied I see
> > a ~1.76x speedup, if I add this on top I get ~1.9x speedup vs baseline,
> > which seems pretty consistent across runs (definitely not in error range).
> >
> > The intention with this change was to minimize SMT noise running the ILB
> > code on a fully-idle core when possible, but I also didn't expect to see
> > such big difference.
> >
> > I'll investigate more to better understand what's happening.
>
> Interesting! Either this "CPU-intensive workload" hates SMT turning
> busy (but to an extent where performance drops visibly?) or ILB
> keeps getting interrupted on an SMT sibling that is burdened by
> interrupts leading to slower balance (or IRQs driving the workload
> being delayed by rq_lock disabling them)
>
> Would it be possible to share the total SCHED_SOFTIRQ time, load
> balancing attempts, and utilization with and without the patch? I too
> will go queue up some runs to see if this makes a difference.

Quick update: I also tried this on a Vera machine with firmware that
exposes the same capacity for all CPUs (so SD_ASYM_CPUCAPACITY is
disabled and SMT is still on, of course), and I see similar performance
benefits.

Looking at SCHED_SOFTIRQ and load balancing attempts, I don't see big
differences; everything is within error range (results produced using a
vibe-coded Python script):

- baseline (stats/sec):

  SCHED softirq count : 2,625
  LB attempts (total) : 69,832

  Per-domain breakdown:
    domain0 (SMT):
      lb_count (total) : 68,482 [balanced=68,472 failed=9]
      CPU_IDLE         : lb=1,408 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NEWLY_IDLE   : lb=67,041 imb(load=0 util=0 task=7 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=2 misfit=0) gained=0
    domain1 (MC):
      lb_count (total) : 902 [balanced=900 failed=2]
      CPU_NEWLY_IDLE   : lb=869 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=2 misfit=0) gained=0
    domain2 (NUMA):
      lb_count (total) : 448 [balanced=441 failed=7]
      CPU_NEWLY_IDLE   : lb=415 imb(load=0 util=0 task=44 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=268 misfit=0) gained=0

- with ilb-smt (stats/sec):

  SCHED softirq count : 2,671
  LB attempts (total) : 68,572

  Per-domain breakdown:
    domain0 (SMT):
      lb_count (total) : 67,239 [balanced=67,197 failed=41]
      CPU_IDLE         : lb=1,419 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NEWLY_IDLE   : lb=65,783 imb(load=0 util=0 task=42 misfit=0) gained=1
      CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=0 misfit=0) gained=0
    domain1 (MC):
      lb_count (total) : 833 [balanced=833 failed=0]
      CPU_NEWLY_IDLE   : lb=796 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=0 misfit=0) gained=0
    domain2 (NUMA):
      lb_count (total) : 500 [balanced=488 failed=12]
      CPU_NEWLY_IDLE   : lb=463 imb(load=0 util=0 task=44 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=627 misfit=0) gained=0
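
For anyone who wants to cross-check the "SCHED softirq count" rate
independently, here is a minimal sketch. It is not the script that
produced the numbers above. It assumes the figure is the system-wide
SCHED row of /proc/softirqs, summed over all CPUs and sampled over a
fixed interval. The function names and the one-second default are
hypothetical.

```python
import time


def sched_softirq_total(text):
    """Sum the per-CPU counters on the SCHED row of /proc/softirqs content."""
    for line in text.splitlines():
        fields = line.split()
        # Rows look like "SCHED:  <cpu0> <cpu1> ..."; the header row has no colon.
        if fields and fields[0] == "SCHED:":
            return sum(int(value) for value in fields[1:])
    raise ValueError("no SCHED row found")


def sched_softirq_rate(interval=1.0, path="/proc/softirqs"):
    """Return SCHED softirqs per second, measured over `interval` seconds.

    The counters are cumulative since boot, so the rate is the difference
    between two snapshots divided by the time between them.
    """
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return (sched_softirq_total(after) - sched_softirq_total(before)) / interval
```

Calling sched_softirq_rate(10.0) during a benchmark run, with and without
the patch, would give numbers comparable in spirit to the ones above,
assuming the same counter is being measured.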

I'll add more direct instrumentation to check what ILB is doing
differently...

And I'll also repeat the test and collect the same metrics on the Vera
machine with the firmware that exposes different CPU capacities as soon as
I get access again.

Thanks,
-Andrea