Re: [PATCH 4/5] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity

From: Andrea Righi

Date: Sat May 16 2026 - 05:04:55 EST


Hi Shrikanth,

On Fri, May 15, 2026 at 03:39:55PM +0530, Shrikanth Hegde wrote:
> On 5/9/26 11:37 PM, Andrea Righi wrote:
> > When SD_ASYM_CPUCAPACITY load balancing considers pulling a misfit task,
> > capacity_of(dst_cpu) can overstate available compute if the SMT sibling is
> > busy: the core does not deliver its full nominal capacity.
> >
> > If SMT is active and dst_cpu is not on a fully idle core, skip this
> > destination so we do not migrate a misfit expecting a capacity upgrade we
> > cannot actually provide.
> >
> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Cc: Christian Loehle <christian.loehle@xxxxxxx>
> > Cc: Koba Ko <kobak@xxxxxxxxxx>
> > Cc: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> > Reported-by: Felix Abecassis <fabecassis@xxxxxxxxxx>
> > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6f0835c15ee11..2ddba8bd27e59 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9693,6 +9693,7 @@ struct lb_env {
> > int dst_cpu;
> > struct rq *dst_rq;
> > + bool dst_core_idle;
> > struct cpumask *dst_grpmask;
> > int new_dst_cpu;
> > @@ -10918,10 +10919,16 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > * We can use max_capacity here as reduction in capacity on some
> > * CPUs in the group should either be possible to resolve
> > * internally or be covered by avg_load imbalance (eventually).
> > + *
> > + * When SMT is active, only pull a misfit to dst_cpu if it is on a
> > + * fully idle core; otherwise the effective capacity of the core is
> > + * reduced and we may not actually provide more capacity than the
> > + * source.
> > */
> > if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
> > (sgs->group_type == group_misfit_task) &&
> > - (!capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
> > + (!env->dst_core_idle ||
> > + !capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
> > sds->local_stat.group_type != group_has_spare))
> > return false;
> > @@ -11485,6 +11492,8 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
> > unsigned long sum_util = 0;
> > bool sg_overloaded = 0, sg_overutilized = 0;
> > + env->dst_core_idle = !sched_smt_active() || is_core_idle(env->dst_cpu);
> > +
> > do {
> > struct sg_lb_stats *sgs = &tmp_sgs;
> > int local_group;
>
>
> This is kind of similar to what ASYM_PACKING would have done at MC domain with
> equal CPU capacities. i.e pull the load if the core is idle.

I think that's right. Semantically, "only pull a misfit to dst_cpu if its core is
idle" is essentially the same heuristic that SD_ASYM_PACKING ends up applying at
MC: prefer destinations on cores that can actually deliver their nominal
capacity. With equal per-CPU priorities the asym_packing path collapses to
"prefer the idle core", which is what this patch enforces for the
misfit case.
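
To make the condition concrete, here is a rough userspace model of the check
in update_sd_pick_busiest() after this patch. It is only a sketch, not kernel
code: struct model_env and the model_*() helpers are made-up names for
illustration, the booleans stand in for the kernel predicates named in the
comments, and the ~5% margin is capacity_greater() as I remember it from
fair.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the bits of lb_env / sg_lb_stats used here. */
struct model_env {
	bool asym_cpucapacity;          /* sd->flags & SD_ASYM_CPUCAPACITY */
	bool smt_active;                /* sched_smt_active() */
	bool dst_core_is_idle;          /* is_core_idle(dst_cpu) */
	bool src_is_misfit;             /* sgs->group_type == group_misfit_task */
	bool local_has_spare;           /* local group_type == group_has_spare */
	unsigned long dst_capacity;     /* capacity_of(dst_cpu) */
	unsigned long src_max_capacity; /* sg->sgc->max_capacity */
};

/* Assumed to match capacity_greater(): cap1 must exceed cap2 by ~5%. */
static bool model_capacity_greater(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}

/* Returns true when the misfit group must not be picked as busiest. */
static bool model_reject_misfit_pull(const struct model_env *env)
{
	/* Without SMT every CPU is its own core, so the check is a no-op. */
	bool dst_core_idle = !env->smt_active || env->dst_core_is_idle;

	return env->asym_cpucapacity && env->src_is_misfit &&
	       (!dst_core_idle ||
		!model_capacity_greater(env->dst_capacity,
					env->src_max_capacity) ||
		!env->local_has_spare);
}
```

With SMT active and a busy sibling on dst_cpu's core the pull is rejected no
matter how large capacity_of(dst_cpu) is. With SMT off the result is the same
as before the patch.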

>
> In your table in the cover-letter, if you do "NO ASYM + SIS_UTIL + ASYM_PACKING (at MC)"
> does it achieve close to "ASYM + SMT + SIS_UTIL"?

Christian already explored the "NO ASYM_CPUCAPACITY + SD_ASYM_PACKING" idea
(https://lore.kernel.org/all/20260325181314.3875909-1-christian.loehle@xxxxxxx).

I gave it a spin on Vera at the time. Summarizing the numbers I reported on that
thread (all vs. baseline = default SD_ASYM_CPUCAPACITY, no SMT awareness, on my
CPU-bound workload):
- SD_ASYM_PACKING at MC (Christian's RFC): ~1.5x speedup
- equalize capacities within +/-5% (NO_ASYM): ~1.6x speedup
- SMT-aware SD_ASYM_CPUCAPACITY (PATCH 3/5): ~1.7x speedup

So SD_ASYM_PACKING seems to help, but not as much as the NO_ASYM variant (even
if it's pretty close) or this series.

I think the structural reason is that ASYM_PACKING at MC only fixes destination
selection in load balance; it doesn't change select_idle_capacity() /
asym_fits_cpu() on the wakeup path, where most of the placement decisions
actually seem to happen in this case.

Thanks,
-Andrea