Re: [PATCH] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection

From: Andrea Righi

Date: Wed Mar 18 2026 - 06:31:35 EST


Hi Vincent,

On Wed, Mar 18, 2026 at 10:41:15AM +0100, Vincent Guittot wrote:
> On Wed, 18 Mar 2026 at 10:22, Andrea Righi <arighi@xxxxxxxxxx> wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement. However, when those CPUs belong to SMT cores,
>
> Interesting, which kind of system has both SMT and SD_ASYM_CPUCAPACITY?
> I thought both were never set simultaneously, and that SD_ASYM_PACKING
> was used for systems involving SMT, like x86.

It's an NVIDIA platform (not publicly available yet) where the firmware
exposes different CPU capacities and has SMT enabled, so both
SD_ASYM_CPUCAPACITY and SMT are present. I'm not sure whether the final
firmware release will keep this exact configuration (there's a good chance
it will), so I'm addressing it now to be prepared.

>
> > their effective capacity can be much lower than the nominal capacity
> > when the sibling thread is busy: SMT siblings compete for shared
> > resources, so a "high capacity" CPU that is idle but whose sibling is
> > busy does not deliver its full capacity. This effective capacity
> > reduction cannot be modeled by the static capacity value alone.
> >
> > Introduce SMT awareness in the asym-capacity idle selection policy: when
> > SMT is active, prefer fully-idle SMT cores over partially-idle ones. A
> > two-phase selection first tries only CPUs on fully idle cores, then
> > falls back to any idle CPU if none fit.
> >
> > Prioritizing fully-idle SMT cores yields better task placement because
> > the effective capacity of partially-idle SMT cores is reduced; always
> > preferring fully-idle cores when available leads to more accurate
> > capacity usage on task wakeup.
> >
> > On an SMT system with asymmetric CPU capacities, SMT-aware idle
> > selection has been shown to improve throughput by around 15-18% for
> > CPU-bound workloads running a number of tasks equal to the number of
> > SMT cores.
> >
> > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> > ---
> >  kernel/sched/fair.c | 24 +++++++++++++++++++++---
> >  1 file changed, 21 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0a35a82e47920..0f97c44d4606b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7945,9 +7945,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> >  * the task fits. If no CPU is big enough, but there are idle ones, try to
> >  * maximize capacity.
> > + *
> > + * When @smt_idle_only is true (asym + SMT), only consider CPUs on cores whose
> > + * SMT siblings are all idle, to avoid stacking and sharing SMT resources.
> >   */
> >  static int
> > -select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > +select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target,
> > +		     bool smt_idle_only)
> >  {
> >  	unsigned long task_util, util_min, util_max, best_cap = 0;
> >  	int fits, best_fits = 0;
> > @@ -7967,6 +7971,9 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  		if (!choose_idle_cpu(cpu, p))
> >  			continue;
> >
> > +		if (smt_idle_only && !is_core_idle(cpu))
> > +			continue;
> > +
> >  		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> >  		/* This CPU fits with all requirements */
> > @@ -8102,8 +8109,19 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >  		 * capacity path.
> >  		 */
> >  		if (sd) {
> > -			i = select_idle_capacity(p, sd, target);
> > -			return ((unsigned)i < nr_cpumask_bits) ? i : target;
> > +			/*
> > +			 * When asym + SMT and the hint says idle cores exist,
> > +			 * try idle cores first to avoid stacking on SMT; else
> > +			 * scan all idle CPUs.
> > +			 */
> > +			if (sched_smt_active() && test_idle_cores(target)) {
> > +				i = select_idle_capacity(p, sd, target, true);
> > +				if ((unsigned int)i >= nr_cpumask_bits)
> > +					i = select_idle_capacity(p, sd, target, false);
>
> Can't you make it one pass in select_idle_capacity ?

Oh yes, absolutely: we can track the best-fit CPU in the same pass and use
it as a fallback if we can't find any CPU on a fully-idle SMT core. I'll
change that.

>
> > +			} else {
> > +				i = select_idle_capacity(p, sd, target, false);
> > +			}
> > +			return ((unsigned int)i < nr_cpumask_bits) ? i : target;
> >  		}
> >  	}
> >
> > --
> > 2.53.0
> >

Thanks,
-Andrea