Re: [PATCH v3 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help
From: Ricardo Neri
Date: Sun May 17 2026 - 20:26:40 EST
On Fri, May 15, 2026 at 01:12:28PM -0700, Tim Chen wrote:
> On Thu, 2026-05-14 at 11:34 -0700, Ricardo Neri wrote:
> > In domains with asymmetric capacity, identifying misfit load in a
> > scheduling group is not useful when the destination CPU cannot help (i.e.,
> > its capacity exceeds the group's maximum CPU capacity by less than ~5%). In
> > such cases, it also prevents load balance among clusters of equal capacity
> > when CONFIG_SCHED_CLUSTER is enabled. This happens because
> > update_sd_pick_busiest() skips candidate groups of type misfit_task if the
> > destination CPU has similar capacity.
> >
> > Skipping misfit load accounting in this situation allows the group to be
> > classified as has_spare or fully_busy and lets load balancing proceed. Keep
> > marking scheduling groups as overloaded when misfit tasks are present. The
> > sg_overloaded flag propagates to the root domain and allows bigger CPUs in
> > it to help via newly idle balance.
> >
> > Reviewed-by: Christian Loehle <christian.loehle@xxxxxxx>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes in v3:
> > * Added Reviewed-by tag from Christian. Thanks!
> >
> > Changes in v2:
> > * Moved the check of the destination CPU capacity inside the code block
> > used for SD_ASYM_CPUCAPACITY. v1 inadvertently broke the mutual
> > exclusion of the sched_reduced_capacity() path.
> > * Keep marking the root domain as overloaded to allow bigger CPUs to
> > help. (sashiko)
> > * Fixed patch description to clarify that the capacity_greater() looks
> > for differences of 5% or more. (Christian)
> > * Reworded the patch description for clarity.
> > * I did not include the Reviewed-by tag from Christian since the patch
> > changed functionally.
> > ---
> > kernel/sched/fair.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e06e74d9ce0e..dcc02ceb44b5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10749,10 +10749,24 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> > continue;
> >
> > if (sd_flags & SD_ASYM_CPUCAPACITY) {
> > - /* Check for a misfit task on the cpu */
> > - if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> > - sgs->group_misfit_task_load = rq->misfit_task_load;
> > + if (rq->misfit_task_load) {
> > + /*
> > + * Always mark the domain overloaded so big CPUs
> > + * can pick up misfit tasks via newly idle
> > + * balance.
> > + */
> > *sg_overloaded = 1;
> > +
> > + /*
> > + * Only account misfit load if @dst_cpu can
> > + * help; otherwise, the group may be classified
> > + * as misfit_task and update_sd_pick_busiest()
> > + * will skip it.
>
> You mean "sd_pick_busiest() will pick it" instead of "skip it" for misfit task
> load balancing in the above comment?
Thank you for your review!
I mean "skip it" because update_sd_pick_busiest() will skip a candidate group
of type misfit_task if the capacity of dst_cpu is less than ~1.05 times the max
capacity of that group. It is the first check in the function.
Skipping misfit accounting allows the candidate group to be classified as
fully_busy or has_spare so that tasks can be balanced between clusters of equal
capacity.
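To illustrate the capacity condition described above, here is a rough userspace sketch. It is not the kernel code: the capacity_greater() margin is copied as I recall it from kernel/sched/fair.c, while the enum and the helper skips_misfit_group() are hypothetical names made up for this example. The real check in update_sd_pick_busiest() has further conditions that the sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* ~5% margin, as recalled from kernel/sched/fair.c (1078 / 1024 ~= 1.05). */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/* Hypothetical stand-ins for the kernel's group types, for this sketch only. */
enum sketch_group_type {
	SKETCH_HAS_SPARE,
	SKETCH_FULLY_BUSY,
	SKETCH_MISFIT_TASK,
};

/*
 * Hypothetical helper modelling only the capacity part of the check:
 * a candidate group of type misfit_task is skipped when dst_cpu is not
 * noticeably bigger than the biggest CPU in that group.
 */
static bool skips_misfit_group(enum sketch_group_type type,
			       unsigned long dst_cap,
			       unsigned long group_max_cap)
{
	return type == SKETCH_MISFIT_TASK &&
	       !capacity_greater(dst_cap, group_max_cap);
}

/* Compile-time sanity check: equal capacities are never "greater". */
static_assert(!capacity_greater(1024, 1024), "equal capacity is not greater");
```

With two clusters of equal capacity (say 1024 each), a group classified as misfit_task is skipped, whereas the same group classified as fully_busy or has_spare is not filtered by this condition and remains a candidate for regular load balancing.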
I will rephrase this comment to make it clearer.