Re: [PATCH v3 4/4] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters
From: Tim Chen
Date: Fri May 15 2026 - 16:25:34 EST
On Thu, 2026-05-14 at 11:34 -0700, Ricardo Neri wrote:
> Some topologies have scheduling domains that contain CPUs of asymmetric
> capacity, grouped into two or more clusters of equal-capacity CPUs
> sharing an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be
> balanced across these resource-sharing clusters.
>
> Do not clear SD_PREFER_SIBLING in the child domains to indicate to the
> load balancer that it should spread load among cluster siblings.
>
> Checks for capacity in update_sd_pick_busiest() prevent migrations from
> high- to low-capacity CPUs if a candidate group is not overloaded.
>
> One effect of keeping SD_PREFER_SIBLING in domains with asymmetric
> capacity is that low-capacity clusters with spare capacity can now help
> overloaded higher-capacity groups. This was already the case for single-CPU
> groups (see calculate_imbalance() for domains with SD_SHARE_LLC).
>
> Once the overloading condition disappears, misfit load will still be used
> to move high-utilization tasks to bigger CPUs if they have spare capacity.
Looks good to me.
Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> ---
> Changes in v3:
> * Updated documentation of SD_PREFER_SIBLING.
> * Expanded the patch description to explain the behavior when overloaded
>   groups are involved.
>
> Changes in v2:
> * Reworded the patch description for clarity.
> * Kept parentheses around bitwise operators for clarity.
> ---
> include/linux/sched/sd_flags.h | 3 ++-
> kernel/sched/topology.c | 14 ++++++++++++--
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> index 42839cfa2778..42f74af83b8c 100644
> --- a/include/linux/sched/sd_flags.h
> +++ b/include/linux/sched/sd_flags.h
> @@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
>   * Prefer to place tasks in a sibling domain
>   *
>   * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
> - * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
> + * flag, but cleared below domains with SD_ASYM_CPUCAPACITY if the domain does
> + * not have clusters of CPUs sharing cache.
>   *
>   * NEEDS_GROUPS: Load balancing flag.
>   */
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 5847b83d9d55..a1d048344ea1 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1723,8 +1723,18 @@ sd_init(struct sched_domain_topology_level *tl,
>  	/*
>  	 * Convert topological properties into behaviour.
>  	 */
> -	/* Don't attempt to spread across CPUs of different capacities. */
> -	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
> +	/*
> +	 * Don't attempt to spread across CPUs of different capacities.
> +	 *
> +	 * If the domain has clusters of CPUs sharing L2 cache, keep the flag to
> +	 * spread tasks across clusters of identical capacity. Checks in
> +	 * update_sd_pick_busiest() prevent task migrations from high- to low-
> +	 * capacity CPUs for non-overloaded groups. Migrations to a lower-
> +	 * capacity CPU can happen if a higher-capacity group is overloaded and
> +	 * a low-capacity cluster has spare capacity.
> +	 */
> +	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
> +	    !(sd->child->flags & SD_CLUSTER))
>  		sd->child->flags &= ~SD_PREFER_SIBLING;
>
>  	if (sd->flags & SD_SHARE_CPUCAPACITY) {
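
For anyone following the logic outside the kernel tree, the patched condition can be modelled with a small userspace sketch. This is only an illustration under stated assumptions: struct toy_domain and toy_update_prefer_sibling() are hypothetical names, and the bit values below are made up (the kernel generates the real ones from sd_flags.h). Only the shape of the condition is taken from the hunk above.

```c
#include <stddef.h>

/*
 * Hypothetical bit values, for illustration only. The real values are
 * generated from include/linux/sched/sd_flags.h.
 */
#define SD_ASYM_CPUCAPACITY	0x1u
#define SD_CLUSTER		0x2u
#define SD_PREFER_SIBLING	0x4u

/* Toy stand-in for struct sched_domain: only the fields the check reads. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *child;
};

/*
 * Same shape as the patched condition in sd_init(): clear
 * SD_PREFER_SIBLING in the child of an asymmetric-capacity domain,
 * unless that child is a cluster domain.
 */
static void toy_update_prefer_sibling(struct toy_domain *sd)
{
	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
	    !(sd->child->flags & SD_CLUSTER))
		sd->child->flags &= ~SD_PREFER_SIBLING;
}
```

With this model, a child that has SD_CLUSTER set keeps SD_PREFER_SIBLING under an asymmetric parent, while a child without SD_CLUSTER loses it, which matches the pre-patch behaviour for non-cluster children.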