Re: [PATCH v2 2/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
From: K Prateek Nayak
Date: Tue May 19 2026 - 03:51:13 EST
Hello Andrea,
Thank you for taking a look at the diff!
On 5/19/2026 12:13 PM, Andrea Righi wrote:
> Hi Prateek,
>
> On Tue, May 19, 2026 at 11:22:32AM +0530, K Prateek Nayak wrote:
>> Hello Peter, Andrea,
>>
>> On 5/19/2026 2:28 AM, Peter Zijlstra wrote:
>>> @@@ -2775,20 -3049,16 +3107,15 @@@ build_sched_domains(const struct cpumas
>>> if (!sd)
>>> continue;
>>>
>>> + if (has_asym)
>>> - asym_claimed = claim_asym_sched_domain_shared(&d, i);
>>> ++ claim_asym_sched_domain_shared(&d, i);
>>> +
>>> /* First, find the topmost SD_SHARE_LLC domain */
>>> while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
>>> sd = sd->parent;
>>>
>>> if (sd->flags & SD_SHARE_LLC) {
>>> - /*
>>> - * Initialize the sd->shared for SD_SHARE_LLC unless
>>> - * the asym path above already claimed it.
>>> - */
>>> - if (!asym_claimed)
>>> - init_sched_domain_shared(&d, sd);
>>> - int sd_id = cpumask_first(sched_domain_span(sd));
>>> -
>>> - sd->shared = *per_cpu_ptr(d.sds, sd_id);
>>> - atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
>>> - atomic_inc(&sd->shared->ref);
>>> ++ init_sched_domain_shared(&d, sd);
>>
>> This will run into a small problem with "nr_idle_scan" if
>> cpumask_first(sched_domain_span(sd)) is the same for both sd_asym and
>> sd_llc.
>
> Ah, good catch! When cpumask_first(asym_span) == cpumask_first(llc_span)
> (big.LITTLE typical case), both sd_asym->shared and sd_llc->shared would alias
> to d->sds[0].
>
>>
>> The load balancer at different domains will populate "nr_idle_scan"
>> with different values, and they alias to the same ->shared if one
>> isn't degenerated. I believe there is also at least one way to hit
>> the WARN_ON() from cpu_attach_domain() if SD_ASYM_CPUCAPACITY_FULL
>> comes before the last SD_SHARE_LLC domain and the latter is
>> degenerated.
>>
>> How about this:
>>
>> (On top of queue:sched/core; lightly tested on a !ASYM_CPUCAPACITY system)
>>
>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>> index fe09d3268bc9..1d2c98dca211 100644
>> --- a/include/linux/sched/topology.h
>> +++ b/include/linux/sched/topology.h
>> @@ -67,7 +67,15 @@ struct sched_domain_shared {
>> atomic_t ref;
>> atomic_t nr_busy_cpus;
>> int has_idle_cores;
>> - int nr_idle_scan;
>> + union {
>> + int nr_idle_scan;
>> + /*
>> + * Used during allocation to claim the
>> + * sched_domain_shared object at
>> + * multiple levels.
>
> I think between build and the first LB tick, readers of nr_idle_scan may observe
> leftover SD_* flags in nr_idle_scan. This shouldn't be a problem and should
> self-heal soon, but maybe it's worth a comment? Something like:
>
> * Note: between build and the first periodic LB tick, which
> * rewrites the union via update_idle_cpu_scan(), readers of
> * nr_idle_scan may observe the transient SD_* flag value as
> * the scan bound. The flag bits are small positive integers,
> * so the effect is just a slightly relaxed scan bound for one
> * window and self-heals on the first tick.
Ack! Today we start with 0, which isn't representative of the system
state either, and we rely on later updates to correct the value after
a hotplug / cpuset change.
I can fold in the note and resend it as a formal patch.
Peter, would you prefer a formal patch or would you like to do this
(or something similar) as a part of the conflict resolution itself?
>> + BUG_ON(!sd->shared);
>
> Unreachable in practice, but should we have a WARN_ON_ONCE() +
> bail/early-return? That way we'd fall back to using the LLC's shared
> for sd_balance_shared, which seems nicer than a BUG_ON().
Ack! As a backup, we can just use the last CPU's "sds" if we don't end
up finding a free one. I only had the BUG_ON() there so I could easily
spot my VM crashing ;-)
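As a rough userspace model of that fallback (again not kernel code; the
names, the warn-once counter and the stderr message are hypothetical
stand-ins for WARN_ON_ONCE()), the "no free slot" case takes a shared
reference on the last CPU's object instead of crashing:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace model of replacing BUG_ON() with a warn-once fallback.
 * NOT kernel code; every name below is hypothetical.
 */
#define NR_CPUS_MODEL 4
#define CLAIM_LLC  0x1	/* stand-in for an SD_* flag bit */
#define CLAIM_ASYM 0x2	/* stand-in for an SD_* flag bit */

struct sds_model {
	int ref;
	union {
		int nr_idle_scan;	/* runtime: idle scan bound */
		int claimed;		/* build time: level owning this slot */
	};
};

static struct sds_model sds[NR_CPUS_MODEL];
static int fallback_count;	/* how many times the fallback was taken */

static struct sds_model *claim_sds_or_fallback(int first, int last, int level)
{
	assert(0 <= first && first <= last && last < NR_CPUS_MODEL);

	for (int cpu = first; cpu <= last; cpu++) {
		if (sds[cpu].claimed == 0) {
			sds[cpu].claimed = level;
			sds[cpu].ref++;
			return &sds[cpu];
		}
	}

	/*
	 * No free slot (the "unreachable" case). Report it only once, in
	 * the spirit of WARN_ON_ONCE(), and share the last CPU's object.
	 * The two levels alias again in this case, but nothing crashes.
	 */
	if (fallback_count++ == 0)
		fprintf(stderr, "no free shared object, using CPU %d's\n", last);
	sds[last].ref++;
	return &sds[last];
}
```

The trade-off is that, in the fallback case only, nr_idle_scan is shared
between levels again. That seems acceptable for a path that should never
be taken.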
--
Thanks and Regards,
Prateek