Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise
From: Dietmar Eggemann
Date: Wed Mar 25 2026 - 07:24:17 EST
On 25.03.26 10:32, Andrea Righi wrote:
> On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote:
>> On 24.03.26 12:01, Andrea Righi wrote:
>>> Hi Dietmar,
>>>
>>> On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote:
>>>> On 24.03.26 10:46, Andrea Righi wrote:
>>>>> Hi Christian,
>>>>>
>>>>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote:
>>>>>> On 3/24/26 07:55, Christian Loehle wrote:
>>>>>>> On 3/24/26 07:39, Vincent Guittot wrote:
>>>>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@xxxxxxxxxx> wrote:
>>
>> [...]
>>
>>>> The first time we observed this on NVIDIA Grace, we wondered whether
>>>> there might be functionality outside the task scheduler that makes use
>>>> of these slightly heterogeneous CPU capacity values from CPPC—and
>>>> whether the dependency on task scheduling was simply an overlooked
>>>> phenomenon.
>>>>
>>>> And then there was DCPerf Mediawiki on a 72-CPU system always scoring
>>>> better with sched_asym_cpucap_active() = TRUE (mentioned already by
>>>> Chris L. in:
>>>> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@xxxxxxx
>>>
>>> Yeah, I think Chris' asym-packing approach might be the safest thing to do.
>>>
>>> At the same time it would be nice to improve asym-capacity to introduce
>>> some concept of SMT awareness, that was my original attempt with
>>> https://lore.kernel.org/all/20260318092214.130908-1-arighi@xxxxxxxxxx,
>>> since we may see similar asym-capacity benefits on Vera (that has SMT,
>>> unlike Grace). What do you think?
>>
>> We never found a good way to specify a CPU capacity in the SMT case (EAS
>> and energy model included). So comparing CPU capacity w/ utilization, CPU
>> overutilization detection etc. definitions get more blurry.
>
> Hm... so should we just avoid calling select_idle_capacity() when SMT is
> enabled to prevent waking up tasks on both SMT siblings when there are
> fully-idle SMT cores?
Yeah, pretty much. So prefer (2) over (1) in the excerpts below.

IMHO, we do have a similar issue there: can we say that a logical CPU is
idle if its SMT sibling isn't? But at least we don't have to use any CPU
capacity/utilization comparison there.

select_idle_sibling()

	if (sched_smt_active()) {
		has_idle_core = test_idle_cores(target);

		if (!has_idle_core && cpus_share_cache(prev, target)) {  <-- (1)
			i = select_idle_smt(p, sd, prev);
			if ((unsigned int)i < nr_cpumask_bits)
				return i;
		}
	}

	i = select_idle_cpu(p, sd, has_idle_core, target);  <-- (2a)
	if ((unsigned)i < nr_cpumask_bits)
		return i;

select_idle_cpu()

	for_each_cpu_wrap(cpu, cpus, target + 1) {
		if (has_idle_core) {
			i = select_idle_core(p, cpu, cpus, &idle_cpu);  <-- (2b)
			if ((unsigned int)i < nr_cpumask_bits)
				return i;

		} else {
			if (--nr <= 0)
				return -1;
			idle_cpu = __select_idle_cpu(cpu, p);
			if ((unsigned int)idle_cpu < nr_cpumask_bits)
				break;
		}
	}
[...]
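
To make the "idle core first, idle sibling second" preference concrete,
here is a rough userspace sketch. It is not kernel code: struct toy_cpu,
toy_core_is_idle() and toy_select_idle_cpu() are made-up names for
illustration, and the model assumes a flat array of logical CPUs with no
cpumasks, no LLC domains and no scan budget (nr). It only shows the
ordering: a logical CPU whose whole core is idle wins over an idle CPU
whose SMT sibling is busy, and no capacity/utilization comparison is
involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy topology, not a kernel data structure. */
struct toy_cpu {
	int core_id;	/* logical CPUs sharing a core_id are SMT siblings */
	bool idle;	/* true if this logical CPU has nothing to run */
};

/* A core counts as idle only if every logical CPU on it is idle. */
static bool toy_core_is_idle(const struct toy_cpu *cpus, size_t n, int core_id)
{
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].core_id == core_id && !cpus[i].idle)
			return false;
	}
	return true;
}

/*
 * Scan all CPUs starting at target + 1 with wrap-around. Return the first
 * idle CPU that sits on a fully idle core. If there is none, fall back to
 * the first idle CPU seen (its sibling is busy). Return -1 if nothing is
 * idle at all.
 */
static int toy_select_idle_cpu(const struct toy_cpu *cpus, size_t n,
			       size_t target)
{
	int fallback = -1;

	for (size_t k = 1; k <= n; k++) {
		size_t i = (target + k) % n;

		if (!cpus[i].idle)
			continue;
		if (toy_core_is_idle(cpus, n, cpus[i].core_id))
			return (int)i;
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}
```

With two 2-thread cores where only CPU0 is busy, a wakeup targeting CPU0
skips the idle sibling CPU1 and lands on CPU2, since core 1 is fully
idle. Once CPU3 becomes busy too, there is no idle core left and the
sketch falls back to CPU1.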