Re: [PATCH v4 2/9] sched/topology: Extract "imb_numa_nr" calculation into a separate helper

From: K Prateek Nayak

Date: Mon Mar 16 2026 - 04:51:29 EST


Hello Dietmar,

On 3/16/2026 1:54 PM, Dietmar Eggemann wrote:
> Hi Prateek,
>
> On 16.03.26 04:41, K Prateek Nayak wrote:
>> Hello Dietmar,
>>
>> On 3/16/2026 5:48 AM, Dietmar Eggemann wrote:
>
> [...]
>
>> Indeed! "imb_numa_nr" only makes sense when looking at NUMA domains
>> and having it assigned to 1 for lower domains is harmless
>> (but wasteful indeed). I'm 99% sure we can simply do:
>>
>> (Only build tested)
>>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 43150591914b..e9068a809dbc 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -2623,9 +2623,6 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
>>  	else
>>  		imb = nr_llcs;
>>
>> -	imb = max(1U, imb);
>> -	sd_llc->parent->imb_numa_nr = imb;
>> -
>>  	/*
>>  	 * Set span based on the first NUMA domain.
>>  	 *
>> @@ -2639,10 +2636,14 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
>>  	while (parent && !(parent->flags & SD_NUMA))
>>  		parent = parent->parent;
>>
>> -	imb_span = parent ? parent->span_weight : sd_llc->parent->span_weight;
>> +	/* No NUMA domain to adjust imbalance for! */
>> +	if (!parent)
>> +		return;
>> +
>> +	imb = max(1U, imb);
>> +	imb_span = parent->span_weight;
>>
>>  	/* Update the upper remainder of the topology */
>> -	parent = sd_llc->parent;
>>  	while (parent) {
>>  		int factor = max(1U, (parent->span_weight / imb_span));
>>
>> ---
>>
>> If we have NUMA domains, we definitely have NODE, and NODE sets neither
>> SD_SHARE_LLC nor SD_NUMA, so sd->parent is likely a PKG / NODE domain.
>> NUMA then has to start at sd->parent->parent at the earliest, and the
>> walk has to break at the first SD_NUMA domain.
>>
>> If it doesn't exist, we don't have any NUMA domains and there is nothing
>> to worry about; if it does, the final loop will adjust the NUMA imbalance.
>>
>> Thoughts? Again, this commit was kept 1:1 with the previous loop but we
>> can always improve :-)
> Ah, I see!
>
> This would work, IMHO.
>
> Tested on qemu-system-aarch64 w/
>
> -smp 8,sockets=2,clusters=2,cores=2,threads=1
>
> Are you aware of a setup in which PKG would survive between MC and
> lowest NUMA?

On x86, you can have:

-smp 8,sockets=2,dies=2,cores=2,threads=1

and each "die" will appear as an MC within the socket so we get

  NUMA  { 0-7 }
  NODE  { 0-3 } { 4-7 }
  PKG   { 0-3 } { 4-7 }
  MC    { 0,1 } { 2,3 } { 4,5 } { 6,7 }

In the above case, NODE degenerates since its span matches that of PKG,
so MC, PKG, and NUMA are the domains that survive at the end.
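
For illustration, here is a rough userspace sketch of how the proposed walk
would behave on a MC -> PKG -> NUMA hierarchy like the one above. It is not
kernel code: "struct toy_domain", the flag values, the toy_*() helpers and
the starting imb of 2 are all made up for the example. "imb" is passed in
precomputed, and the loop body after the "factor" line is not part of the
quoted diff, so scaling by "imb * factor" is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel's real SD_* bits differ. */
#define SD_SHARE_LLC	0x1u
#define SD_NUMA		0x2u

/* Toy stand-in for struct sched_domain with only the fields used here. */
struct toy_domain {
	unsigned int flags;
	unsigned int span_weight;
	unsigned int imb_numa_nr;
	struct toy_domain *parent;
};

/* Model of the patched walk; "imb" is assumed to be computed by the caller. */
void toy_adjust_numa_imbalance(struct toy_domain *sd_llc, unsigned int imb)
{
	struct toy_domain *parent = sd_llc->parent;
	unsigned int imb_span;

	/* Skip the non-NUMA levels (PKG / NODE) above the LLC domain. */
	while (parent && !(parent->flags & SD_NUMA))
		parent = parent->parent;

	/* No NUMA domain to adjust imbalance for. */
	if (!parent)
		return;

	imb = imb > 1 ? imb : 1;
	imb_span = parent->span_weight;

	/* Assumed loop body: scale imb by how far each level outgrows imb_span. */
	while (parent) {
		unsigned int factor = parent->span_weight / imb_span;

		if (factor < 1)
			factor = 1;
		parent->imb_numa_nr = imb * factor;
		parent = parent->parent;
	}
}

/* MC {0,1} -> PKG {0-3} -> NUMA {0-7}; returns NUMA's imb_numa_nr. */
unsigned int toy_numa_imb_with_dies(void)
{
	struct toy_domain numa = { SD_NUMA, 8, 0, NULL };
	struct toy_domain pkg = { 0, 4, 0, &numa };
	struct toy_domain mc = { SD_SHARE_LLC, 2, 0, &pkg };

	toy_adjust_numa_imbalance(&mc, 2);
	assert(pkg.imb_numa_nr == 0);	/* PKG is no longer written to */
	return numa.imb_numa_nr;
}

/* MC -> PKG with no NUMA level; returns PKG's (untouched) imb_numa_nr. */
unsigned int toy_pkg_imb_without_numa(void)
{
	struct toy_domain pkg = { 0, 4, 0, NULL };
	struct toy_domain mc = { SD_SHARE_LLC, 2, 0, &pkg };

	toy_adjust_numa_imbalance(&mc, 2);
	return pkg.imb_numa_nr;
}
```

Calling the two toy_*() helpers from a small main() shows the point of the
change: with a NUMA level present only the NUMA domain gets a value, and
without one the function returns early and leaves PKG alone.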

--
Thanks and Regards,
Prateek