Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention

From: Tim Chen

Date: Wed Apr 08 2026 - 12:47:38 EST

On Wed, 2026-04-08 at 17:25 +0800, Chen, Yu C wrote:
> On 4/8/2026 4:35 AM, Tim Chen wrote:
> > On Fri, 2026-04-03 at 13:46 +0800, Chen, Yu C wrote:
> > > On 4/2/2026 7:06 PM, K Prateek Nayak wrote:
> > > > Hello Peter,
> > > >
> > > > On 4/2/2026 4:25 PM, Peter Zijlstra wrote:
> > > > > On Thu, Apr 02, 2026 at 10:11:11AM +0530, K Prateek Nayak wrote:
> > > > >
> > > > > > It is still not super clear to me how the logic deals with more than
> > > > > > 128CPUs in a DIE domain because that'll need more than the u64 but
> > > > > > sbm_find_next_bit() simply does:
> > > > > >
> > > > > > tmp = leaf->bitmap & mask; /* All are u64 */
> > > > > >
> > > > > > expecting just the u64 bitmap to represent all the CPUs in the leaf.
> > > > > >
> > > > > > If we have, say 256 CPUs per DIE, we get shift(7) and arch_sbm_mask
> > > > > > as 7f (127) which allows a leaf to more than 64 CPUs but we are
> > > > > > using the "u64 bitmap" directly and not:
> > > > > >
> > > > > > find_next_bit(bitmap, arch_sbm_mask)
> > > > > >
> > > > > > Am I missing something here?
> > > > >
> > > > > Nope. That logic just isn't there, that was left as an exercise to the
> > > > > reader :-)
> > > >
> > > > Ack! Let me go fiddle with that.
> > > >
> > >
> > > Nice catch. I hadn't noticed this since we have fewer than
> > > 64 CPUs per die. Please feel free to send patches to me when
> > > they're available.
> > >
> > > And regarding your other question about the calculation of arch_sbm_shift,
> > > I'm trying to understand why there is a subtraction of 1, should it be:
> > > - arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN] - 1;
> > > + arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN - 1];
> >
> > Perhaps something like
> >
> > arch_sbm_shift = min(sizeof(unsigned long),
> > topology_get_domain_shift(TOPO_TILE_DOMAIN));
> >
> > to take care of both AMD system and the 64 bit leaf bitmask limit?
> >
>
> Yes, this should be doable (Prateek has mentioned using TOPO_TILE_DOMAIN).
> The only drawback I can think of is that if there are more than 64 CPUs
> within a die, it is possible CPUs in different dies (LLCs) be indexed in
> the same leaf and access the same mask,
>

First, I think I should have used

	arch_sbm_shift = min(BITS_PER_LONG,
			     topology_get_domain_shift(TOPO_TILE_DOMAIN));

I am assuming that we should choose TOPO_DIE_DOMAIN for Intel CPUs and
TOPO_TILE_DOMAIN for AMD CPUs. The further assumption is that the chosen
domain spans a single L3 (I think that's the case).

Then any leaf smaller than that domain will also span only one L3 by
definition. So for the 128-CPU example you gave, the leaves covering
CPUs 0-63 and 64-127 both sit within the same LLC, and we should not
see cache line bouncing between LLCs.
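
To make the arithmetic concrete, here is a rough sketch. It is not the
patch code: the helper names and LEAF_SHIFT_MAX are made up for
illustration. It assumes CPU numbers are contiguous within a domain and
that the 64-bit leaf limit is applied to the shift, i.e. the shift is
capped at 6 = log2(64).

```c
#include <stdint.h>

/* Hypothetical constant for illustration; not taken from the patch. */
#define LEAF_SHIFT_MAX	6	/* log2(64): one u64 bitmap per leaf */

/* Clamp a domain shift so one leaf never covers more than 64 CPUs. */
static inline unsigned int leaf_shift(unsigned int domain_shift)
{
	return domain_shift < LEAF_SHIFT_MAX ? domain_shift : LEAF_SHIFT_MAX;
}

/* Index of the leaf that holds @cpu. */
static inline unsigned int cpu_to_leaf(unsigned int cpu, unsigned int shift)
{
	return cpu >> shift;
}

/* Index of the domain (assumed to span a single L3) that holds @cpu. */
static inline unsigned int cpu_to_domain(unsigned int cpu,
					 unsigned int domain_shift)
{
	return cpu >> domain_shift;
}

/* Bit for @cpu within its leaf's u64 bitmap. */
static inline uint64_t cpu_to_leaf_bit(unsigned int cpu, unsigned int shift)
{
	return UINT64_C(1) << (cpu & ((1u << shift) - 1));
}
```

With a domain shift of 7 (128 CPUs), leaf_shift() gives 6, so CPUs 0-63
land in leaf 0 and CPUs 64-127 in leaf 1, while cpu_to_domain() returns
the same index for both. Under the assumptions above, two leaves are
touched but only one L3 is involved.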

Tim

> which would still lead to cache
> contention. Maybe we should allocate the leaf cpumask according to the
> actual size of a die?
>
> thanks,
> Chenyu
>
>