Re: [PATCH v2 2/5] workqueue: add WQ_AFFN_CACHE_SHARD affinity scope

From: Tejun Heo

Date: Mon Mar 23 2026 - 18:48:05 EST

Hello,

On Fri, Mar 20, 2026 at 10:56:28AM -0700, Breno Leitao wrote:
> +/**
> + * llc_count_cores - count distinct cores (SMT groups) within a cpumask
> + * @pod_cpus: the cpumask to scan (typically an LLC pod)
> + * @smt_pt: the SMT pod type, used to identify sibling groups
> + *
> + * A core is represented by the lowest-numbered CPU in its SMT group. Returns
> + * the number of distinct cores found in @pod_cpus.
> + */
> +static int __init llc_count_cores(const struct cpumask *pod_cpus,
> + struct wq_pod_type *smt_pt)
> +{
> + const struct cpumask *smt_cpus;
> + int nr_cores = 0, c;
> +
> + for_each_cpu(c, pod_cpus) {
> + smt_cpus = smt_pt->pod_cpus[smt_pt->cpu_pod[c]];
> + if (cpumask_first(smt_cpus) == c)
> + nr_cores++;
> + }
> +
> + return nr_cores;
> +}
> +
> +/**
> + * llc_cpu_core_pos - find a CPU's core position within a cpumask
> + * @cpu: the CPU to locate
> + * @pod_cpus: the cpumask to scan (typically an LLC pod)
> + * @smt_pt: the SMT pod type, used to identify sibling groups
> + *
> + * Returns the zero-based index of @cpu's core among the distinct cores in
> + * @pod_cpus, ordered by lowest CPU number in each SMT group.
> + */
> +static int __init llc_cpu_core_pos(int cpu, const struct cpumask *pod_cpus,
> + struct wq_pod_type *smt_pt)
> +{
> + const struct cpumask *smt_cpus;
> + int core_pos = 0, c;
> +
> + for_each_cpu(c, pod_cpus) {
> + smt_cpus = smt_pt->pod_cpus[smt_pt->cpu_pod[c]];
> + if (cpumask_test_cpu(cpu, smt_cpus))
> + break;
> + if (cpumask_first(smt_cpus) == c)
> + core_pos++;
> + }
> +
> + return core_pos;
> +}

Can you do the above two in a separate pass and record the results and then
use that to implement cpu_cache_shard_id()? Doing all of it on the fly makes
it unnecessarily difficult to follow and init_pod_type() is already O(N^2)
and the above makes it O(N^4). Make the machine large enough and this may
become noticeable.

> +/**
> + * cpu_cache_shard_id - compute the shard index for a CPU within its LLC pod
> + * @cpu: the CPU to look up
> + *
> + * Returns a shard index that is unique within the CPU's LLC pod. The LLC is
> + * divided into shards of at most wq_cache_shard_size cores, always split on
> + * core (SMT group) boundaries so that SMT siblings are never placed in
> + * different shards. Cores are distributed across shards as evenly as possible.
> + *
> + * Example: 36 cores with wq_cache_shard_size=8 gives 5 shards of
> + * 8+7+7+7+7 cores.
> + */

I always feel a bit uneasy about using max number as split point in cases
like this because the reason why you picked 8 as the default was that
testing showed shard sizes close to 8 seems to behave the best (or at least
acceptably in most cases). However, setting max number to 8 doesn't
necessarily keep you close to that. e.g. If there are 9 cores, you end up
with 5 and 4 even though 9 is a lot closer to the 8 that we picked as the
default. Can the sharding logic updated so that "whatever sharding that gets
the system closest to the config target?".

Thanks.

--
tejun