Re: [PATCH 08/17] sched_ext: Add scx_bpf_cid_override() kfunc

From: Andrea Righi

Date: Wed Apr 29 2026 - 10:10:16 EST

Hi Tejun,

On Tue, Apr 28, 2026 at 10:35:36AM -1000, Tejun Heo wrote:
> The auto-probed cid mapping reflects the kernel's view of topology
> (node -> LLC -> core), but a BPF scheduler may want a different layout -
> to align cid slices with its own partitioning, or to work around how the
> kernel reports a particular machine.
>
> Add scx_bpf_cid_override(), callable from ops.init() of the root
> scheduler. It validates the caller-supplied cpu->cid array and replaces
> the in-place mapping; topo info is invalidated. A compat.bpf.h wrapper
> silently no-ops on kernels that lack the kfunc.
>
> A new SCX_KF_ALLOW_INIT bit in the kfunc context filter restricts the
> kfunc to ops.init() at verifier load time.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Reviewed-by: Cheng-Yang Chou <yphbchou0911@xxxxxxxxx>
...
> +/**
> + * scx_bpf_cid_override - Install an explicit cpu->cid mapping
> + * @cpu_to_cid: array of nr_cpu_ids s32 entries (cid for each cpu)
> + * @cpu_to_cid__sz: must be nr_cpu_ids * sizeof(s32) bytes
> + * @aux: implicit BPF argument to access bpf_prog_aux hidden from BPF progs
> + *
> + * May only be called from ops.init() of the root scheduler. Replace the
> + * topology-probed cid mapping with the caller-provided one. Each possible cpu
> + * must map to a unique cid in [0, num_possible_cpus()). Topo info is cleared.
> + * On invalid input, trigger scx_error() to abort the scheduler.
> + */
> +__bpf_kfunc void scx_bpf_cid_override(const s32 *cpu_to_cid, u32 cpu_to_cid__sz,
> +				      const struct bpf_prog_aux *aux)
> +{
> +	cpumask_var_t seen __free(free_cpumask_var) = CPUMASK_VAR_NULL;
> +	struct scx_sched *sch;
> +	bool alloced;
> +	s32 cpu, cid;
> +
> +	/* GFP_KERNEL alloc must happen before the rcu read section */
> +	alloced = zalloc_cpumask_var(&seen, GFP_KERNEL);
> +
> +	guard(rcu)();
> +
> +	sch = scx_prog_sched(aux);
> +	if (unlikely(!sch))
> +		return;
> +
> +	if (!alloced) {
> +		scx_error(sch, "scx_bpf_cid_override: failed to allocate cpumask");
> +		return;
> +	}
> +
> +	if (scx_parent(sch)) {
> +		scx_error(sch, "scx_bpf_cid_override() only allowed from root sched");
> +		return;
> +	}
> +
> +	if (cpu_to_cid__sz != nr_cpu_ids * sizeof(s32)) {
> +		scx_error(sch, "scx_bpf_cid_override: expected %zu bytes, got %u",
> +			  nr_cpu_ids * sizeof(s32), cpu_to_cid__sz);
> +		return;
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		s32 c = cpu_to_cid[cpu];
> +
> +		if (!cid_valid(sch, c))
> +			return;
> +		if (cpumask_test_and_set_cpu(c, seen)) {
> +			scx_error(sch, "cid %d assigned to multiple cpus", c);
> +			return;
> +		}
> +		scx_cpu_to_cid_tbl[cpu] = c;
> +		scx_cid_to_cpu_tbl[c] = cpu;
> +	}
> +
> +	/* Invalidate stale topo info - the override carries no topology. */
> +	for (cid = 0; cid < num_possible_cpus(); cid++)
> +		scx_cid_topo[cid] = SCX_CID_TOPO_NEG;

Since the topology info is wiped when scx_bpf_cid_override() is used, should we
trigger an error if a scheduler also tries to use scx_bpf_cid_topo() (e.g., by
setting a flag or similar)?
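
To make the idea concrete, here is a rough userspace sketch in plain C, not
kernel code and not tested against this series. Everything in it is a stand-in
I made up for illustration: NR_CPUS, CID_TOPO_NEG, the three tables, the
cid_overridden flag and both function names are hypothetical, and the real
scx_bpf_cid_topo() signature is not assumed. It only models the two pieces
under discussion: the permutation check on the cpu->cid array, and a flag set
by the override that makes a later topo lookup fail loudly instead of
returning cleared entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS		4	/* hypothetical fixed CPU count for the sketch */
#define CID_TOPO_NEG	(-1)	/* stand-in for SCX_CID_TOPO_NEG */

static int32_t cpu_to_cid_tbl[NR_CPUS];
static int32_t cid_to_cpu_tbl[NR_CPUS];
static int32_t cid_topo[NR_CPUS];
static bool cid_overridden;	/* hypothetical flag set by the override */

/* Install a cpu->cid mapping. Returns 0 on success, -1 on invalid input. */
static int cid_override(const int32_t *cpu_to_cid, size_t sz)
{
	bool seen[NR_CPUS] = { false };
	int cpu, cid;

	if (sz != NR_CPUS * sizeof(int32_t))
		return -1;

	/* Each cpu must map to a unique cid in [0, NR_CPUS). */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		int32_t c = cpu_to_cid[cpu];

		if (c < 0 || c >= NR_CPUS || seen[c])
			return -1;
		seen[c] = true;
	}

	/* Input is a valid permutation, now update both directions. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_to_cid_tbl[cpu] = cpu_to_cid[cpu];
		cid_to_cpu_tbl[cpu_to_cid[cpu]] = cpu;
	}

	/* The override carries no topology, so clear it and remember that. */
	for (cid = 0; cid < NR_CPUS; cid++)
		cid_topo[cid] = CID_TOPO_NEG;
	cid_overridden = true;
	return 0;
}

/* Topo lookup that refuses to run once the mapping has been overridden. */
static int cid_topo_lookup(int32_t cid, int32_t *out)
{
	if (cid_overridden)
		return -1;	/* this is where the kernel side could error out */
	if (cid < 0 || cid >= NR_CPUS)
		return -1;
	*out = cid_topo[cid];
	return 0;
}
```

Unlike the loop in the patch, the sketch validates the whole array before
touching the tables. That is only to keep the example easy to reason about;
since scx_error() aborts the scheduler anyway, a partial update may well be
fine on the kernel side.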

Thanks,
-Andrea