Re: [tip: sched/core] sched/topology: Compute sd_weight considering cpuset partitions

From: K Prateek Nayak

Date: Fri Mar 20 2026 - 23:37:29 EST

Hello Nathan,

Thank you for the report.

On 3/21/2026 5:28 AM, Nathan Chancellor wrote:
> $ cat kernel/configs/schedstats.config
> CONFIG_SCHEDSTATS=y

Is the "schedstats.config" available somewhere? I tried these
steps on my end but couldn't reproduce the crash with my config.

Also, are you saying it is necessary to enable CONFIG_SCHEDSTATS
to observe the crash?

>
> $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper defconfig schedstats.config zImage
>
> $ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/arm-rootfs.cpio.zst | zstd -d >rootfs.cpio
>
> $ qemu-system-arm \
> -display none \
> -nodefaults \
> -no-reboot \
> -machine virt \
> -append 'console=ttyAMA0 earlycon' \
> -kernel arch/arm/boot/zImage \
> -initrd rootfs.cpio \
> -m 1G \
> -serial mon:stdio
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 7.0.0-rc4-00017-g8e8e23dea43e (nathan@framework-amd-ryzen-maxplus-395) (arm-linux-gnueabi-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP Fri Mar 20 16:12:05 MST 2026
> ...
> [ 0.031929] 8<--- cut here ---
> [ 0.031999] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
> [ 0.032172] [00000000] *pgd=00000000
> [ 0.032459] Internal error: Oops: 805 [#1] SMP ARM
> [ 0.032902] Modules linked in:
> [ 0.033466] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00017-g8e8e23dea43e #1 VOLUNTARY
> [ 0.033658] Hardware name: Generic DT based system
> [ 0.033770] PC is at build_sched_domains+0x7d0/0x1628

For me, this points to:

$ scripts/faddr2line vmlinux build_sched_domains+0x7d0/0x1628
build_sched_domains+0x7d0/0x1628:
find_next_bit_wrap at include/linux/find.h:455
(inlined by) build_sched_groups at kernel/sched/topology.c:1255
(inlined by) build_sched_domains at kernel/sched/topology.c:2603

which is the:

  span = sched_domain_span(sd);

  for_each_cpu_wrap(i, span, cpu) /* Here */ {
          ...
  }

in build_sched_groups(), so we are likely running off the end of
the allocated cpumask. However, before that, we do this in the caller:

  sd->span_weight = cpumask_weight(sched_domain_span(sd));

which should have crashed too if there were a NULL pointer within
the cpumask range. So I'm at a loss. Maybe the PC points to a
different location in your build?
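
To make the reasoning above concrete, here is a minimal user-space
sketch, not the kernel implementation. MODEL_NR_CPUS, model_weight()
and model_walk_wrap() are hypothetical names invented for this
illustration, and the mask is assumed to fit in a single word. It only
shows that a weight computation and a wrap-around walk read the same
mask storage, so a bad mask pointer would be expected to fault in
whichever of the two runs first:

```c
#define MODEL_NR_CPUS 8	/* hypothetical CPU count for this model */

/* Count the set bits; this only reads the word behind 'mask'. */
static unsigned int model_weight(const unsigned long *mask)
{
	unsigned int cpu, weight = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (*mask & (1UL << cpu))
			weight++;

	return weight;
}

/*
 * Visit the set bits starting at 'start' and wrapping around to 0.
 * The visit order is stored in 'order' and the number of visited
 * bits is returned. This reads the same word as model_weight().
 */
static unsigned int model_walk_wrap(const unsigned long *mask,
				    unsigned int start,
				    unsigned int *order)
{
	unsigned int n, visited = 0;

	for (n = 0; n < MODEL_NR_CPUS; n++) {
		unsigned int cpu = (start + n) % MODEL_NR_CPUS;

		if (*mask & (1UL << cpu))
			order[visited++] = cpu;
	}

	return visited;
}
```

In this model, both helpers dereference 'mask' in the same way, which
is why a fault in the walk alone is surprising.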

--
Thanks and Regards,
Prateek