Re: [PATCH] sched/topology: Initialize sd_span after assignment to *sd
From: Peter Zijlstra
Date: Mon Mar 23 2026 - 05:41:06 EST
On Sat, Mar 21, 2026 at 04:38:52PM +0000, K Prateek Nayak wrote:
> Nathan reported a kernel panic on his ARM builds after commit
> 8e8e23dea43e ("sched/topology: Compute sd_weight considering cpuset
> partitions") which was root caused to the compiler zeroing out the first
> few bytes of sd->span.
>
> During the debug [1], it was discovered that, on some configs,
> offsetof(struct sched_domain, span) at 292 was less than
> sizeof(struct sched_domain) at 296 resulting in:
>
> *sd = { ... }
>
> assignment clearing out first 4 bytes of sd->span which was initialized
> before.
>
> The official GCC specification for "Arrays of Length Zero" [2] says:
>
> Although the size of a zero-length array is zero, an array member of
> this kind may increase the size of the enclosing type as a result of
> tail padding.
>
> which means the relative offset of the variable length array at the end
> of the struct can indeed be less than sizeof() the struct as a result of
> tail padding thus overwriting that data of the flexible array that
> overlapped with the padding whenever the struct is initialized as whole.
WTF! that's terrible :(
Why is this allowed, this makes no bloody sense :/
However, the way we allocate space for flex arrays is: sizeof(*obj) +
count * sizeof(*obj->member); this means that we do have sufficient
space past sizeof(*obj), irrespective of this extra padding.
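For illustration only, here is a minimal userspace sketch of that
argument. struct demo and the demo_*() helpers are hypothetical names
made up for this example, not the kernel's layout or API, and whether
offsetof() is actually smaller than sizeof() depends on the ABI. The
sketch only relies on the fact that a whole-struct store touches at most
sizeof(*d) bytes, so storage starting at sizeof(*d) is out of its reach.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sched_domain; NOT the kernel layout. */
struct demo {
	long long weight;	/* on many ABIs this forces 8-byte struct alignment */
	int flags;
	unsigned char span[];	/* flexible array member; may begin inside tail padding */
};

/* Storage that begins at sizeof(*d), i.e. past anything a whole-struct store can reach. */
static unsigned char *demo_tail(struct demo *d)
{
	return (unsigned char *)d + sizeof(*d);
}

/*
 * Returns 1 if the bytes past sizeof(*d) survive a whole-struct assignment,
 * 0 if they were modified, -1 if the allocation failed.
 */
static int demo_tail_survives(size_t count)
{
	/* Allocation pattern from the mail: sizeof(*obj) + count * sizeof(*obj->member). */
	struct demo *d = calloc(1, sizeof(*d) + count * sizeof(*d->span));
	size_t i;
	int ok = 1;

	if (!d)
		return -1;

	memset(demo_tail(d), 0xff, count);

	/* May also write the padding bytes, and with them the start of d->span. */
	*d = (struct demo){ .flags = 1 };

	for (i = 0; i < count; i++)
		if (demo_tail(d)[i] != 0xff)
			ok = 0;

	free(d);
	return ok;
}
```

Comparing offsetof(struct demo, span) against sizeof(struct demo) on a
given target shows whether the overlap exists there; demo_tail_survives()
should return 1 either way, since it never goes through d->span.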
Does this work?
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 51c29581f15e..defa86ed9b06 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -153,7 +153,21 @@ struct sched_domain {
static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
{
- return to_cpumask(sd->span);
+ /*
+ * Because C is an absolutely broken piece of shit, it is allowed for
+ * offsetof(*sd, span) < sizeof(*sd), this means that structure
+ * initialization *sd = { ... }; which will clear every unmentioned
+ * member, can over-write the start of the flexible array member.
+ *
+ * Luckily, the way we allocate the flexible array is by:
+ *
+ * sizeof(*sd) + count * sizeof(*sd->span)
+ *
+ * this means that we have sufficient space for the whole flex array
+ * *outside* of sizeof(*sd). So use that, and avoid using sd->span.
+ */
+ unsigned long *bitmap = (void *)sd + sizeof(*sd);
+ return to_cpumask(bitmap);
}
extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],