Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv NULL pointer deref in bpf_scx_unreg()
From: Andrea Righi
Date: Mon May 11 2026 - 01:41:31 EST

Hi Tejun,

On Sun, May 10, 2026 at 04:55:30PM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> I traced reload_loop with per-CPU ring probes around all @ops->priv
> and scx_root assign/clear sites. The race is a stomp:
>
>   T2 unreg(K)                          T1 reg(K)
>   -----------                          ---------
>   sch = ops->priv = sch_b800
>   scx_disable; flush_disable_work
>     [scx_root_disable: scx_root=NULL,
>      mutex_unlock, state=DISABLED]
>                                        mutex_lock; state ok
>                                        scx_alloc_and_add_sched:
>                                          ops->priv = sch_a800
>                                        scx_root = sch_a800; init=0
>                                        state=ENABLED; mutex_unlock
>     [flush returns]
>   RCU_INIT_POINTER(ops->priv, NULL)  <-- clobbers sch_a800
>   kobject_put(sch_b800)

Ah, makes sense! Yes, that's the case.

>
> Reachable because the unreg waits on sch->helper while the next reg
> runs on the global scx_enable_helper, and scx_enable_mutex is released
> inside scx_root_disable() well before bpf_scx_unreg() reaches its
> RCU_INIT_POINTER. My trace caught 11us between PRIV_SET sch_a800 and
> the clobber; nothing bounds it.
>
> The posted patch suppresses the deref but leaves the stomp. Each
> stomp leaks one sch (the "sch's base reference will be put by
> bpf_scx_unreg()" contract assumes ops->priv still points at it), and
> in the case I caught, sch_a800 is already SCX_ENABLED with scx_root
> pointing at it - the bpf_link is gone but state stays ENABLED, so all
> future attaches fail with -EBUSY permanently.
>
> Suggestion: make @ops->priv the lifecycle binding. In
> scx_root_enable_workfn() (and scx_sub_enable_workfn()), after the
> existing state check and still under scx_enable_mutex, refuse with
> -EBUSY if @ops->priv is non-NULL. Unreg side keeps its current
> ordering.

I'll send a new version implementing this.

>
> One question: are there other paths that write or clear @ops->priv?
> I only see the rcu_assign_pointer in scx_alloc_and_add_sched and the
> RCU_INIT_POINTER(NULL) in bpf_scx_unreg().

AFAICS the only writers are the rcu_assign_pointer() in
scx_alloc_and_add_sched() and the RCU_INIT_POINTER(NULL) in bpf_scx_unreg();
nothing else writes or clears @ops->priv. So the -EBUSY check should be
sufficient to close all the races.
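
As a rough illustration of what the check is meant to do, here is a
single-threaded userspace model that replays the interleaving from the trace.
It is only a sketch: every model_* name is a hypothetical stand-in, not
sched_ext code, and program order stands in for the serialization that
scx_enable_mutex provides in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler instance and the struct_ops. */
struct model_sched { const char *name; };
struct model_ops { struct model_sched *priv; };

enum model_state { MODEL_DISABLED, MODEL_ENABLED };

static enum model_state model_state = MODEL_DISABLED;
static struct model_sched *model_root;

/* Models the enable work function body (held under the enable mutex). */
static int model_enable(struct model_ops *ops, struct model_sched *sch)
{
	if (model_state != MODEL_DISABLED)
		return -EBUSY;	/* existing state check */
	if (ops->priv)
		return -EBUSY;	/* proposed check: ops still bound to an old sch */

	ops->priv = sch;
	model_root = sch;
	model_state = MODEL_ENABLED;
	return 0;
}

/* First half of unreg: disable has finished and the mutex is released. */
static void model_unreg_disable(void)
{
	model_root = NULL;
	model_state = MODEL_DISABLED;
}

/* Second half of unreg: clear the binding for the sch it loaded earlier. */
static void model_unreg_finish(struct model_ops *ops, struct model_sched *sch)
{
	/* With the check in place, nobody can have replaced the binding. */
	assert(ops->priv == sch);
	ops->priv = NULL;
}

/* Replays the traced interleaving and records the three enable results. */
void model_replay(int results[3])
{
	struct model_sched sch_b = { "sch_b800" }, sch_a = { "sch_a800" };
	struct model_ops ops = { NULL };

	results[0] = model_enable(&ops, &sch_b);	/* initial attach */
	model_unreg_disable();				/* T2: state=DISABLED */
	results[1] = model_enable(&ops, &sch_a);	/* T1: inside the window */
	model_unreg_finish(&ops, &sch_b);		/* T2: clear ops->priv */
	results[2] = model_enable(&ops, &sch_a);	/* T1: retry after unreg */
}
```

In this model the enable attempted inside the window is refused with -EBUSY
because ops->priv still points at the old sch, and the retry after unreg has
cleared it succeeds, so the clobber in the trace cannot happen.
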
Thanks,
-Andrea