Re: [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes

From: Cheng-Yang Chou

Date: Sat Apr 25 2026 - 21:48:06 EST


Hi Kuba,

On Thu, Apr 23, 2026 at 01:32:20PM +0000, Kuba Piecuch wrote:
> > On Mon, Mar 23, 2026 at 01:13:20PM -1000, Tejun Heo wrote:
> >> > The simple way to do this is to do scx_bpf_dsq_insert() at the very beginning,
> >> > once we know which task we would like to dispatch, and cancel the pending
> >> > dispatch via scx_bpf_dispatch_cancel() if any of the pre-dispatch checks fail
> >> > on the BPF side. This way, the "critical section" includes BPF-side checks, and
> >> > SCX will ignore the dispatch if there was a dequeue/enqueue racing with the
> >> > critical section.
> >> >
> >> > With this solution, we can throw an error if task_can_run_on_remote_rq() is
> >> > false, because we know that there was no racing cpumask change (if there was,
> >> > it would have been caught earlier, in finish_dispatch()).
> >>
> >> Yeah, I think this makes more sense. qseq is already there to provide
> >> protection against these events. It's just that the capturing of qseq is too
> >> late. If insert/cancel is too ugly, we can introduce another kfunc to
> >> capture the qseq - scx_bpf_dsq_insert_begin() or something like that - and
> >> stash it in a per-cpu variable. That way, qseq would cover the "current"
> >> queued instance and the existing qseq mechanism would be able to reliably
> >> ignore the ones that lost race to dequeue.
> >
> > Since this has been stale for a while, I prepared a patch to implement
> > scx_bpf_dsq_insert_begin() as suggested.
>
> Thanks for creating the patch. A couple of thoughts:
>
> 1. Do we have a use case that requires dsq_insert_begin() that isn't
> satisfied using the "insert and then cancel if needed" approach?

IIUC, yes. scx_bpf_dispatch_cancel() is only registered in
scx_kfunc_ids_dispatch, so it is only callable from ops.dispatch().
dsq_insert_begin(), on the other hand, is available from both
ops.enqueue() and ops.dispatch() (SCX_KF_ENQUEUE | SCX_KF_DISPATCH).
Since ops.enqueue() cannot call scx_bpf_dispatch_cancel(), the
insert-and-cancel approach simply doesn't work there.
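
To make the availability argument concrete, here is a rough user-space
sketch (not kernel code). The toy_* names and the bit values are made up
for illustration; only the two kfunc names and the "dispatch only" vs.
"enqueue and dispatch" split come from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical context bits; the real SCX_KF_* values live in the kernel. */
enum toy_kf_ctx {
	TOY_KF_ENQUEUE  = 1 << 0,
	TOY_KF_DISPATCH = 1 << 1,
};

/* Toy description of a kfunc: its name and where it may be called from. */
struct toy_kfunc {
	const char *name;
	unsigned int allowed;
};

/* Callable from ops.dispatch() only. */
static const struct toy_kfunc toy_kf_cancel = {
	"scx_bpf_dispatch_cancel", TOY_KF_DISPATCH,
};

/* Callable from both ops.enqueue() and ops.dispatch(). */
static const struct toy_kfunc toy_kf_insert_begin = {
	"scx_bpf_dsq_insert_begin", TOY_KF_ENQUEUE | TOY_KF_DISPATCH,
};

/* Returns true if @kf may be called from the context described by @ctx. */
static bool toy_kfunc_allowed(const struct toy_kfunc *kf, unsigned int ctx)
{
	return (kf->allowed & ctx) != 0;
}
```

With this model, toy_kfunc_allowed(&toy_kf_cancel, TOY_KF_ENQUEUE) is
false while the same check for toy_kf_insert_begin is true, which is the
gap described above.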

>
> 2. Do we want to restrict ourselves through the one qseq slot provided by
> dsq_insert_begin()? The most flexible approach IMO would be to simply
> allow BPF to read the qseq directly via a kfunc and then supply it to
> dsq_insert() later. With this, we can have multiple qseqs saved at the
> same time, and we can even pass them between CPUs, e.g. if one CPU
> dequeues a task for a sibling CPU, but we want the checks to be made inside
> the sibling's ops.dispatch() (I just made this use case up, it may not
> be practical.)
> That said, exposing an internal thing like qseq to BPF may be a step too far.

In his earlier reply [1], Tejun suggested dsq_insert_begin() precisely
to avoid promoting qseq into the BPF ABI, which matches your own concern.
The single per-CPU slot is sufficient for the one-task-per-iteration
dispatch loops used by existing schedulers (e.g., scx_central).
If a concrete cross-CPU use case materializes later, we can always extend
dsq_insert() to accept an explicit qseq without breaking the current,
simpler path.

[1]: https://lore.kernel.org/all/acHJED4iAeytdC2l@xxxxxxxxxxxxxxx/
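
For reference, the single-slot idea can be modelled in user space
roughly as below. This is only a toy sketch under my own assumptions,
not the patch: struct toy_task, toy_slots[] and all toy_* helpers are
hypothetical stand-ins, and the real qseq handling in the kernel is
more involved.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Toy task: qseq changes whenever the queued instance changes. */
struct toy_task {
	uint64_t qseq;
};

/* One saved snapshot per CPU, i.e. the single slot discussed above. */
struct toy_slot {
	const struct toy_task *task;
	uint64_t qseq;
	bool valid;
};

static struct toy_slot toy_slots[TOY_NR_CPUS];

/* Stand-in for insert_begin: snapshot qseq before the BPF-side checks. */
static void toy_insert_begin(int cpu, const struct toy_task *p)
{
	toy_slots[cpu] = (struct toy_slot){
		.task = p, .qseq = p->qseq, .valid = true,
	};
}

/* Stand-in for a racing dequeue/enqueue, e.g. after an affinity change. */
static void toy_requeue(struct toy_task *p)
{
	p->qseq++;
}

/*
 * Stand-in for the check at finish-dispatch time. Returns true if the
 * snapshot still matches the task's current qseq, and consumes the slot.
 */
static bool toy_finish_dispatch(int cpu, const struct toy_task *p)
{
	struct toy_slot *s = &toy_slots[cpu];
	bool ok = s->valid && s->task == p && s->qseq == p->qseq;

	s->valid = false;
	return ok;
}
```

In this model a toy_requeue() between toy_insert_begin() and
toy_finish_dispatch() makes the dispatch get ignored, and a second
toy_insert_begin() on the same CPU overwrites the first snapshot. The
overwrite is the limitation you point out in (2), and it is harmless as
long as the loop handles one task per iteration.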

> Let me know what you think.
>

Please correct me if I'm missing something, thanks! ^0^

--
Cheers,
Cheng-Yang