Re: [PATCH v29 1/9] sched: Rework pick_next_task() and prev_balance() to avoid stale prev references

From: Peter Zijlstra

Date: Wed May 20 2026 - 05:52:17 EST


On Tue, May 19, 2026 at 07:45:21PM -0700, John Stultz wrote:
> On Tue, May 19, 2026 at 3:32 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, May 12, 2026 at 02:56:11AM +0000, John Stultz wrote:
> > > Historically, the prev value from __schedule() was the rq->curr.
> > > This prev value is passed down through numerous functions, and
> > > used in the class scheduler implementations. The fact that
> > > prev was on_cpu until the end of __schedule(), meant it was
> > > stable across the rq lock drops that the class->pick_next_task()
> > > and ->balance() implementations often do.
> > >
> > > However, with proxy-exec, the prev passed to functions called
> > > by __schedule() is rq->donor, which may not be the same as
> > > rq->curr and may not be on_cpu, this makes the prev value
> > > potentially unstable across rq lock drops.
> > >
> > > A recently found issue with proxy-exec, is when we begin doing
> > > return migration from try_to_wake_up(), its possible we may be
> > > waking up the rq->donor. When we do this, we proxy_resched_idle()
> > > to put_prev_set_next() setting the rq->donor to rq->idle, allowing
> > > the rq->donor to be return migrated and allowed to run.
> > >
> > > This however runs into trouble, as on another cpu we might be in
> > > the middle of calling __schedule(). Conceptually the rq lock is
> > > held for the majority of the time, but in calling pick_next_task()
> > > its possible the class->pick_next_task() handler or the
> > > ->balance() call may briefly drop the rq lock. This opens a
> > > window for try_to_wake_up() to wake and return migrate the
> > > rq->donor before the class logic reacquires the rq lock.
> > >
> > > Unfortunately pick_next_task() and prev_balance() pass in a prev
> > > argument, to which we pass rq->donor. However this prev value can
> > > now become stale and incorrect across a rq lock drop.
> > >
> > > So, to correct this, rework the pick_next_task() and
> > > prev_balance() calls so that they do not take a "prev" argument.
> > >
> > > Also rework the class ->pick_next_task() and ->balance()
> > > implementations to drop the prev argument, and in the cases
> > > where it was used, and have the class functions reference
> > > rq->donor directly, and not save the value across rq lock drops
> > > so that we don't end up with a stale references.
> > >
> > > Signed-off-by: John Stultz <jstultz@xxxxxxxxxx>
> >
> > Right, so I think I'll take patches 1,2,3,8,9 from the flat series and
> > munge this on top. That gets rid of all that pick_next_task() nonsense
> > and simplifies things a little.
>
> Sounds good to me! Let me know when you have a branch posted that I
> should use as a base for any further iterations on the rest of the
> proxy chagnes I submitted.

I have some random bits in queue:sched/proxy, this is more a
work-in-progress branch and will be rebased. It doesn't (yet) include
the full series you posted simply because I didn't get around to looking
at the remaining few patches yet.

Aside from doing that blocked_donor review, I was also side-tracked by
looking at doing some further cleanups wrt is_blocked.