Re: [PATCH v25 9/9] sched: Handle blocked-waiter migration (and return migration)
From: John Stultz
Date: Thu Mar 19 2026 - 17:30:01 EST
On Thu, Mar 19, 2026 at 5:50 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Mar 13, 2026 at 02:30:10AM +0000, John Stultz wrote:
> > +/*
> > + * If the blocked-on relationship crosses CPUs, migrate @p to the
> > + * owner's CPU.
> > + *
> > + * This is because we must respect the CPU affinity of execution
> > + * contexts (owner) but we can ignore affinity for scheduling
> > + * contexts (@p). So we have to move scheduling contexts towards
> > + * potential execution contexts.
> > + *
> > + * Note: The owner can disappear, but simply migrate to @target_cpu
> > + * and leave that CPU to sort things out.
> > + */
> > +static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
> > + struct task_struct *p, int target_cpu)
> > {
> > + struct rq *target_rq = cpu_rq(target_cpu);
> > +
> > + lockdep_assert_rq_held(rq);
> > +
> > + /*
> > + * Since we're going to drop @rq, we have to put(@rq->donor) first,
> > + * otherwise we have a reference that no longer belongs to us.
> > + *
> > + * Additionally, as we put_prev_task(prev) earlier, its possible that
> > + * prev will migrate away as soon as we drop the rq lock, however we
> > + * still have it marked as rq->curr, as we've not yet switched tasks.
> > + *
> > + * So call proxy_resched_idle() to let go of the references before
> > + * we release the lock.
> > + */
> > + proxy_resched_idle(rq);
>
> This comment confuses the heck out of me. It seems to imply we need to
> schedule before dropping rq->lock.
Fair point. I wrote that a while back and indeed it's not really clear
(the rq->curr bit doesn't make much sense to me now).
There is a similar explanation in proxy_deactivate(), which is maybe
clearer?
Basically, since the task we are migrating is a blocked donor, it
could be rq->donor, so we want to make sure there aren't any
references from this rq to it before we drop the lock. This avoids
another CPU jumping in, grabbing the rq lock, and referencing
rq->donor or cfs_rq->curr, etc., after we have migrated the task to
another CPU.
I'll rework the comment to something like the above, but feel free
to suggest rewordings if you prefer.
> > +
> > + WARN_ON(p == rq->curr);
> > +
> > + deactivate_task(rq, p, DEQUEUE_NOCLOCK);
> > + proxy_set_task_cpu(p, target_cpu);
> > +
> > + /*
> > + * We have to zap callbacks before unlocking the rq
> > + * as another CPU may jump in and call sched_balance_rq
> > + * which can trip the warning in rq_pin_lock() if we
> > + * leave callbacks set.
> > + */
>
> It might be good to explain where these callbacks come from.
Ack. I've taken a swing at this and will include it in the next revision.
>
> > + zap_balance_callbacks(rq);
> > + rq_unpin_lock(rq, rf);
> > + raw_spin_rq_unlock(rq);
> > +
> > + attach_one_task(target_rq, p);
> > +
> > + raw_spin_rq_lock(rq);
> > + rq_repin_lock(rq, rf);
> > + update_rq_clock(rq);
> > +}
> > +
> > +static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> > + struct task_struct *p)
> > +{
> > + struct rq *this_rq, *target_rq;
> > + struct rq_flags this_rf;
> > + int cpu, wake_flag = WF_TTWU;
> > +
> > + lockdep_assert_rq_held(rq);
> > + WARN_ON(p == rq->curr);
> > +
> > + /*
> > + * We have to zap callbacks before unlocking the rq
> > + * as another CPU may jump in and call sched_balance_rq
> > + * which can trip the warning in rq_pin_lock() if we
> > + * leave callbacks set.
> > + */
>
> idem
>
> > + zap_balance_callbacks(rq);
> > + rq_unpin_lock(rq, rf);
> > + raw_spin_rq_unlock(rq);
>
> This is in fact the very same sequence as above.
>
...
>
> Hurm... how about something like so?
Sounds good. I've worked this in and am testing it now.
Thanks for the feedback and suggestions!
-john