Re: [PATCH 1/2] sched: proxy-exec: Close race causing workqueue work being delayed
From: John Stultz
Date: Thu Apr 30 2026 - 01:44:38 EST
On Wed, Apr 29, 2026 at 2:00 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
> On 4/29/2026 7:57 AM, John Stultz wrote:
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 8ec3b6d7d718b..6ea74aecc5fbd 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -6535,8 +6536,13 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
> >> * blocked on a mutex, and we want to keep it on the runqueue
> >> * to be selectable for proxy-execution.
> >> */
> >> - if (!should_block)
> >> + if (!should_block) {
> >> + guard(raw_spinlock)(&p->blocked_lock);
> >> + /* Stable against race */
> >> + if (task_is_blocked(p))
> >> + WRITE_ONCE(p->se.sched_proxy, 1);
> >> return false;
> >> + }
> >
> > So if we double check and find the task isn't blocked anymore, we
> > probably shouldn't return early here, no?
> >
> > Let me take a stab at the bit flag approach and see how it goes.
>
> In case you want to peek at my homework ;-)
>
Well, now it will just seem like I'm cheating! :)
> @@ -7043,8 +7053,16 @@ static void __sched notrace __schedule(int sched_mode)
> if (sched_proxy_exec()) {
> struct task_struct *prev_donor = rq->donor;
>
> + /*
> + * A wakeup raced with block_task();
> + * Clear blocked_on before running the task
> + * again.
> + */
> + if (unlikely(!prev_state && prev->blocked_on))
> + clear_task_blocked_on(prev, NULL);
> +
So similar to your previous change, I was hitting this same case, and
adding in your change here was bothering me a bit that it felt like
clearing things here was papering over an unexpected state change that
we didn't have before.
Part of the issue is the blocked_on state machine is already a little complex:
NULL -> ptr -> PROXY_WAKING ->NULL
(with some limited shortcuting from ptr->NULL when we are dealing with current)
So adding this BO_FLAG_PROXY/"latch" bit to the mix complicates things:
NULL -> ptr:unlatched -> ptr:latched -> PROXY_WAKING -> NULL
With extra lines for:
ptr:unlatched -> NULL
ptr:latched -> NULL
ptr:unlatched -> PROXY_WAKING
I kept on seeing these unexpected transitions where before we latched
the blocked_on state, we'd get a wakeup on another cpu from the owner
unlocking the lock, so we'd go ptr:unlatched -> PROXY_WAKING (along
with setting TASK_RUNNING). Then because we weren't latched, we might
be running midway through the lock logic. If we go into schedule()
we'll just come back out eventually (as we're TASK_RUNNING). Then when
then try again in the lock loop to set blocked_on to ptr:unlatched
again, we hit warnings because we are setting it to a ptr when it is
PROXY_WAKING and not NULL.
I'm sure this was obvious to you when you wrote the snippit above, but
I've been running a little slow, so let's not talk about how long it
took for me to actaully understand this. :P
And yes, your snippit above avoids this, but it feels a little
incidental, cleaning up after the fact. So I think it might be better
to simplify the state machine a little to prevent it.
My current thought is to cut out the ptr:unlatched -> PROXY_WAKING
transition. And instead in __set_task_blocked_on_waking(), instead of
checking blocked_on and doing an early return if it is null, we could
instead check if the latch bit was set, and if so, set blocked_on to
NULL. This bascially treats ptr:unlatched the same as NULL in most
cases for the state machine.
Now, earlier you made some nice explanations earlier about how some of
the blocked_on checks done w/ the rq_lock and not the blocked_lock
were safe because of the limited situations where we clear blocked_on.
And this at first seems to put that assumption risk, but I think it
still holds if we always consider ptr:unlatched equivalent to NULL. So
once ptr:latched is set, the rule still holds.
That gives us:
NULL -> ptr:unlatched -> ptr:latched -> PROXY_WAKING -> NULL
Where NULL and ptr:unlatched are functionally equivalent except for
the ability to transition to ptr:latched.
With:
ptr:latched -> NULL // only done on current
ptr:unlatched -> NULL // only done on current or when trying to set waking
So far my testing is doing ok with this. Let me know if you see any
holes though.
I'll try to cleanup my current changes (surely copying some of your
excellent work :), and send this out tomorrow (I'm shot for today) for
more concrete review.
thanks
-john