Re: [PATCH 1/5] sched/fair: Drop redundant RCU read lock in NOHZ kick path

From: Andrea Righi

Date: Sat May 16 2026 - 01:45:49 EST


Hi Shrikanth,

On Fri, May 15, 2026 at 12:19:16PM +0530, Shrikanth Hegde wrote:
>
>
> On 5/9/26 11:37 PM, Andrea Righi wrote:
> > nohz_balancer_kick() is reached from sched_balance_trigger(), which is
> > called from sched_tick(). sched_tick() runs with IRQs disabled, so the
> > additional rcu_read_lock/unlock() used around sched_domain accesses in
> > this path is redundant. Rely on the existing IRQ-disabled context (and
> > the rcu_dereference_all() checking) instead.
> >
> > The same applies to set_cpu_sd_state_idle(), called from the idle entry
> > path with IRQs disabled, and to set_cpu_sd_state_busy(), reachable via
> > nohz_balance_exit_idle() from two contexts: nohz_balancer_kick() (IRQs
> > disabled, as above) and sched_cpu_deactivate() (the CPUHP_AP_ACTIVE
> > teardown, which runs under cpus_write_lock(), so it cannot race with
> > sched-domain rebuilds). In both cases the rcu_dereference_all()
> > validation is sufficient.
> >
> > No functional change intended.
> >
>
> For this patch, few more comments below.
>
> Reviewed-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
>
> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Suggested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> > Reviewed-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
>
>
> > @@ -12868,17 +12860,13 @@ static void nohz_balancer_kick(struct rq *rq)
> > static void set_cpu_sd_state_busy(int cpu)
> > {
> > struct sched_domain *sd;
> > -
> > - rcu_read_lock();
> > sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> > if (!sd || !sd->nohz_idle)
> > - goto unlock;
> > + return;
> > sd->nohz_idle = 0;
> > atomic_inc(&sd->shared->nr_busy_cpus);
> > -unlock:
> > - rcu_read_unlock();
> > }
> > void nohz_balance_exit_idle(struct rq *rq)
> > @@ -12897,17 +12885,13 @@ void nohz_balance_exit_idle(struct rq *rq)
> > static void set_cpu_sd_state_idle(int cpu)
> > {
> > struct sched_domain *sd;
> > -
> > - rcu_read_lock();
> > sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> > if (!sd || sd->nohz_idle)
> > - goto unlock;
> > + return;
> > sd->nohz_idle = 1;
> > atomic_dec(&sd->shared->nr_busy_cpus);
> > -unlock:
> > - rcu_read_unlock();
> > }
> > /*
>
> I was looking at other users of sd_llc, i.e test_idle_core and set_idle_core.
> They have rcu_dereference_all. So callers need not call rcu_read_lock/unlock if
> the irq disabled/preempt_disabled.
>
> One more place would be update_idle_core. I think it is called with interrupt disabled
> in __schedule path.

Good point: __update_idle_core() is reached from set_next_task_idle() via
pick_next_task() in __schedule(), and __schedule() disables IRQs before taking
that path.

Since set_idle_cores()/test_idle_cores() use rcu_dereference_all(), the
rcu_read_lock/unlock() pair in __update_idle_core() is indeed redundant. I can
send a follow-up patch for this.

>
> And in sched_ext, scx_idle_update_selcpu_topology, It seems to be tied to cpu hotplug and
> by same logic of cpus_write_lock held, one could remove redundant rcu_read_lock there as well.
>
> No?

For scx_idle_update_selcpu_topology() it's a bit more nuanced, if I'm not
missing anything:
 - the helpers it uses (llc_weight/llc_span/numa_weight/numa_span) use plain
   rcu_dereference(), so simply dropping rcu_read_lock() in the caller would
   trip the lockdep check. They'd need to be converted to
   rcu_dereference_all() first;
 - the two call sites have different protection:
   - handle_hotplug() runs from a CPU hotplug callback, so cpus_write_lock()
     is held, which serializes against sched-domain rebuilds;
   - scx_enable() only holds cpus_read_lock(), which doesn't on its own
     prevent cpuset sched-domain rebuilds (those run under cpus_read_lock()
     too).
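
To make the lockdep point in the first item concrete, here is a small userspace
toy model. It is not kernel code: every model_* name is made up for
illustration, and the checks are a simplification that assumes plain
rcu_dereference() only accepts an explicit rcu_read_lock(), while
rcu_dereference_all() also accepts an IRQ-disabled context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model only: none of this is the kernel's implementation. */
struct model_ctx {
	int  rcu_depth;      /* nesting depth of model_rcu_read_lock() */
	bool irqs_disabled;  /* stands in for an IRQ-disabled context */
	int  splats;         /* number of modelled lockdep complaints */
};

static void model_rcu_read_lock(struct model_ctx *c)
{
	c->rcu_depth++;
}

static void model_rcu_read_unlock(struct model_ctx *c)
{
	assert(c->rcu_depth > 0);
	c->rcu_depth--;
}

/* Models plain rcu_dereference(): only an explicit read lock counts. */
static void *model_rcu_dereference(struct model_ctx *c, void *p)
{
	if (c->rcu_depth == 0)
		c->splats++;
	return p;
}

/* Models rcu_dereference_all(): an IRQ-disabled context is accepted too. */
static void *model_rcu_dereference_all(struct model_ctx *c, void *p)
{
	if (c->rcu_depth == 0 && !c->irqs_disabled)
		c->splats++;
	return p;
}

/* Hypothetical helper: complaints raised by one access in a given context. */
static int model_splats(bool use_all, bool hold_rcu, bool irqs_off)
{
	struct model_ctx c = { 0, irqs_off, 0 };
	int dummy = 0;

	if (hold_rcu)
		model_rcu_read_lock(&c);
	if (use_all)
		model_rcu_dereference_all(&c, &dummy);
	else
		model_rcu_dereference(&c, &dummy);
	if (hold_rcu)
		model_rcu_read_unlock(&c);

	return c.splats;
}
```

In this model, model_splats(false, false, true) reports one complaint (plain
accessor, no read lock, IRQs off), while model_splats(true, false, true)
reports none. That is the reason the helpers would have to be converted before
the caller can drop its rcu_read_lock().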

I think this one needs a separate, more careful patch. Maybe we should keep this
series scoped to the NOHZ kick path and address those as follow-ups?

Thanks,
-Andrea