Re: [PATCH 1/5] sched/fair: Drop redundant RCU read lock in NOHZ kick path

From: Shrikanth Hegde

Date: Sat May 16 2026 - 13:16:44 EST

On 5/16/26 11:15 AM, Andrea Righi wrote:

Hi Shrikanth,

On Fri, May 15, 2026 at 12:19:16PM +0530, Shrikanth Hegde wrote:

On 5/9/26 11:37 PM, Andrea Righi wrote:

nohz_balancer_kick() is reached from sched_balance_trigger(), which is
called from sched_tick(). sched_tick() runs with IRQs disabled, so the
additional rcu_read_lock/unlock() used around sched_domain accesses in
this path is redundant. Rely on the existing IRQ-disabled context (and
the rcu_dereference_all() checking) instead.

The same applies to set_cpu_sd_state_idle(), called from the idle entry
path with IRQs disabled, and to set_cpu_sd_state_busy(), reachable via
nohz_balance_exit_idle() from two contexts: nohz_balancer_kick() (IRQs
disabled, as above) and sched_cpu_deactivate() (the CPUHP_AP_ACTIVE
teardown, which runs under cpus_write_lock(), so it cannot race with
sched-domain rebuilds). In both cases the rcu_dereference_all()
validation is sufficient.

No functional change intended.

For this patch, a few more comments below.

Reviewed-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>

Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Suggested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Reviewed-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>

@@ -12868,17 +12860,13 @@ static void nohz_balancer_kick(struct rq *rq)
static void set_cpu_sd_state_busy(int cpu)
{
struct sched_domain *sd;
-
- rcu_read_lock();
sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
if (!sd || !sd->nohz_idle)
- goto unlock;
+ return;
sd->nohz_idle = 0;
atomic_inc(&sd->shared->nr_busy_cpus);
-unlock:
- rcu_read_unlock();
}
void nohz_balance_exit_idle(struct rq *rq)
@@ -12897,17 +12885,13 @@ void nohz_balance_exit_idle(struct rq *rq)
static void set_cpu_sd_state_idle(int cpu)
{
struct sched_domain *sd;
-
- rcu_read_lock();
sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
if (!sd || sd->nohz_idle)
- goto unlock;
+ return;
sd->nohz_idle = 1;
atomic_dec(&sd->shared->nr_busy_cpus);
-unlock:
- rcu_read_unlock();
}
/*
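
As a rough illustration of why dropping the lock pair leaves the logic unchanged, here is a minimal userspace sketch of the flag/counter pairing in the two helpers above. The model_* names and structs are hypothetical stand-ins, not kernel code, and the RCU and IRQ context is not modeled at all. It only shows that the nohz_idle flag makes each transition idempotent, so nr_busy_cpus moves once per real state change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for sched_domain_shared and sched_domain. */
struct model_shared {
	atomic_int nr_busy_cpus;
};

struct model_sd {
	int nohz_idle;
	struct model_shared *shared;
};

/* Mirrors the shape of set_cpu_sd_state_busy() after the patch. */
static void model_set_busy(struct model_sd *sd)
{
	if (!sd || !sd->nohz_idle)
		return;			/* no domain, or already busy */
	sd->nohz_idle = 0;
	atomic_fetch_add(&sd->shared->nr_busy_cpus, 1);
}

/* Mirrors the shape of set_cpu_sd_state_idle() after the patch. */
static void model_set_idle(struct model_sd *sd)
{
	if (!sd || sd->nohz_idle)
		return;			/* no domain, or already idle */
	sd->nohz_idle = 1;
	atomic_fetch_sub(&sd->shared->nr_busy_cpus, 1);
}

static int model_nr_busy(struct model_sd *sd)
{
	return atomic_load(&sd->shared->nr_busy_cpus);
}

/* Usage: one busy CPU goes idle twice, then busy twice. */
static void model_demo(void)
{
	struct model_shared sh;
	struct model_sd sd = { .nohz_idle = 0, .shared = &sh };

	atomic_init(&sh.nr_busy_cpus, 1);
	model_set_idle(&sd);
	model_set_idle(&sd);		/* repeated call is a no-op */
	assert(model_nr_busy(&sd) == 0);
	model_set_busy(&sd);
	model_set_busy(&sd);		/* repeated call is a no-op */
	assert(model_nr_busy(&sd) == 1);
	model_set_busy(NULL);		/* missing domain is tolerated */
}
```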

I was looking at other users of sd_llc, i.e. test_idle_cores() and set_idle_cores().
They use rcu_dereference_all(), so callers need not take rcu_read_lock/unlock() if
IRQs or preemption are already disabled.

One more place would be update_idle_core(). I think it is called with interrupts disabled
in the __schedule() path.

Good point, __update_idle_core() is reached from set_next_task_idle() via
pick_next_task() in __schedule(), and __schedule() disables IRQs before that
path.

Since set_idle_cores()/test_idle_cores() use rcu_dereference_all(), the
rcu_read_lock/unlock() pair in __update_idle_core() is indeed redundant. I can
send a follow-up patch for this.

Thanks.

And in sched_ext, scx_idle_update_selcpu_topology() seems to be tied to CPU hotplug, and
by the same logic (cpus_write_lock() held), one could remove the redundant rcu_read_lock() there as well.

No?

For scx_idle_update_selcpu_topology() it's a bit more nuanced, if I'm not
missing anything:
- the helpers it uses (llc_weight/llc_span/numa_weight/numa_span) use plain
  rcu_dereference(), so simply dropping rcu_read_lock() in the caller would
  trip the lockdep check. They'd need to be converted to rcu_dereference_all()
  first;
- the two call sites have different protection:
  - handle_hotplug() runs from a CPU hotplug callback, so cpus_write_lock()
    is held, which serializes against sched-domain rebuilds,
  - scx_enable() only holds cpus_read_lock(), which doesn't on
    its own prevent cpuset sched-domain rebuilds (those run under
    cpus_read_lock() too).
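
To make the first point concrete, here is a small userspace sketch of the difference between the two checks as described in this thread. All model_* names are hypothetical, and the predicates are a simplification based on my understanding of the lockdep conditions, not the kernel macros themselves. The cpus_write_lock()/cpus_read_lock() side of the argument is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the calling context; not a kernel structure. */
struct model_ctx {
	bool rcu_read_locked;	/* inside rcu_read_lock()/unlock() */
	bool irqs_disabled;
	bool preempt_disabled;
};

/* Assumed model of plain rcu_dereference(): only rcu_read_lock() counts. */
static bool model_deref_ok(const struct model_ctx *c)
{
	return c->rcu_read_locked;
}

/*
 * Assumed model of rcu_dereference_all(): an IRQ-disabled or
 * preempt-disabled section is accepted as well.
 */
static bool model_deref_all_ok(const struct model_ctx *c)
{
	return c->rcu_read_locked || c->irqs_disabled || c->preempt_disabled;
}

/* Usage: a tick-like context with IRQs off and no rcu_read_lock(). */
static void model_demo(void)
{
	struct model_ctx tick = { .irqs_disabled = true };

	assert(!model_deref_ok(&tick));		/* plain variant would complain */
	assert(model_deref_all_ok(&tick));	/* _all variant is satisfied */
}
```

In this model, removing rcu_read_lock() from a caller only stays warning-free if every helper underneath it uses the _all-style check, which is why the helpers would have to be converted first.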

I think this one needs a separate, more careful patch. Maybe we should keep this
series scoped to the NOHZ kick path and address those as follow-ups?

Thanks,
-Andrea

Yes. That makes sense.