Re: [PATCH 1/2] sched/fair: consider hk_mask early in triggering ilb
From: Shrikanth Hegde
Date: Fri Mar 20 2026 - 05:29:20 EST
On 3/20/26 9:07 AM, K Prateek Nayak wrote:
Hello Shrikanth,
On 3/19/2026 12:23 PM, Shrikanth Hegde wrote:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b19aeaa51ebc..02cca2c7a98d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7392,6 +7392,7 @@ static inline unsigned int cfs_h_nr_delayed(struct rq *rq)
static DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
static DEFINE_PER_CPU(cpumask_var_t, select_rq_mask);
static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
+static DEFINE_PER_CPU(cpumask_var_t, kick_ilb_tmpmask);
nit: We could rename and reuse select_rq_mask instead. Wakeups run with IRQs
disabled and the kick happens from the hrtimer handler, so it should be safe
to reuse that mask and save some space.
Thoughts?
Maybe, but it could be a confusing name. sched_tmpmask?
Similar sharing could already be done for load_balance_mask and select_rq_mask.
So, I would prefer to keep it separate.
[..snip..]
@@ -12715,27 +12716,41 @@ static void nohz_balancer_kick(struct rq *rq)
*/
nohz_balance_exit_idle(rq);
+ /* ILB considers only HK_TYPE_KERNEL_NOISE housekeeping CPUs */
+
if (READ_ONCE(nohz.has_blocked_load) &&
- time_after(now, READ_ONCE(nohz.next_blocked)))
+ time_after(now, READ_ONCE(nohz.next_blocked))) {
flags = NOHZ_STATS_KICK;
+ cpumask_and(ilb_cpus, nohz.idle_cpus_mask,
+ housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
+ }
/*
- * Most of the time system is not 100% busy. i.e nohz.nr_cpus > 0
- * Skip the read if time is not due.
+ * Most of the time system is not 100% busy. i.e there are idle
+ * housekeeping CPUs.
+ *
+ * So, skip reading idle_cpus_mask if time is not due.
*
* If none are in tickless mode, there maybe a narrow window
* (28 jiffies, HZ=1000) where flags maybe set and kick_ilb called.
* But idle load balancing is not done as find_new_ilb fails.
- * That's very rare. So read nohz.nr_cpus only if time is due.
+ * That's very rare. So check (idle_cpus_mask & HK_TYPE_KERNEL_NOISE)
+ * only if time is due.
+ *
*/
if (time_before(now, nohz.next_balance))
goto out;
+ /* Avoid the double computation */
+ if (flags != NOHZ_STATS_KICK)
+ cpumask_and(ilb_cpus, nohz.idle_cpus_mask,
+ housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
+
/*
* None are in tickless mode and hence no need for NOHZ idle load
* balancing
*/
- if (unlikely(cpumask_empty(nohz.idle_cpus_mask)))
+ if (unlikely(cpumask_empty(ilb_cpus)))
return;
We can just use the return value from the previous cpumask_and() for
this and save on another cpumask iteration.
Makes sense. Will do.
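A minimal user-space sketch of the suggestion (not the actual patch): cpumask_and() returns false when the resulting mask is empty, so the single call that builds ilb_cpus can also tell whether any idle housekeeping CPU exists, which removes the separate cpumask_empty() scan. The sketch assumes a 64-bit integer as a stand-in for struct cpumask; mask_and(), balancer_kick_sketch(), STATS_KICK and BALANCE_KICK are hypothetical names, and the remaining checks of nohz_balancer_kick() (rq->nr_running >= 2 and so on) are elided.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_t;

/* Hypothetical flag values standing in for NOHZ_STATS_KICK / NOHZ_BALANCE_KICK. */
#define STATS_KICK   0x1u
#define BALANCE_KICK 0x2u

/*
 * Models cpumask_and(): stores src1 & src2 in *dst and returns
 * false if the result is empty, true otherwise.
 */
static bool mask_and(cpumask_t *dst, cpumask_t src1, cpumask_t src2)
{
	*dst = src1 & src2;
	return *dst != 0;
}

/*
 * Sketch of the reworked decision flow. Returns the flags that would be
 * passed to kick_ilb(); 0 means no kick. *ilb_cpus receives
 * idle & housekeeping whenever that AND is computed.
 */
static unsigned int balancer_kick_sketch(bool stats_due, bool balance_due,
					 cpumask_t idle, cpumask_t housekeeping,
					 cpumask_t *ilb_cpus)
{
	unsigned int flags = 0;
	bool have_ilb_cpu = false;

	if (stats_due) {
		flags = STATS_KICK;
		have_ilb_cpu = mask_and(ilb_cpus, idle, housekeeping);
	}

	/* Not time for a full balance yet: kick only for stats, if due. */
	if (!balance_due)
		return flags;

	/* Compute the AND at most once: skip it if the stats branch did it. */
	if (flags != STATS_KICK)
		have_ilb_cpu = mask_and(ilb_cpus, idle, housekeeping);

	/* No idle housekeeping CPU: the AND's return value replaces cpumask_empty(). */
	if (!have_ilb_cpu)
		return 0;

	/* Real checks (rq->nr_running >= 2, ...) elided; assume they request a balance. */
	return flags | BALANCE_KICK;
}
```

For example, with idle CPUs 1-2 (0x6) and housekeeping CPUs 0-1 (0x3), ilb_cpus becomes 0x2 and the kick proceeds; with idle 0xC and housekeeping 0x3 the AND is empty and the function bails out without a second pass over the mask.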
if (rq->nr_running >= 2) {