[Patch v4 00/16] Cache aware scheduling enhancements

From: Tim Chen

Date: Wed May 13 2026 - 16:33:41 EST

This patch set contains cache-aware scheduling enhancements
and bug fixes on top of Peter's sched/cache branch:
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/cache

Patches 1 to 6 resolve the over-aggregation issue, which is the remaining
part of v4 that has not yet been merged into sched/cache. Patches 7 to 16
fix bugs reported by Sashiko (online and local).

Compared with cache-aware v4, the major change in the first part is
storing the effective LLC size in the per-CPU bottom sched_domain. This
allows checking whether a task's memory footprint exceeds the threshold
by reading the value directly from the corresponding sched_domain,
instead of recalculating it every time. In addition, NUMA balancing
page-fault statistics are used instead of RSS to estimate the working
set. We also picked up Jianyong's optimization patch to reduce CPU scan
overhead. However, if NUMA balancing is not enabled, this working-set
estimate is not available. Falling back to RSS may be appropriate in
that case.

Gengkun's CPU scan optimization is not included for now and will be
revisited after further tuning.

Most patches in the second part address race conditions. Each patch fixes
one independent issue to ease review.

Test results show that the current version performs the same as v4 on
the workloads and platforms we tested.

After the load-balance-based cache-aware scheduling is merged, we plan
to introduce fine-grained control over which tasks use cache-aware
scheduling:

- Look into task tagging (e.g. with the schedqos framework or cgroups)
  to group tasks into an LLC by criteria other than process membership.
- Evaluate fast cache-aware aggregation in the wakeup path.

I will be on sabbatical from mid-May to mid-June. Chen Yu will continue
to follow up on these patches.

Thanks.

Tim

Chen Yu (15):
  sched/cache: Disable cache aware scheduling for processes with high
    thread counts
  sched/cache: Skip cache-aware scheduling for single-threaded processes
  sched/cache: Calculate the LLC size and store it in sched_domain
  sched/cache: Avoid cache-aware scheduling for memory-heavy processes
  sched/cache: Add user control to adjust the aggressiveness of
    cache-aware scheduling
  sched/cache: Fix rcu warning when accessing sd_llc domain
  sched/cache: Fix potential NULL mm pointer access
  sched/cache: Annotate lockless accesses to mm->sc_stat.cpu
  sched/cache: Fix unpaired account_llc_enqueue/dequeue
  sched/cache: Fix checking active load balance by only considering the
    CFS task
  sched/cache: Fix race condition during sched domain rebuild
  sched/cache: Fix cache aware scheduling enabling for multi LLCs system
  sched/cache: Fix has_multi_llcs iff at least one partition has
    multiple LLCs
  sched/cache: Fix possible overflow when invalidating the preferred CPU
  sched/cache: Fix stale preferred_llc for a new task

Jianyong Wu (1):
  sched/cache: Allow only 1 thread of the process to calculate the LLC
    occupancy
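Jianyong's patch restricts the LLC occupancy calculation to a single thread per process. The sketch below shows one common way to get that behavior. It is not the patch's actual mechanism. A compare-and-swap on a per-process "next scan" timestamp ensures that exactly one thread wins each scan period. The struct, field, and function names, as well as the period units, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process scan state; not the kernel's mm layout. */
struct mm_scan_sketch {
	_Atomic uint64_t next_scan;	/* earliest time the next scan may run */
};

/*
 * Returns true for exactly one caller per period: the thread whose
 * compare-and-swap advances next_scan does the occupancy scan, and
 * every other thread of the process skips it.
 */
static bool try_claim_llc_scan(struct mm_scan_sketch *mm, uint64_t now,
			       uint64_t period)
{
	assert(period > 0);
	uint64_t next = atomic_load_explicit(&mm->next_scan,
					     memory_order_relaxed);

	if (now < next)
		return false;	/* scan not due yet */
	/* Only one thread can move next_scan from 'next' to 'now + period'. */
	return atomic_compare_exchange_strong(&mm->next_scan, &next,
					      now + period);
}
```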

 drivers/base/cacheinfo.c       |  23 +++
 include/linux/cacheinfo.h      |   1 +
 include/linux/sched.h          |   5 +
 include/linux/sched/topology.h |   7 +
 init/init_task.c               |   1 +
 kernel/exit.c                  |  29 ++++
 kernel/sched/debug.c           |  14 +-
 kernel/sched/fair.c            | 256 +++++++++++++++++++++++++++++----
 kernel/sched/sched.h           |   7 +-
 kernel/sched/topology.c        | 240 +++++++++++++++++++++++++------
 10 files changed, 509 insertions(+), 74 deletions(-)

--
2.32.0