Re: [RFC patch v3 00/20] Cache aware scheduling

From: Shrikanth Hegde
Date: Thu Jul 03 2025 - 16:00:53 EST




tl;dr

o Benchmarks that prefer co-location and run in threaded mode see
  a benefit, including hackbench at high utilization and schbench
  at low utilization.

o schbench (both new and old but particularly the old) regresses
  quite a bit on the tail latency metric when #workers crosses the
  LLC size.

o client-server benchmarks where client and servers are threads
  from different processes (netserver-netperf, tbench_srv-tbench,
  services of DeathStarBench) seem to noticeably regress due to
  lack of co-location between the communicating client and server.

  Not sure if WF_SYNC can be an indicator to temporarily ignore
  the preferred LLC hint.

o stream regresses in some runs where the occupancy metrics trip
  and assign a preferred LLC to all the stream threads, bringing
  down performance in ~50% of the runs.
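
To make the WF_SYNC question above concrete, here is a toy user-space
model in Python. It is not kernel code and not part of this series:
pick_llc() and its parameters are hypothetical names, and the flag
value only mirrors WF_SYNC as defined in recent kernel/sched/sched.h.
The assumption is that a sync wakeup hints the waker is about to
sleep, so for that one wakeup the wakee is placed near the waker
instead of in the process's preferred LLC.

```python
# Toy model only: names below are hypothetical, not from the series.
WF_SYNC = 0x10  # illustrative; mirrors the kernel's sync-wakeup flag


def pick_llc(preferred_llc, waker_llc, wake_flags):
    """Return the LLC id in which to look for an idle CPU.

    preferred_llc: LLC picked by cache-aware scheduling, or None if unset.
    waker_llc:     LLC the waking task currently runs in.
    wake_flags:    bitmask of flags for this wakeup.
    """
    if preferred_llc is None:
        # No hint recorded for this process: stay near the waker.
        return waker_llc
    if wake_flags & WF_SYNC:
        # Sync wakeup: skip the preferred-LLC hint for this wakeup only.
        return waker_llc
    return preferred_llc
```

Whether WF_SYNC is a reliable enough signal for the client-server
cases listed above would still need to be measured.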


- On SMT systems, threads run faster when they run in ST mode (sibling idle).
If aggregation happens in an LLC, they might end up with lower IPC.

Full data from my testing is as follows:

o Machine details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)
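
For reference, the "LLC size" on this machine can be worked out with
a few lines of Python. This is a rough sketch: the 8 cores per LLC
figure is an assumption based on the usual 3rd Generation EPYC layout
and is not stated in the details above; llc_layout() is a
hypothetical helper name.

```python
# Assumption (not from the report): 8 cores share one L3 (LLC).
def llc_layout(sockets=2, cores_per_socket=64, threads_per_core=2,
               cores_per_llc=8):
    """Derive LLC counts and sizes from the socket/core topology."""
    llcs_per_socket = cores_per_socket // cores_per_llc
    return {
        "threads_per_llc": cores_per_llc * threads_per_core,
        "llcs_per_socket": llcs_per_socket,
        "total_llcs": sockets * llcs_per_socket,
        "total_cpus": sockets * cores_per_socket * threads_per_core,
    }


if __name__ == "__main__":
    for key, value in llc_layout().items():
        print(f"{key}: {value}")
```

Under that assumption an LLC spans 16 CPUs, so "#workers crosses the
LLC size" would mean more than 16 workers.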

o Kernel details

tip:      tip:sched/core at commit 914873bc7df9 ("Merge tag
           'x86-build-2025-05-25' of
           git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

llc-aware-lb-v3: tip + this series as is