tl;dr
o Benchmarks that prefer co-location and run in threaded mode see
a benefit, including hackbench at high utilization and schbench
at low utilization.
o schbench (both new and old, but particularly the old) regresses
quite a bit on the tail latency metric when #workers crosses the
LLC size.
o client-server benchmarks where the client and server are threads
from different processes (netserver-netperf, tbench_srv-tbench,
services of DeathStarBench) seem to noticeably regress due to
lack of co-location between the communicating client and server.
Not sure if WF_SYNC can be an indicator to temporarily ignore
the preferred LLC hint.
o stream regresses in some runs where the occupancy metrics trip
and assign a preferred LLC for all the stream threads, bringing
down performance in ~50% of the runs.
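
For anyone trying to reproduce the schbench point above, the "LLC size"
(the number of CPUs sharing one last-level cache) can be read from sysfs.
The following is only a rough sketch, not something used for the numbers
in this report: it assumes cache index3 is the LLC (worth double-checking
against the "level" file in the same directory), and the helper names
parse_cpu_list() and llc_cpus() are made up for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-7,64-71" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def llc_cpus(cpu=0, index=3):
    """Return the CPUs sharing the cache at `index` with `cpu`.

    Assumption: index3 is the last-level cache on the machine under test.
    """
    path = Path(
        f"/sys/devices/system/cpu/cpu{cpu}/cache/index{index}/shared_cpu_list"
    )
    return parse_cpu_list(path.read_text())
```

Calling len(llc_cpus()) on the test machine gives the worker count at
which a single schbench instance can no longer fit in one LLC domain.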
Full data from my testing is as follows:
o Machine details
- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)
o Kernel details
tip: tip:sched/core at commit 914873bc7df9 ("Merge tag
'x86-build-2025-05-25' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
llc-aware-lb-v3: tip + this series as is