On Wed, 30 Apr 2025 at 11:13, K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
(+ more scheduler folks)

tl;dr

JB has a workload that hates aggressive migration on the 2nd Generation
EPYC platform that has a small LLC domain (4C/8T) and very noticeable
C2C latency.

Based on JB's observation so far, reverting commit 16b0a7a1a0af
("sched/fair: Ensure tasks spreading in LLC during LB") and commit
c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost
condition") helps the workload. Both those commits allow aggressive
migrations for work conservation except it also increased cache
misses which slows the workload quite a bit.

commit 16b0a7a1a0af ("sched/fair: Ensure tasks spreading in LLC
during LB") eases the spread of tasks inside an LLC, so it's not obvious
to me how it would increase "a lot of CPU migrations go out of CCX,
then L3 miss,". On the other hand, it will spread tasks across SMT and
within the LLC, which can prevent running at the highest frequency on
some systems, but I don't know if that's relevant for this SoC.

commit c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost
condition") makes newly idle migration happen more often, which can
then migrate tasks across LLCs. But then the question is more about why
newly idle load balance is enabled outside the LLC if it is so costly.

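
To make the effect of the removed condition concrete, here is a toy model in Python. It is only a sketch, not the kernel code: the function name, its parameters and the 500000 ns default are assumptions made for illustration, and the real newly idle path has more checks than the three modelled here.

```python
# Toy model only; names and the default value below are assumptions
# for illustration and do not mirror the kernel implementation.
ASSUMED_MIGRATION_COST_NS = 500_000


def newidle_balance_attempted(avg_idle_ns, max_newidle_lb_cost_ns,
                              overloaded, migration_cost_ns=None):
    """Return True if a CPU going idle would attempt a newly idle balance.

    migration_cost_ns=None models the behaviour after the
    sysctl_sched_migration_cost condition was removed.
    """
    if not overloaded:
        # Nothing to pull from anywhere.
        return False
    if migration_cost_ns is not None and avg_idle_ns < migration_cost_ns:
        # Old extra gate: expected idle time is shorter than the
        # migration cost, so skip balancing.
        return False
    # Remaining gate: expected idle time must cover the cost of the
    # balance attempt itself.
    return avg_idle_ns >= max_newidle_lb_cost_ns


if __name__ == "__main__":
    # A CPU that is typically idle for 200us, with a 50us balance cost.
    args = dict(avg_idle_ns=200_000, max_newidle_lb_cost_ns=50_000,
                overloaded=True)
    print(newidle_balance_attempted(
        migration_cost_ns=ASSUMED_MIGRATION_COST_NS, **args))
    print(newidle_balance_attempted(**args))
```

In this model a CPU with short idle periods is filtered out only by the extra gate, so removing the gate lets it attempt a pull, which may reach across LLCs when the balance is allowed at that domain level.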
"relax_domain_level" helps but cannot be set at runtime and I couldn't
think of any stable / debug interfaces that JB hasn't tried out
already that can help this workload.
There is a patch towards the end to set "relax_domain_level" at
runtime but given cpusets got away with this when transitioning to
cgroup-v2, I don't know what the sentiments are around its usage.
Any input / feedback is greatly appreciated.