Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time
From: Qais Yousef
Date: Thu May 28 2026 - 21:46:30 EST
On 05/28/26 14:50, Tom Gebhardt wrote:
> Hi Qais,
>
> Thanks for the clarification on sched-analyzer -- I'll look at the perfetto
> approach for task placement traces.
>
> In the meantime, I ran `perf stat` and `perf record -g` across three kernels
> at OC (2800 MHz) with `ondemand` governor, using the same stress-ng pipe
> workload (4 workers, 20s).
>
> Device: Raspberry Pi 5 (8 GB, C1-stepping, Cortex-A76), Bookworm arm64.
>
> perf stat results:
>
> Metric              6.6.78      7.0 stock   7.0+ttwu+vincent
> ------------------  ----------  ----------  ----------------
> bogo ops/s          2 222 639   1 855 066   2 298 965
7.0+ttwu+vincent is the best, right?
Have you verified that your actual workload sees a benefit? When I scanned the
github bug you referenced, I think the original report was about a regression
in a real setup, not this stress-ng test. I am wary that some of these stress
tests don't necessarily represent real cases, as they can over-stress a
particular scenario and amplify minor problems that have no noticeable impact
in practice.
> IPC                 1.72        1.47        1.76
> branch-misses       625M        1 270M      1 018M
> context-switches    15 145 738  22 750 121  18 905 924
> cache-miss rate     1.58%       1.74%       1.38%
>
> Key observations:
>
> 1. IPC drops 14% on 7.0 stock (1.72 -> 1.47). ttwu+vincent recovers it
> almost completely (1.76, slightly above 6.6). This is a genuine
> efficiency loss in the scheduler path, not a throughput/clock artifact.
Due to stalling you reckon?
>
> 2. Branch mispredictions double on 7.0 stock (+103% vs 6.6). ttwu+vincent
> reduces them by ~20% vs stock, but +63% above 6.6 remains -- this
> likely explains the residual ~1% gap after patching.
I might not be reading the numbers correctly, but if I read the table right the
7.0+ttwu+vincent bogo ops/s seem higher than 6.6.78, not ~1% lower.
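
As a sanity check, the relative deltas can be recomputed from the quoted table.
The snippet below is only a sketch: the figures are copied from the perf stat
table above, and the helper names (`pct_change`, `deltas_vs_baseline`) are made
up for illustration and are not part of any tool.

```python
# Figures copied from the perf stat table quoted in this thread.
# Tuple order: (6.6.78, 7.0 stock, 7.0+ttwu+vincent)
METRICS = {
    "bogo ops/s": (2_222_639, 1_855_066, 2_298_965),
    "IPC": (1.72, 1.47, 1.76),
    "branch-misses (M)": (625, 1270, 1018),
    "context-switches": (15_145_738, 22_750_121, 18_905_924),
}


def pct_change(new, base):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def deltas_vs_baseline(metrics):
    """Map each metric to (stock vs 6.6.78, patched vs 6.6.78) in percent."""
    out = {}
    for name, (base, stock, patched) in metrics.items():
        out[name] = (pct_change(stock, base), pct_change(patched, base))
    return out


if __name__ == "__main__":
    for name, (stock, patched) in deltas_vs_baseline(METRICS).items():
        print(f"{name:20s} stock {stock:+7.1f}%   patched {patched:+7.1f}%")
```

On these figures the branch-miss and context-switch percentages match what was
reported (+103%/+63% and +50%/+25%), while bogo ops/s for 7.0+ttwu+vincent
comes out roughly 3.4% above 6.6.78 rather than below it.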
>
> 3. Context switches increase 50% on 7.0 stock. ttwu+vincent brings this
> down to +25% vs 6.6.
I hope the perfetto trace will help visualize the pattern that led to this
higher context switching.
>
> perf report (-g) highlights:
>
> On 6.6, `finish_task_switch` is barely visible in call graphs. On 7.0
> (both stock and patched), it appears prominently at 5-8% of samples,
> alongside elevated `_raw_spin_unlock_irqrestore` time. This points to
> genuine overhead in the context switch completion path, not lock contention
> between worker tasks.
Do you have the full (well, most relevant parts of it) output? It would be
interesting to use perf diff to see the difference of 7.0 stock vs 6.6 and
7.0+ttwu+vincent vs 6.6.
Maybe there's higher rq lock contention. But finish_task_switch and
_raw_spin_unlock_irqrestore are common to see, especially when there's a high
context switch rate. They might not necessarily indicate a problem.
>
> Regarding the "weird contention accidentally hidden" concern: I don't see
> evidence for that. The branch miss explosion and IPC drop on 7.0 stock are
> consistent with more complex/harder-to-predict scheduler control flow
> (EEVDF decision tree vs. CFS), not with a workload contention pattern that
> happens to be masked by task placement changes. ttwu+vincent genuinely
> reduces branch misses and restores IPC -- it doesn't just move the problem.
Not necessarily workload contention, but scheduler lock contention or a
cache-related issue on a 'hot variable'. See [1] for example. I am hoping perf
diff will help show which part has gotten noticeably worse; then you can
inspect that function to see where the code has gotten slower. Hopefully this
can shed some light on how this unrelated patch is helping.
[1] https://lore.kernel.org/all/20240307085725.444486-2-sshegde@xxxxxxxxxxxxx/
>
> I'll try to get perfetto traces for the task placement / running vs.
> runnable time breakdown. Happy to provide the raw perf.data files if
> useful.
>
> Tom