Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time

From: Christian Loehle

Date: Sat May 16 2026 - 09:43:41 EST

On 5/15/26 14:57, Tom Gebhardt wrote:
> Hi Christian,
>
> Good point -- I ran additional tests with `performance` and `ondemand` governors
> side by side on the same kernel (7.0.0 + ttwu + patch 12 only):
>
>   Clock      Governor       pipe bogo ops/s
>   --------   ------------   ----------------
>   2400 MHz   performance    2 095 187
>   2400 MHz   ondemand       2 093 221
>   2800 MHz   performance    2 415 817
>   2800 MHz   ondemand       2 415 617
>
> The difference between governors is <0.1% -- well within noise. So you are
> right: the effect is not cpufreq-related. Whatever patch 12 changes, it
> affects the scheduler path directly, not through frequency selection.
>
> I also applied Vincent's fix [1] and benchmarked it:
>
>   Kernel                   Clock      pipe bogo ops/s    Δ vs. 6.6.78
>   ----------------------   --------   ----------------   ------------
>   6.6.78                   2400 MHz   2 129 330          ±0%
>   6.6.78                   2800 MHz   2 487 746          ±0%
>   7.0 + ttwu + patch 12    2400 MHz   2 093 221          −1.7%
>   7.0 + ttwu + patch 12    2800 MHz   2 415 617          −2.9%
>   7.0 + ttwu + Vincent     2400 MHz   2 077 526          −2.4%
>   7.0 + ttwu + Vincent     2800 MHz   2 458 151          −1.2%
>
> Vincent's fix gets very close to 6.6 at 2800 MHz (−1.2%) and is similar to
> patch 12 at 2400 MHz. Both are a large improvement over vanilla 7.0+ttwu
> (−22% at 2800 MHz) and plain 7.0 stock (−26% at 2800 MHz).

I tried to reproduce this on an Orion O6, offlining all big CPUs, which
leaves 4 little CPUs and effectively an SMP system.
Workload:
for i in $(seq 0 19); do stress-ng --pipe 4 --pipe-ops 5000000 --metrics-brief --timeout 60 ; sleep 60 ; done
Results (bogo ops/s, real time):
7.1-rc3       powersave:     27186.17 ±  813.42
7.1-rc3vingu  powersave:     26866.67 ±  899.51
7.1-rc3       performance:   78223.83 ± 4344.88
7.1-rc3vingu  performance:   77289.57 ± 3321.10

As expected, there's no significant change with Vincent's patch: both deltas
are well within one standard deviation.
I didn't notice anything suspicious in the patch either, looks fine to me.
Next suspect is of course some interaction with Peter's ttwu series you've applied.
Alternatively you could also push your exact tree somewhere and I'll go and use
that myself.
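
For reference, a rough sanity check of the "no significant change" reading of
the numbers above. This is only a sketch under two assumptions that are not
stated explicitly: each "±" is the sample standard deviation, and there is one
sample per loop iteration (20 runs per configuration). The helper name
`compare` is made up for illustration. It computes the relative change and a
Welch t statistic from the summary values alone:

```python
import math

# Summary statistics copied from the results above: (mean, spread) in bogo ops/s.
# Assumption: each "±" is the sample standard deviation over the 20 loop iterations.
RUNS = 20
RESULTS = {
    "powersave": {
        "7.1-rc3": (27186.17, 813.42),
        "7.1-rc3vingu": (26866.67, 899.51),
    },
    "performance": {
        "7.1-rc3": (78223.83, 4344.88),
        "7.1-rc3vingu": (77289.57, 3321.10),
    },
}


def compare(base, patched, n=RUNS):
    """Return (relative change in %, Welch t statistic) for patched vs. base."""
    (m0, s0), (m1, s1) = base, patched
    rel = (m1 - m0) / m0 * 100.0
    # Standard error of the difference of two means with unequal variances.
    se = math.sqrt(s0**2 / n + s1**2 / n)
    return rel, (m1 - m0) / se


for gov, kernels in RESULTS.items():
    rel, t = compare(kernels["7.1-rc3"], kernels["7.1-rc3vingu"])
    print(f"{gov:12s} delta = {rel:+.2f}%  Welch t = {t:+.2f}")
```

Under those assumptions both governors come out at roughly -1.2% with |t|
below 2, which is consistent with noise rather than a real regression.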

>
> Note: [1] applied with a manual context fixup for the DELAY_DEQUEUE hunk --
> the rpi-7.0.y tree's dequeue_entity() differs slightly from mainline in that
> block (no update_entity_lag() call inside the DELAY_DEQUEUE early-return).
> The semantic intent of the hunk was preserved.
>
> [1] https://lore.kernel.org/lkml/agRyoe1wHyZ-vMk9@vingu-cube/
>
> Thanks for catching that and for offering to reproduce it.
>
> Tom