Re: [PATCH] sched/fair: Revert boost in cpu_util()

From: hongyan.xia(夏弘彦)

Date: Mon May 18 2026 - 22:42:17 EST

On 5/19/2026 9:17 AM, Qais Yousef wrote:
> On 05/18/26 11:37, hongyan.xia(夏弘彦) wrote:
>> On 5/18/2026 6:04 PM, Christian Loehle wrote:
>>>
>>> On 5/18/26 03:40, hongyan.xia(夏弘彦) wrote:
>>>> From: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>>>
>>>> We have seen a massive power consumption regression (20% SoC power
>>>> increase in many apps) after updating our kernel. After bisection we
>>>> pinpointed the regression to the cpu_util(boost) feature. After
>>>> reverting the boost feature the massive energy regression is gone.
>>>> Detailed trace analysis down below. The regression is found across quite
>>>> many apps but Youtube is one of the worst offenders, shown in the
>>>> 1080p60fps video benchmark:
>>>>
>>>> Setup      FPS    SoC Power (mW)  diff
>>>> w/ boost   59.94  913.6
>>>> w/o boost  59.93  720.4           -21.15%
>>>>
>>>> Signed-off-by: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>>>
>>>> ---
>>>> Analysis:
>>>>
>>>> We found several problems that result in the power spike:
>>>>
>>>> 1. Arithmetic should not happen between util_avg and runnable_avg:
>>>>
>>>> After util = max(util, runnable) which potentially picks runnable value
>>>> in cpu_util(), we then add or subtract task util values from it. This
>>>> produces a value that is half-runnable-half-util which is ill-defined.
>>>> This alone should be a warning sign. This breaks EAS calculations in
>>>> many cases, leading to sub-optimal task placements.
>
> I don't think it does. The util signal itself has issues too :)

One issue I found is that it sometimes piles up tasks on the same CPU,
because rq.runnable_avg - task.util_avg is still very high and not much
lower than rq.runnable_avg, making EAS think there is no benefit in
spreading out tasks when other CPUs are empty.

But this problem is usually temporary and doesn't last long in reality.

>>>>
>>>> 2. Using the absolute value of runnable_avg to drive frequency is
>>>> too high to be reasonable:
>>>>
>>>> We use runnable in a _relative_ way to util to know whether there is
>>>> contention in several places. However, the _absolute_ value should not
>>>> be used like util. Runnable_avg tends to be significantly higher,
>>>> making it much easier to saturate frequency.
>>>>
>>>> For example, if three tasks each with a util of 100 contend on the same
>>>> rq, the rq util is 300 but runnable_avg shoots up to 900. 900 drives the
>>>> CPU at the max frequency, and it's highly questionable whether this
>>>> boost is the right decision.
>
> I think this is the idea. These tasks are waiting behind other tasks.
>
>>>>
>>>> 3. Runnable_avg may not even reflect true contention:
>>>>
>>>> When tasks are dependent, the bottleneck is often the data flow between
>>>> tasks, not the contention seen by runnable_avg. Boosting frequency with
>>>> runnable in such scenarios wastes power without performance benefits.
>
> I believe contention is used to describe several tasks fighting for CPU time
> but only a single task can run and the other will be waiting. But I think
> I know what you mean, I think this is the same I was highlighting in [1].
> We don't care if some tasks end up waiting for more.
>
>>>>
>>>> We found that 1 causes a minor power regression but 2 and 3 regress
>>>> power significantly. We have seen multiple applications with the
>>>> producer-consumer model with many worker threads suffer. When there is
>>>> IPC between producer and consumer, boosting frequency blindly does not
>>>> help performance at all if the consumer is limited by how much data
>>>> flows through. Youtube suffers from 1, 2 and 3 at the same time, leading
>>>> to a total SoC power regression of 20% shown in the results above.
>>>
>>> We did discuss removing runnable boost internally as well, but I’d love to see
>>> more data too.
>>> The original issue it was trying to solve was avoiding jank frames during load
>>> spikes, which YouTube does not really exercise. Some gaming workload data would
>>> therefore be a useful addition here.
>>
>> Although I would be glad to provide more data (after more benchmarks and
>> pending our internal approval), I wonder, what level of performance gain
>> do we expect from this feature to justify the big energy regression?
>>
>>> Runnable boost was considered as an alternative to approaches like reducing the
>>> PELT half-life and similar changes. Qais’ current ideas also try to tackle this
>>> problem, of course, so +CC.
>
> A lot of the current behavior is actually good for power by accident. And this
> runnable approach helps performance as a workaround to these issues. We need to
> defer some decisions to userspace and just give them a better way to decide
> their trade-offs. One person's regression is another person's gain..

To be honest, yes, we live in a world where many things work by accident
and there are definitely a lot of 'accidents' in schedutil. Our
motivation for this patch is mostly our real world test scenarios that
mimic customer day of use patterns, and it looks like the perf gain is
small compared with the energy regression across common apps.

>>>
>>> If you have run many workloads, do you also have data on where this feature actually
>>> helped, especially in reducing jank frames?
>>
>> We ran our Day of Use (DoU, including Facebook, Youtube and other
>> popular apps) test model and we did see a 6.6% increase in jank frames
>> after the revert. Dropped frames went up from 106 to 113 in a total of
>> 70210 frames. However, in our test model there is no way an increase of
>> 7 frames within 70210 justifies the energy regression between 10% and
>> 20% in a lot of apps, hence for us the trade-off decision is very clear
>> here.
>>
>> Another question from me is, if this feature has potentially buggy
>> corners or mathematical unsoundness (mostly the half-util-half-runnable
>> value inside cpu_util()), should we rely on its performance gain?
>>
>>>
>>> Some discussion from back then:
>>> https://lore.kernel.org/lkml/20230406155030.1989554-1-dietmar.eggemann@xxxxxxx/
>>> https://lore.kernel.org/lkml/20220829055450.1703092-1-dietmar.eggemann@xxxxxxx/
>
> Generally I remember I had concerns on this approach then [1]. I kept quiet
> after it got merged and won't complain if it is removed now.
>
> [1] https://lore.kernel.org/lkml/20230504152328.twh3rqgq2o2gvd4u@airbuntu/

I must say I'm now almost completely echoing what you were saying. Sad
that I didn't see this thread back then. Our test results confirmed the
concerns in that thread, namely:

1. Whether it's a global win: The performance gain seems limited, as in
the jank results (not from Jankbench, but from actual animations rendered
by common apps) I just shared with Christian.
2. Hurts power: Yes, we saw a dramatic 20% SoC power increase in certain
apps like Youtube playback.
3. Being selective: This is also our concern. In our analysis, it looks
like it often boosts frequency in cases where it doesn't help perf.

Sad that these questions are answered 3 years later, but better late
than never :)