Re: [RFC PATCH 1/2] thermal/cpufreq_cooling: remove unused cpu_idx in get_load()
From: Lukasz Luba
Date: Wed Mar 25 2026 - 04:37:15 EST
On 3/24/26 12:03, Xuewen Yan wrote:
On Tue, Mar 24, 2026 at 6:45 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
On 3/24/26 02:20, Xuewen Yan wrote:
On Mon, Mar 23, 2026 at 9:25 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
On 3/23/26 11:06, Viresh Kumar wrote:
On 23-03-26, 10:52, Lukasz Luba wrote:
How is that okay? What am I missing?
I was missing !SMP :)
Right, there is a mix of two things.
The 'i' is left over but should be removed as well, since
this is !SMP code with only one CPU and i is always 0.
That's also why we sent out patch 1/2; after all, it is always 0 on
!SMP systems.
The whole split that was made for getting
the load or utilization from the CPU(s) needs to be
cleaned up. The compiled code looks different anyway, since
the compiler knows a non-SMP config is used.
Right, we are allocating that for num_cpus (which should be 1 CPU
anyway). The entire thing needs to be cleaned up.
Do you want to clean that up, or should I do it?
It would be helpful if you can do it :)
OK, I will. Thanks for your involvement Viresh!
Xuewen, please hold off on your v2; I will send
a redesign of this leftover code today.
Okay, and Qais's point is also worth considering: do we actually need
sched_cpu_util()?
The way I see it, generally speaking, the request_power derived from
idle_time might be higher than what we get from sched_cpu_util().
Take this scenario as an example:
Consider a CPU running at the lowest frequency with 50% idle time,
versus one running at the highest frequency with the same 50% idle
time.
In this case, using idle_time yields the same load value for both.
However, sched_cpu_util() would report a lower load when the CPU
frequency is low. This results in a smaller request_power...
Right, there are 2 things to consider:
1. what is the utilization when the CPU still has idle time, e.g.
the 50% that you mentioned
2. what is the utilization when there is no idle time and the CPU
is fully busy (and starts throttling due to heat)
In this thermal framework we are mostly in the 2nd case. In that case the
utilization on the CPU's runqueue goes to 1024 no matter the CPU's frequency.
Haha, indeed. When we debug IPA, we also keep the CPU constantly
running with basically no idle time.
In this scenario, we tested using both sched_cpu_util() and idle_time,
and for thermal control purposes, there was basically no difference
(likely because the load was at 100%).
Maybe we can cook up a test case where the CPU is overheating despite
having some idle time? That way we can compare how the two interfaces
perform.
I wish we could involve the GPU in those stress scenarios. Then we could
have some load (e.g. 70% on CPUs) and heavy computation on GPU which
heats up the whole silicon. In such an experiment you could observe the
difference between those two methods of input power estimation.
Unfortunately, for the GPU it's hard to craft such a benchmark
(at least for me, lacking GPU programming experience at that level).
Ideally we would have something which controls the amount of GPU
computation and the heat it creates...
We know which highest frequency was allowed to run and we pick the power
value from the EM for it. That's why the estimation is not that bad (apart
from power variation across different flavors of workloads: heavy SIMD vs.
normal integer/load).
In the 1st scenario we might underestimate the power, but that
is not a thermal stress situation anyway, so the max OPP is
still allowed.
So far it has been hard to find the best power model to use and robust CPU
load mechanisms. Adding more complexity and over-engineered code for the
kernel to maintain might not make sense.
Thermal solutions are handled in firmware nowadays, since the
kernel won't react fast enough to some rapid changes.
We have to balance the complexity here.
Understood. We appreciate the balance between complexity and accuracy.
Let's improve the situation a bit. It would be very much appreciated if
you could share whether those changes help your platform
(some older boards might not show any benefit with the new code).
We could test these changes on our platforms and let you know if we
see any improvements in thermal stability or power estimation. Expect
an update from us in a few days.
Sounds great, thanks!