Re: [RFC PATCH 1/2] thermal/cpufreq_cooling: remove unused cpu_idx in get_load()

From: Lukasz Luba

Date: Tue Mar 24 2026 - 06:52:47 EST

On 3/24/26 02:20, Xuewen Yan wrote:

On Mon, Mar 23, 2026 at 9:25 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

On 3/23/26 11:06, Viresh Kumar wrote:

On 23-03-26, 10:52, Lukasz Luba wrote:

How is that okay? What am I missing?

I was missing !SMP :)

Right, there is a mix of two things.
The 'i' was left in but should be removed as well, since
this is !SMP code with only 1 CPU and i=0.

That's also why we sent out patch 1/2; after all, it is always 0 on
!SMP systems.

The whole split that was made for getting
the load or utilization from CPU(s) needs to be
cleaned up. The compiled code looks different since
the compiler knows that a non-SMP config is used.

Right, we are allocating that for num_cpus (which should be 1 CPU
anyway). The entire thing must be cleaned.

Do you want to clean that up, or should I do it?

It would be helpful if you can do it :)

OK, I will. Thanks for your involvement Viresh!

Xuewen, please hold off on your v2; I will send
a redesign of the remaining code today.

Okay, and Qais's point is also worth considering: do we actually need
sched_cpu_util()?
The way I see it, generally speaking, the request_power derived from
idle_time might be higher than what we get from sched_cpu_util().
Take this scenario as an example:
Consider a CPU running at the lowest frequency with 50% idle time,
versus one running at the highest frequency with the same 50% idle
time.
In this case, using idle_time yields the same load value for both.
However, sched_cpu_util() would report a lower load when the CPU
frequency is low. This results in a smaller request_power...
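
To put numbers on the scenario above, here is a minimal sketch in plain
user-space C. It is not the kernel implementation: the helper names, the
10 ms window, and the 500 MHz / 2 GHz frequencies are made up for
illustration, and the frequency-invariant formula is a simplification of
what sched_cpu_util() reports (the real value comes from the scheduler's
utilization signals).

```c
#include <assert.h>

#define CAPACITY_SCALE 1024ULL  /* full capacity of the CPU in this sketch */

/* Hypothetical helper: load in percent derived only from idle time.
 * The CPU frequency does not enter the calculation at all. */
unsigned int load_from_idle_time(unsigned long long idle_us,
                                 unsigned long long window_us)
{
    assert(window_us > 0 && idle_us <= window_us);
    return (unsigned int)(100ULL * (window_us - idle_us) / window_us);
}

/* Hypothetical helper: simplified frequency-invariant utilization,
 * i.e. the busy fraction scaled by cur_freq / max_freq, on a 0..1024 scale. */
unsigned int freq_invariant_util(unsigned long long idle_us,
                                 unsigned long long window_us,
                                 unsigned int cur_khz,
                                 unsigned int max_khz)
{
    assert(window_us > 0 && idle_us <= window_us);
    assert(max_khz > 0 && cur_khz <= max_khz);
    unsigned long long busy_us = window_us - idle_us;
    return (unsigned int)(CAPACITY_SCALE * busy_us * cur_khz /
                          (window_us * max_khz));
}
```

With 50% idle time in a 10 ms window, load_from_idle_time() gives 50 for
both CPUs, while freq_invariant_util() gives 128 at 500 MHz and 512 at
2 GHz (out of 1024), which is the gap described above.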

Right, there are 2 things to consider:
1. what is the utilization when the CPU still has idle time, e.g.
this 50% that you mentioned
2. what is the utilization when there is no idle time and the CPU
is fully busy (and starts throttling due to heat)

In this thermal framework we are mostly in the 2nd case. In that case the
utilization on the CPU's runqueue goes to 1024 no matter the CPU's frequency.
We know the highest frequency that was allowed to run and we pick the power
value from the EM for it. That's why the estimation is not that bad (apart
from power variation for different flavors of workloads: heavy SIMD vs.
normal integer/load).
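
A rough sketch of that estimation in plain user-space C, for illustration
only. The table contents, the struct and the function names are
hypothetical and are not the kernel's EM API; the point is only that with
utilization pinned at 1024 the estimate reduces to the EM power of the
highest allowed frequency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one Energy Model entry. */
struct opp_entry {
    unsigned int freq_khz;
    unsigned int power_mw;
};

/* Made-up table, sorted by ascending frequency. */
const struct opp_entry example_em[] = {
    {  500000,  100 },
    { 1000000,  300 },
    { 2000000, 1000 },
};

/* Power of the highest table entry that does not exceed freq_khz
 * (falls back to the lowest entry if none qualifies). */
unsigned int em_power_at_or_below(const struct opp_entry *tbl, size_t n,
                                  unsigned int freq_khz)
{
    assert(tbl != NULL && n > 0);
    unsigned int power = tbl[0].power_mw;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].freq_khz <= freq_khz)
            power = tbl[i].power_mw;
    }
    return power;
}

/* Scale the EM power by utilization on a 0..1024 scale. */
unsigned int estimate_power_mw(const struct opp_entry *tbl, size_t n,
                               unsigned int allowed_khz, unsigned int util)
{
    if (util > 1024)
        util = 1024;
    unsigned long long power = em_power_at_or_below(tbl, n, allowed_khz);
    return (unsigned int)(power * util / 1024);
}
```

For a fully busy CPU (util = 1024) the result is simply the table power at
the frequency cap: 1000 mW when 2 GHz is allowed, 300 mW once the cap drops
to 1 GHz. With util = 512 at 2 GHz the same code gives 500 mW, which is the
1st case where the power may be underestimated.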

In the 1st case we might underestimate the power, but that
is not the thermal stress situation anyway, so the max OPP is
still allowed.

So far it has been hard to find the best power model to use and robust CPU
load mechanisms. Adding more complexity and creating some
over-engineered code in the kernel to maintain might not make sense.
Thermal problems are handled in the firmware nowadays, since the
kernel cannot react fast enough to some rapid changes.

We have to balance the complexity here.
Let's improve the situation a bit. It would be very much appreciated if
you could share whether those changes help your platform
(some older boards might not show any benefit with the new code).

Regards,
Lukasz