Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()

From: Lukasz Luba

Date: Thu Mar 19 2026 - 06:21:41 EST


Hi Qais and Xuewen,

On 3/19/26 02:13, Xuewen Yan wrote:
On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:

On 03/18/26 13:55, Vincent Guittot wrote:
On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..20adb6fede2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,

unsigned long sched_cpu_util(int cpu)
{
- return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+ unsigned long util = scx_cpuperf_target(cpu);
+
+ if (!scx_switched_all())
+ util += cpu_util_cfs(cpu);
+
+ return effective_cpu_util(cpu, util, NULL, NULL);
}

This puts the common case of no ext muck into the slow path of that
static_branch.

+1
I was about to same


This wants to be something like:

unsigned long sched_cpu_util(int cpu)
{
unsigned long util = cpu_util_cfs(cpu);

if (scx_enabled()) {
unsigned long scx_util = scx_cpuperf_target(cpu);

also scx_cpuperf_target() does not reflect the utilization of the CPU
but the targeted perfromance level

Beside that, this sort of plug-and-play is a big concern. You picked up sched
ext and changed the behavior, then you'd need to get your thermal management to
work with that. Not retrospectively sprinkle these hacks around to force things
to work again.

In fact, I had considered this issue even before sending this patch.
Our initial fix was to modify the thermal subsystem specifically,
in cpufreq_cooling.c, when SCX is enabled, we stopped calling
sched_cpu_util() and instead used idle-time-based calculation to
obtain CPU util.

However, we later realized that sched_cpu_util() is used not only in
cpufreq_cooling.c but also in dtpm_cpu.c.
Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
This led us to propose the current patch.
Although scx_cpuperf_target may not accurately reflect the true CPU
utilization, as Christian pointed out, it’s still better than having
nothing at all.

That’s exactly why I marked this patch as RFC, I wasn’t sure whether
the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
It now seems clear that modifying only sched_cpu_util() is
insufficient. Ideally, we should also update cpufreq_cooling.c, since
we cannot guarantee that all SCX BPF programs will provide an accurate
cpuperf_target.
I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.


If you refer to the utilization driven power estimation - I tend to
agree. You might remember, we had a few corner-cases due to that
design and I tried to address it differently [1][2].

Xuewen please send your proposal for the cpufreq_cooling.c and I will
have a look there. If there still be a need to fix it more deeply
then I can help with some deeper thermal changes.

Estimating the real CPU's power based on available information in the
kernel is really hard. Although, let's try to fix what you
observe in your devices.

Regards,
Lukasz

[1] https://lore.kernel.org/linux-pm/20210622075925.16189-1-lukasz.luba@xxxxxxx/

[2] https://lore.kernel.org/linux-pm/20220406220809.22555-1-lukasz.luba@xxxxxxx/