Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()

From: Xuewen Yan

Date: Wed Mar 18 2026 - 22:14:49 EST


On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> On 03/18/26 13:55, Vincent Guittot wrote:
> > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bf948db905ed..20adb6fede2a 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > + unsigned long util = scx_cpuperf_target(cpu);
> > > > +
> > > > + if (!scx_switched_all())
> > > > + util += cpu_util_cfs(cpu);
> > > > +
> > > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }
> > >
> > > This puts the common case of no ext muck into the slow path of that
> > > static_branch.
> >
> > +1
> > I was about to same
> >
> > >
> > > This wants to be something like:
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > unsigned long util = cpu_util_cfs(cpu);
> > >
> > > if (scx_enabled()) {
> > > unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > but the targeted perfromance level
>
> Beside that, this sort of plug-and-play is a big concern. You picked up sched
> ext and changed the behavior, then you'd need to get your thermal management to
> work with that. Not retrospectively sprinkle these hacks around to force things
> to work again.

In fact, I had considered this issue even before sending this patch.
Our initial fix was to modify the thermal subsystem specifically,
in cpufreq_cooling.c, when SCX is enabled, we stopped calling
sched_cpu_util() and instead used idle-time-based calculation to
obtain CPU util.

However, we later realized that sched_cpu_util() is used not only in
cpufreq_cooling.c but also in dtpm_cpu.c.
Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
This led us to propose the current patch.
Although scx_cpuperf_target may not accurately reflect the true CPU
utilization, as Christian pointed out, it’s still better than having
nothing at all.

That’s exactly why I marked this patch as RFC, I wasn’t sure whether
the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
It now seems clear that modifying only sched_cpu_util() is
insufficient. Ideally, we should also update cpufreq_cooling.c, since
we cannot guarantee that all SCX BPF programs will provide an accurate
cpuperf_target.
I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.

Thanks!


>
> This is a no from me. I think we have to keep the separation clear. And
> I haven't seen a single contribution back to scheduler out of these
> 'experiments'. Clearly everyone is going their own way and getting the
> threshold for us to get patches merged even higher not to break these out of
> tree 'experiments' is a big nuance. I think the sugov one was already a mistake
> to accept.
>
> >
> >
> > >
> > > if (!scx_switched_all())
> > > scx_util += util;
> > >
> > > util = scx_util;
> > > }
> > >
> > > return effective_cpu_util(cpu, util, NULL, NULL);
> > > }