Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
From: Vincent Guittot
Date: Thu Mar 19 2026 - 03:09:31 EST
On Thu, 19 Mar 2026 at 03:13, Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
>
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > On 03/18/26 13:55, Vincent Guittot wrote:
> > > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index bf948db905ed..20adb6fede2a 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > > >
> > > > > unsigned long sched_cpu_util(int cpu)
> > > > > {
> > > > > - return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > > + unsigned long util = scx_cpuperf_target(cpu);
> > > > > +
> > > > > + if (!scx_switched_all())
> > > > > + util += cpu_util_cfs(cpu);
> > > > > +
> > > > > + return effective_cpu_util(cpu, util, NULL, NULL);
> > > > > }
> > > >
> > > > This puts the common case of no ext muck into the slow path of that
> > > > static_branch.
> > >
> > > +1
> > > I was about to same
> > >
> > > >
> > > > This wants to be something like:
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > > unsigned long util = cpu_util_cfs(cpu);
> > > >
> > > > if (scx_enabled()) {
> > > > unsigned long scx_util = scx_cpuperf_target(cpu);
> > >
> > > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > > but the targeted perfromance level
> >
> > Beside that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically,
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used idle-time-based calculation to
> obtain CPU util.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
scx_cpuperf_target() reflects the targeted performance level, not the
utilisation. This is good for cpufreq but not here as you can have a
high performance target with low cpu utilization. You must provide the
right value
>
> That’s exactly why I marked this patch as RFC, I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
>
> Thanks!
>
>
> >
> > This is a no from me. I think we have to keep the separation clear. And
> > I haven't seen a single contribution back to scheduler out of these
> > 'experiments'. Clearly everyone is going their own way and getting the
> > threshold for us to get patches merged even higher not to break these out of
> > tree 'experiments' is a big nuance. I think the sugov one was already a mistake
> > to accept.
> >
> > >
> > >
> > > >
> > > > if (!scx_switched_all())
> > > > scx_util += util;
> > > >
> > > > util = scx_util;
> > > > }
> > > >
> > > > return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }