Re: [PATCH] cpufreq: cppc: Reduce cppc delivered perf sampling jitter

From: Breno Leitao

Date: Wed Jun 03 2026 - 06:54:56 EST


On Tue, Jun 02, 2026 at 04:20:52PM -0500, Jeremy Linton wrote:
> CPPC uses a pair of registers cycling at different frequencies to
> determine an accumulated performance level. For userspace reporting we
> want to convert this to an instantaneous CPU frequency, but over short
> time periods small errors caused by CPPC counter reads can cause
> fairly significant reported frequency variations even when the core
> CPU clock isn't changing.
>
> Reduce this by keeping a start sample fixed and retrying the end
> sample until the counter deltas are large enough to reduce short
> window error, or until adjacent delivered performance estimates are
> within the CPU's observed CPPC read noise floor.
>
> To begin, resample the initial pair a small fixed number of times
> looking for matching delivered performance deltas. This reduces the
> chance that a disturbed start sample anchors the rest of the
> calculation.
>
> Then look for an end sample while updating the noise floor from the
> best error seen between samples. The floor remains zero on systems
> with stable feedback reads, but lets noisy systems stop early once
> another retry is unlikely to improve the result. The retry loop is
> capped at 200 iterations, giving an ~20 usec explicit delay budget
> derived from ndelay(100).
>
> Signed-off-by: Jeremy Linton <jeremy.linton@xxxxxxx>
> ---
> drivers/cpufreq/cppc_cpufreq.c | 68 ++++++++++++++++++++++++++++++----
> 1 file changed, 61 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 7e7f9dfb7a24..362c08def420 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -50,7 +50,7 @@ struct cppc_freq_invariance {
> static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
> static struct kthread_worker *kworker_fie;
>
> -static int cppc_perf_from_fbctrs(u64 reference_perf,
> +static u64 cppc_perf_from_fbctrs(u64 reference_perf,
> struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> struct cppc_perf_fb_ctrs *fb_ctrs_t1);
>
> @@ -750,7 +750,7 @@ static inline u64 get_delta(u64 t1, u64 t0)
> return (u32)t1 - (u32)t0;
> }
>
> -static int cppc_perf_from_fbctrs(u64 reference_perf,
> +static u64 cppc_perf_from_fbctrs(u64 reference_perf,
> struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> struct cppc_perf_fb_ctrs *fb_ctrs_t1)
> {
> @@ -771,19 +771,71 @@ static int cppc_perf_from_fbctrs(u64 reference_perf,
> return (reference_perf * delta_delivered) / delta_reference;
> }
>
> -static int cppc_get_perf_ctrs_sample(int cpu,
> +/* CPPC read noise floor for early retry exit. */
> +static DEFINE_PER_CPU(u64, err_floor);
> +
> +#define CPPC_SAMPLE_MAX_RETRIES 200

Could the remaining tuning literals be given named #defines as well,
like CPPC_SAMPLE_MAX_RETRIES above? Specifically:

- the 10 initial-resample iteration count
- the 2000 multiplier in ref * 2000
- the 100 ns in ndelay(100)
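
Something along these lines, as a rough standalone sketch rather than
driver code. Only CPPC_SAMPLE_MAX_RETRIES comes from the patch; the other
macro names and the helper are hypothetical placeholders, and the comments
only restate what the commit message says about each value:

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: cap on the end-sample retry loop. */
#define CPPC_SAMPLE_MAX_RETRIES		200

/* Hypothetical names for the literals that are still open-coded. */
#define CPPC_SAMPLE_INIT_RETRIES	10	/* initial-resample iteration count */
#define CPPC_SAMPLE_REF_SCALE		2000	/* multiplier in "ref * 2000" */
#define CPPC_SAMPLE_DELAY_NS		100	/* argument passed to ndelay() */

/*
 * Explicit delay budget from the commit message:
 * 200 retries * 100 ns = 20000 ns, i.e. ~20 usec.
 */
#define CPPC_SAMPLE_DELAY_BUDGET_NS \
	((uint64_t)CPPC_SAMPLE_MAX_RETRIES * CPPC_SAMPLE_DELAY_NS)

/* Fails the build if the constants drift away from the documented budget. */
static_assert(CPPC_SAMPLE_DELAY_BUDGET_NS == 20000,
	      "retry delay budget no longer matches the ~20 usec in the changelog");

/* Hypothetical helper, only here so the budget can be checked at run time. */
static inline uint64_t cppc_sample_delay_budget_ns(void)
{
	return CPPC_SAMPLE_DELAY_BUDGET_NS;
}
```

With the values named, the ~20 usec figure in the changelog follows from
the constants instead of having to be recomputed by the reader.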

Thanks
--breno