Re: [RFC PATCH v3 02/10] timekeeping: Remove xtime_remainder from ntp_error accumulation
From: David Woodhouse
Date: Fri May 22 2026 - 07:42:29 EST
On Wed, 2026-05-20 at 14:33 +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> The ntp_error accumulator tracks the difference between the time actually
> reported to consumers in xtime, and the *intended* time. The former is
> subject to a sawtooth effect due to the quantisation of 'mult', which
> means that it actually advances by 'xtime_interval' each tick, while
> the intended clock advances by 'ntp_tick'.
>
> By dithering between adjacent integer values of 'mult' which result in
> an 'xtime_interval' slightly higher/lower than the intended tick length,
> the advancement of xtime is kept on average to the intended rate.
>
> The accounting should therefore adjust ntp_error by adding ntp_tick and
> subtracting xtime_interval on each tick.
>
> Since commit a386b5af8edd ("time: Compensate for rounding on
> odd-frequency clocksources") the value subtracted has been
> (xtime_interval + xtime_remainder), which is wrong. The effect is a
> systematic drift whose magnitude depends on the value of xtime_remainder
> and the NTP frequency correction. NTP masks this by continuously
> adjusting the frequency to compensate, but with a fixed frequency (or an
> external reference clock like vmclock), the drift is exposed.
>
> The value of xtime_remainder actually does represent the difference
> between the tick period and xtime_interval, so simply adding it instead
> of (+ tick length - xtime_remainder) might have made sense... except
> that it's only calculated once at boot time, so it's inaccurate anyway.
> So just kill it with fire.
>
> Also remove it from the mult computation in timekeeping_adjust(), which
> used it to offset the division for the same (incorrect) reason.
>
> Fixes: a386b5af8edd ("time: Compensate for rounding on odd-frequency clocksources")
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> Assisted-by: Kiro:claude-opus-4.6-1m
Hi John, I note you skipped this one and acked some of the rest of the
series. Do you have concerns?
This is the key change:
> @@ -2463,8 +2462,7 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
>
> /* Accumulate error between NTP and clock interval */
> tk->ntp_error += tk->ntp_tick << shift;
> - tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) <<
> - (tk->ntp_error_shift + shift);
> + tk->ntp_error -= tk->xtime_interval << (tk->ntp_error_shift + shift);
>
> return offset;
> }
We discussed this a bit, and as far as I could tell we were in violent
agreement.
The NTP code wants the time to advance by tk->ntp_tick in each tick,
but it actually only advances by tk->xtime_interval.
They differ due to the quantization error; tk->xtime_interval is the
precise multiple of the value selected for 'mult' for the tick that is
being accounted.
Therefore, ntp_error needs to accumulate the difference between the
two. Which used to be called 'xtime_remainder'.
By adding ntp_tick and subtracting xtime_interval, the code above *is*
accumulating precisely that difference.
It doesn't need to *also* subtract the value of 'xtime_remainder'
again, as it did before. That would actually render the whole
adjustment a no-op. Ignoring units, for brevity:
Given that¹:
ntp_tick = xtime_interval + xtime_remainder
These are therefore equivalent:
ntp_error += xtime_remainder
ntp_error += ntp_tick - xtime_interval
And *this*, as we currently have in our tree, is a no-op:
ntp_error += ntp_tick - (xtime_interval + xtime_remainder)
¹ well... it *would* be precisely a no-op if xtime_remainder was
actually kept accurate and it *was* the delta between ntp_tick and
xtime_interval at any given moment. In fact it's calculated once at
boot and never updated, so the net effect of the whole adjustment which
is being modified in the patch hunk above... is to repeatedly add a
tiny bit of error *instead* of the intended adjustment?
I'm reading back your messages and trying to work out if you were
trying to tell me that this stale boot time value of xtime_remainder
is somehow actually doing something sane... but I don't believe so?
I've debugged this by adding *absolute* tracking of the 'intended'
times which I called
• B: xtime+ntp_error (where NTP wants us to be right now)
• C: xtime_ntp_error+time_offset (where NTP wants to skew to *eventually*)
With the fixes in this series, the per-tick sanity checks that
ntp_error==B-xtime, and time_offset==C-B are all passing. I'm
confident, both based on code analysis and empirical testing, that the
changes here are correct.
I can now invoke the 'timekeeping_set_reference' proof of concept which
sets tick_length and ntp_error+time_offset to ask the kernel to follow
a specific line, and watch the core timekeeping *clamp* to it and
follow it with basically 0ns jitter; occasionally showing 1ns as the
inter-tick jitter due to 'mult' dithering shows up.
Attachment:
smime.p7s
Description: S/MIME cryptographic signature