Re: [RFC PATCH 1/4] timekeeping: Remove xtime_remainder from ntp_error accumulation
From: David Woodhouse
Date: Tue May 19 2026 - 21:01:53 EST
On Tue, 2026-05-19 at 17:01 -0700, John Stultz wrote:
> On Tue, May 19, 2026 at 3:18 PM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, 2026-05-19 at 14:54 -0700, John Stultz wrote:
> > >
> > > > By simply adding ntp_interval and then subtracting the correct current
> > > > value of xtime_interval, logarithmic_accumulation doesn't *need* a
> > > > separate variable which tracks that delta. I don't quite understand why
> > > > it ever did add it in the first place.
> > >
> > > Again, it's due to the error caused by the granularity of the
> > > clocksource (against the HZ interval). That initial delta will
> > > otherwise cause an accurate clock to get erroneously steered either
> > > faster or slower when no NTP adjustments are being made. The subtracting
> > > of that remainder is just trying to compensate for this base error.
> >
> > I must be confused. We *are* subtracting that remainder, not ignoring
> > it.
> >
> > The value of xtime_remainder was (tick_length - xtime_interval). It's
> > the discrepancy between the distance we *want* the clock to move in a
> > single tick, and the distance it *does* move in a single tick at a
> > given value of 'mult'.
> >
> > So we add tick_length, and we subtract xtime_interval. Thus adjusting
> > ntp_error by the difference between them.
> >
> > I don't understand why the old code was then *also* subtracting a stale
> > value of xtime_remainder which had been calculated at boot time.
> >
> > Am I missing something here?
>
> Hrm. Apologies, I fret I'm just repeating myself, and I'm not sure how
> to reconceptualize my point to be more helpful. :/
>
> But you skipped my question in my last mail, which I do think will help:
> * If there is no adjustment being made, why would there be a
> difference between the initial ntp_interval and the initial
> xtime_interval?
Because the ntp_interval is not a precise multiple of the cycle
period?
The example I've been working with all week is a 1ms tick and a 2400MHz TSC.
So that is 1000000<<24 "shifted nanoseconds", or 16777216000000,
divided by the 2400000 cycles in a tick to give a 'mult' value of 6990506.667.
Truncated to an integer, that's 6990506, giving an xtime_interval of
16777214400000 and xtime_remainder of 1600000.
The dithering will do a duty cycle of one tick at 6990506 to two ticks
at 6990507, to achieve the overall desired rate.
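
For anyone who wants to check those numbers, here is a quick sketch of the
arithmetic in Python. It is not kernel code: the variable names just mirror
the ones used above, and the shift of 24 is the value assumed in this example.

```python
# Worked arithmetic for the 1 ms tick / 2400 MHz TSC example.
SHIFT = 24                    # clocksource shift assumed in this example
TICK_NS = 1_000_000           # 1 ms tick, in nanoseconds
CYCLES_PER_TICK = 2_400_000   # 2400 MHz counter over 1 ms

ntp_interval = TICK_NS << SHIFT                   # wanted advance per tick (shifted ns)
mult = ntp_interval // CYCLES_PER_TICK            # multiplier truncated to an integer
xtime_interval = mult * CYCLES_PER_TICK           # actual advance per tick at 'mult'
xtime_remainder = ntp_interval - xtime_interval   # shortfall per tick at 'mult'

print(ntp_interval, mult, xtime_interval, xtime_remainder)
# prints: 16777216000000 6990506 16777214400000 1600000
```

At mult+1 the per-tick delta becomes -800000, which is why two ticks at
6990507 cancel one tick at 6990506.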
> * Why would it exist? And if it does exist, what does it represent?
It represents the difference between the xtime *actually* added during
a tick (xtime_interval), and the time we *wanted* to add during a tick
(ntp_interval).
> * And if the frequency (cyc2ns) is correct, and no adjustment is being
> made, is it not erroneous to be adjusting the clock to correct for
> this accumulating delta?
I don't think it does accumulate in that state, does it? It sawtooths
around zero as we dither between mult and mult+1. And in my example,
over the course of *three* ticks it'll reach zero again, as
3*ntp_interval == 2*xtime_interval(mult+1) + 1*xtime_interval(mult)
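
A small simulation of that sawtooth, as a sketch only. The dither rule here
(use mult+1 whenever the accumulated error is positive) is a simplifying
assumption for illustration, not a copy of the kernel's logic, and
simulate() is a hypothetical helper.

```python
SHIFT = 24                    # clocksource shift assumed in this example
CYCLES_PER_TICK = 2_400_000   # 2400 MHz counter over a 1 ms tick
ntp_interval = 1_000_000 << SHIFT
base_mult = ntp_interval // CYCLES_PER_TICK

def simulate(ticks):
    """Return ntp_error (shifted ns) after each of 'ticks' ticks."""
    ntp_error = 0
    history = []
    for _ in range(ticks):
        # Assumed dither rule: run one step fast while we are behind.
        mult = base_mult + (1 if ntp_error > 0 else 0)
        xtime_interval = mult * CYCLES_PER_TICK
        # Add what we wanted, subtract what we actually accumulated.
        ntp_error += ntp_interval - xtime_interval
        history.append(ntp_error)
    return history

print(simulate(6))
# prints: [1600000, 800000, 0, 1600000, 800000, 0]
```

With this rule the error never grows; it returns to zero every third tick,
after one tick at mult and two at mult+1.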
> Maybe it helps to think of it as quantization error from a
> single cycle length? And how it may not be able to match the
> tick_length at various HZ?
Right. Because it's all about the *fractional* part of 'mult'.
> But the quantization error doesn't have anything to do with the
> accuracy of that cycle length.
Right. That part is purely about arithmetic precision.
> But if not accounted for, that error accumulates in ntp_error and
> results in unwanted frequency adjustments.
Right. If ntp_error doesn't account for xtime_remainder each tick (or
more accurately, by +ntp_interval-xtime_interval), then errors will
accumulate and result in *effective* frequency adjustments because the
'mult' dithering will not follow the duty cycle that it should, and
xtime will not advance at the correct rate.
And since NTP works on a feedback loop, that in turn will result in
*actual* frequency adjustments (and alterations to tick_length) because
NTP will interpret the error-induced drift as the underlying oscillator
going faster/slower.
>
> > I've updated the commit message:
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=459ebaef612
> >
> > > Now, it is imperfect because it's a constant adjustment and not
> > > proportioned to the NTP adjustments that might be later made.
> > >
> > > And clearly there is some issue as you're having problems with it
> > > using a fine-grained clocksource like the TSC where it really
> > > shouldn't be a major factor.
> >
> > None of it matters if you have NTP running and constantly re-adjusting
> > your clock. You can have a +50PPM drift through errors in the tracking,
> > and NTP will just assume your oscillator is running 50PPM fast and
> > adjust accordingly.
> >
> > But when you set the kernel to follow a precise y=mx+c line for the TSC
> > to real time conversion and *expect* it to do as it's told while
> > tracking the divergence to the nanosecond... all counters are
> > coarse-grained :)
>
> Something maybe as an alternative approach:
> What if instead of tinkering with the xtime_remainder separately, we
> just made the adjustment to the initial ntp_interval (propagated to
> the tk->ntp_tick) to handle the granularity error? This would maybe
> more intuitively address the conceptual "what the ntp machinery wants
> time to be" issue?
Does that help? Aren't we continuously adjusting the tick_length
because that's how we accommodate actual frequency changes of the
counter?
> Though we'd need to propagate that out to the ntp.c logic so it would
> show up in ntp_tick_length() (and maybe even further up into
> tick_usec). But then I worry the ntp userland might blindly overwrite
> this on startup, so maybe this approach won't work.
>
> Another thing to consider: we could maybe use some threshold for
> applying the granularity error correction, so we only apply the
> correction when the clocksource is coarse and the error is large
> enough to warrant it?
I'm still lost. What's the problem we're trying to solve here now?
AIUI the granularity error correction is simple; we add tick_length and
subtract xtime_interval, which is basically equivalent to adding the
*correct* value of xtime_remainder each tick. It's what this code was
always trying to do; it just got it wrong. Now it's fixed, why would we
only do it *sometimes*?
This tracking is precisely what ensures that the dithering picks 'mult'
vs. 'mult+1' in the right duty cycle. It's not just for coarse
clocksources; even my 2400MHz TSC will lose ~95ns/s if we don't track
this.
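
The ~95ns/s figure falls out of the same example numbers. A rough sketch,
assuming the clock simply stays at the truncated 'mult' and nothing ever
compensates for the remainder:

```python
SHIFT = 24                    # clocksource shift assumed in this example
TICKS_PER_SECOND = 1000       # 1 ms tick
CYCLES_PER_TICK = 2_400_000   # 2400 MHz counter over a 1 ms tick

ntp_interval = 1_000_000 << SHIFT
mult = ntp_interval // CYCLES_PER_TICK
xtime_remainder = ntp_interval - mult * CYCLES_PER_TICK   # shifted ns lost per tick

# Convert the per-tick shortfall from shifted ns to ns and scale to one second.
loss_ns_per_second = xtime_remainder * TICKS_PER_SECOND / (1 << SHIFT)

print(round(loss_ns_per_second, 2))
# prints: 95.37
```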