Re: [RFC PATCH 1/4] timekeeping: Remove xtime_remainder from ntp_error accumulation
From: John Stultz
Date: Tue May 19 2026 - 17:55:18 EST
On Tue, May 19, 2026 at 2:16 AM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
>
> On Mon, 2026-05-18 at 18:37 -0700, John Stultz wrote:
> > On Sat, May 16, 2026 at 1:25 AM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> > > Thanks. This has been making my brain hurt for most of the last week,
> > > but I think I finally have a handle on it.
> > >
> > > It looks like we track three different times (ignoring units):
> > >
> > > • A: The xtime that we actually output to the vDSO/etc.
> > >
> > > • B: xtime+ntp_error is the time we *want* to be outputting right now,
> > > but the mult dithering and monotonicity clawback keep us from it.
> >
> > I think of B, or specifically ntp_error, as the delta between our time
> > and what *NTP* (well, the in-kernel ntp machinery) wants the system to
> > be right now.
>
> Right. I think we're entirely on the same page there. The "right now"
> is the key.
>
> > As an aside, apologies if I'm asking obvious questions here, some of
> > your terminology is unfamiliar. While it's not used around the code or
> > patches, I can understand the dithering metaphor for the long-term
> > error adjustments to effectively allow for sub-integer mult
> > adjustments over time (similar to b&w dithering to approximate levels
> > of grey), but there is also the common use of indecisively dithering
> > > time away or the astrophotography sense of intentionally adding
> > noise, which I worry might cause some confusion to others as to what
> > you mean.
>
> Indeed. The two adjacent values of 'mult' will effectively make xtime
> proceed slightly faster than, or slightly slower than, the actual
> desired rate. It is exactly 'dithering' in the sense of approximating
> levels of grey, tracking ntp_error precisely in order to choose between
> mult vs. mult+1 for the next tick.
It's a nice analogy. Though if you're going to use it in your commit
messages, it might be good to also add a comment to the
timekeeping_adjustment() logic so there is a clear anchor to the code
for folks who might not be as familiar with the detail.
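
To make the analogy concrete, here is a toy sketch of the idea, under
stated assumptions: it is not the kernel code, and the struct, function
names, and numbers are made up for illustration. Each tick can only
advance by cycles * mult or cycles * (mult + 1), and the sign of the
accumulated error picks between the two.

```c
#include <stdint.h>

/*
 * Toy model, not kernel code. `err` plays the role of ntp_error:
 * desired time minus actual time, in shifted nanoseconds.
 */
struct dither_state {
	int64_t err;	/* accumulated (desired - actual) */
	uint32_t mult;	/* base multiplier; mult + 1 is the faster choice */
};

/* Advance one tick; returns 1 if mult + 1 was used for it, else 0. */
static int dither_tick(struct dither_state *s, uint64_t cycles, int64_t want)
{
	/* Behind the desired time: run slightly fast, otherwise slightly slow. */
	int fast = s->err > 0;
	int64_t advanced = (int64_t)(cycles * (s->mult + (uint32_t)fast));

	s->err += want - advanced;
	return fast;
}

/* Run `ticks` ticks and count how many of them used mult + 1. */
static unsigned int dither_count_fast(struct dither_state *s, uint64_t cycles,
				      int64_t want, unsigned int ticks)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < ticks; i++)
		n += (unsigned int)dither_tick(s, cycles, want);
	return n;
}
```

With cycles = 1000, mult = 10 and a desired advance of 10250 per tick
(an effective multiplier of 10.25), two of every eight ticks use
mult + 1 and the error returns to zero, which is the "level of grey"
being approximated.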
> > Also I'm not sure it's very clear what you mean by "monotonicity clawback".
>
> This is the offset applied by timekeeping_apply_adjustment() in order
> to ensure that the observed xtime remains monotonic when the dithering
> switches back from 'mult+1' to 'mult' and a consumer may have seen a
> 'later' time than it's about to set in {cycle_last,mult}.
>
This one I find less clarifying, but I do recognize "adjustments to
the base xtime_nsec made when adjusting the multiplier due to
unaccumulated cycles" is a mouthful.
The "xtime_nsec adjustment in timekeeping_apply_adjustment()" is
probably easier/clearer?
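
For readers following along, a minimal sketch of what that adjustment
does, assuming a pared-down stand-in for the timekeeper (the toy_*
names and fields are hypothetical, not the kernel's structures): when
mult changes while some cycles are still unaccumulated, xtime_nsec is
shifted by offset * mult_adj in the opposite direction so that a reader
sees the same time just before and just after the change.

```c
#include <stdint.h>

/* Hypothetical, pared-down stand-in for the timekeeper read base. */
struct toy_tk {
	uint64_t cycle_last;	/* counter value at the last accumulation */
	uint32_t mult;
	uint64_t xtime_nsec;	/* shifted ns accumulated as of cycle_last */
};

/* Shifted-ns time a reader would compute at counter value `now`. */
static uint64_t toy_read(const struct toy_tk *tk, uint64_t now)
{
	return tk->xtime_nsec + (now - tk->cycle_last) * tk->mult;
}

/*
 * Change mult by mult_adj while `offset` cycles are unaccumulated.
 * Without the xtime_nsec correction, a reader at cycle_last + offset
 * would see time jump by offset * mult_adj.
 */
static void toy_apply_adjustment(struct toy_tk *tk, int32_t mult_adj,
				 uint64_t offset)
{
	tk->mult = (uint32_t)((int64_t)tk->mult + mult_adj);
	tk->xtime_nsec = (uint64_t)((int64_t)tk->xtime_nsec -
				    (int64_t)offset * mult_adj);
}
```

Dropping from mult 11 to mult 10 with 300 cycles outstanding moves
xtime_nsec forward by 300, so the value read at that instant is
unchanged and later reads keep increasing.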
> > > So ntp_error, being the delta between (B) and (A), needs to advance by
> > > tick_length - xtime_interval. Before this patch, xtime_remainder was
> > > *also* being subtracted from the 'what xtime advanced' side, but it
> > > isn't actually added to xtime; it *is* roughly the amount that needed
> > > to be accumulated in ntp_error here (except for the fact that
> > > xtime_remainder was calculated once at boot time and never updated).
> >
> > Again, I'm sure it could be miscalculated, or be misapplied, but as I
> > mentioned previously, the xtime_remainder is trying to address a
> > granularity error that is effectively baked into the delta between
> > xtime_interval and the initial ntp interval (essentially the initial
> > ntp_tick), which doesn't seem to be addressed here.
>
> My understanding is that xtime_remainder *is* the delta between the
> initial xtime_interval and the initial ntp_interval. Calculated once at
> boot and then permanently out of sync when mult and thus xtime_interval
> actually change.
Yes, it is the difference between the initial xtime_interval and the
initial ntp_interval, but maybe the key here is to ask "if there is no
NTP adjustment being made why is there a delta between these two
initial values?"
> By simply adding ntp_interval and then subtracting the correct current
> value of xtime_interval, logarithmic_accumulation doesn't *need* a
> separate variable which tracks that delta. I don't quite understand why
> it ever did add it in the first place.
Again, it's due to the error caused by the granularity of the
clocksource (against the HZ interval). That initial delta will
otherwise cause an accurate clock to get erroneously steered either
faster or slower when no NTP adjustments are being made. Subtracting
that remainder is just trying to compensate for this base error.
Now, it is imperfect because it's a constant adjustment and is not
scaled in proportion to the NTP adjustments that might be made later.
And clearly there is some issue, as you're having problems with it
using a fine-grained clocksource like the TSC, where it really
shouldn't be a major factor.
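
A rough sketch of where that base error comes from, under stated
assumptions: HZ = 1000, made-up mult/shift pairs, hypothetical toy_*
names, and the extra ntp_error shift left out for brevity. The tick is
rounded to a whole number of counter cycles, so the nanoseconds
accumulated per tick differ from the ideal tick by a remainder.

```c
#include <stdint.h>

#define TOY_NTP_INTERVAL_NS 1000000ULL	/* assumes HZ = 1000 */

struct toy_intervals {
	uint64_t cycle_interval;	/* counter cycles per tick, rounded */
	uint64_t xtime_interval;	/* shifted ns accumulated per tick */
	int64_t xtime_remainder;	/* ideal tick minus xtime_interval */
};

static struct toy_intervals toy_setup(uint32_t mult, uint32_t shift)
{
	struct toy_intervals iv;
	uint64_t ntp_interval = TOY_NTP_INTERVAL_NS << shift;

	/* Round to the nearest whole number of cycles per tick. */
	iv.cycle_interval = (ntp_interval + mult / 2) / mult;
	if (iv.cycle_interval == 0)
		iv.cycle_interval = 1;
	iv.xtime_interval = iv.cycle_interval * mult;
	iv.xtime_remainder = (int64_t)ntp_interval - (int64_t)iv.xtime_interval;
	return iv;
}

/*
 * Per-tick change in the error term with no NTP adjustment in effect,
 * i.e. the tick length equals the nominal interval. `compensate`
 * selects whether the remainder is folded in.
 */
static int64_t toy_error_per_tick(const struct toy_intervals *iv,
				  uint32_t shift, int compensate)
{
	int64_t tick = (int64_t)(TOY_NTP_INTERVAL_NS << shift);
	int64_t advanced = (int64_t)iv->xtime_interval;

	if (compensate)
		advanced += iv->xtime_remainder;
	return tick - advanced;
}
```

For an illustrative 32.768 kHz counter (mult 31250000, shift 10) the
tick rounds to 33 cycles and the remainder is -7250000 shifted ns,
roughly -7 us per 1 ms tick; without compensation the error term
drifts by that much every tick even though the clock is accurate. For
an illustrative ~3 GHz counter (mult 341, shift 10) the remainder is
only -153 shifted ns, well under a nanosecond per tick.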
> > For fine-grained clocksources like the TSC it's not likely a big issue,
> > but for coarser grained clocksources it seems like just removing this
> > would be a regression.
>
> I think it should be fine with coarser grained clocksources. The
> dithering sawtooth around the reference line will have a higher
> amplitude, but adding ntp_interval and subtracting xtime_interval for
> the mult value currently in effect is still the right thing to do.
I still worry that ignoring it could cause behavior regressions on
hardware with coarse clocksources, and it feels a little short-sighted.
But again, I am excited for your work here and your deep investigation
of these behavioral issues you've found.
thanks
-john