Re: [PATCH v3 37/41] x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
From: Sean Christopherson
Date: Thu May 21 2026 - 17:01:55 EST

On Thu, May 21, 2026, Dongli Zhang wrote:
> On 2026-05-15 12:19 PM, Sean Christopherson wrote:
> > Prefer the TSC over kvmclock for sched_clock if the TSC is constant,
> > nonstop, and not marked unstable via command line. I.e. use the same
> > criteria as tweaking the clocksource rating so that TSC is preferred over
> > kvmclock. Per the below comment from native_sched_clock(), sched_clock
> > is more tolerant of slop than clocksource; using TSC for clocksource but
> > not sched_clock makes little to no sense, especially now that KVM CoCo
> > guests with a trusted TSC use TSC, not kvmclock.
> >
> >   /*
> >    * Fall back to jiffies if there's no TSC available:
> >    * ( But note that we still use it if the TSC is marked
> >    *   unstable. We do this because unlike Time Of Day,
> >    *   the scheduler clock tolerates small errors and it's
> >    *   very important for it to be as fast as the platform
> >    *   can achieve it. )
> >    */
> >
> > The only advantage of using kvmclock is that doing so allows for early
> > and common detection of PVCLOCK_GUEST_STOPPED, but that code has been
> > broken for over two years with nary a complaint, i.e. it can't be
> > _that_ valuable. And as above, certain types of KVM guests are losing
> > the functionality regardless, i.e. acknowledging PVCLOCK_GUEST_STOPPED
> > needs to be decoupled from sched_clock() no matter what.
>
> Has it been broken for two years because of pvclock_clocksource_read_nowd()?

Yep.  Because pvclock_clocksource_read_nowd() ignores PVCLOCK_GUEST_STOPPED, the
flag only ever gets recognized when the kernel reads WALL_CLOCK, which AFAICT
only happens at initial boot and during suspend and resume.
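
For anyone following along, the failure mode can be modeled in a few lines of
userspace C.  This is only a toy sketch, not the kernel code: every toy_* name
below is made up for illustration, and the real pvclock read path has
versioning, scaling, etc. that is omitted here.  The point is simply that a
"stopped" flag stays pending indefinitely if the only frequently used read path
never inspects it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a "guest was stopped" flag; not the kernel's. */
#define TOY_GUEST_STOPPED (1u << 1)

/* Toy model of the time info the host publishes to the guest. */
struct toy_pvclock {
	uint64_t now_ns;	/* pretend clock value */
	uint8_t  flags;		/* flags set by the (pretend) host */
};

/* Set when the "stopped" flag is noticed, i.e. watchdogs would be touched. */
static bool toy_watchdog_touched;

static uint64_t toy_read(struct toy_pvclock *pv, bool check_stopped)
{
	/* Only the checking variant consumes and acknowledges the flag. */
	if (check_stopped && (pv->flags & TOY_GUEST_STOPPED)) {
		pv->flags &= ~TOY_GUEST_STOPPED;
		toy_watchdog_touched = true;
	}
	return pv->now_ns;
}

/* Variant that reacts to the flag (rarely called in this model). */
static uint64_t toy_read_wd(struct toy_pvclock *pv)
{
	return toy_read(pv, true);
}

/* "No watchdog" variant: returns the time, leaves the flag untouched. */
static uint64_t toy_read_nowd(struct toy_pvclock *pv)
{
	return toy_read(pv, false);
}
```

In this model, if the hot path only ever calls toy_read_nowd(), the flag is
acknowledged only when something happens to call toy_read_wd(), which matches
the "recognized only on rare reads" behavior described above.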