Re: VMX Preemption Timer appears to be buggy on SKX, CLX, and ICX
From: Chao Gao
Date: Thu Jun 04 2026 - 22:56:55 EST
On Thu, Jun 04, 2026 at 02:59:45PM -0700, Jim Mattson wrote:
>
>On Thu, Jun 4, 2026 at 12:58 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>
>> On Wed, Jun 03, 2026, Jim Mattson wrote:
>> > On Thu, May 14, 2026 at 11:35 PM Chao Gao <chao.gao@xxxxxxxxx> wrote:
>> > >
>> > > >> EMR158. VMX-Preemption Timer May Expire Earlier With Certain Large Timer Values
>> > > >
>> > > >I assume the same erratum applies to previous generations as well?
>> > >
>> > > Yes.
>> >
>> > This test still fails on our SKX, CLX, and ICX systems.
>> >
>> > Sean,
>> >
>> > Were you thinking of enforcing a cap on delta_tsc in vmx_set_hv_timer()?
>>
>> Heh, to be honest, I wasn't thinking of a whole lot of nothing. Falling back to
>> hrtimers does seem like the easiest solution.
>
>I think vmx_set_hv_timer() should return -EINVAL for values impacted
>by this erratum. However, the only documented issue is for EMR, and we
>have not observed the problem on EMR. That's unsettling.
Could you clarify what tests you ran?

I am using the reproducer from Yuan:
https://lore.kernel.org/kvm/20240708055559.rl4w5xfhj3uru6j2@yy-desk-7060/

I write -1 to the VMX preemption timer, do VM-Enter, and have the guest
execute VMCALL to force a VM-Exit. On VM-Exit, I read back the preemption
timer. The delta should be very small; otherwise, the platform likely has the
same issue.

I tested several platforms, including EMR. The results are consistent with the
erratum: I observed premature VMX preemption-timer VM-Exits, and using the
documented limit did not trigger any in my testing.
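
As a rough illustration of the pass/fail check in that procedure, here is a
minimal C sketch. It is not taken from the reproducer: the helper name and the
tolerance MAX_EXPECTED_TICKS are hypothetical, and it assumes the VM-Exit
control that saves the VMX-preemption timer value is enabled so the field can
be read back.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tolerance: how many timer ticks one VM-Enter/VMCALL/VM-Exit
 * round trip may reasonably consume. Chosen for illustration only.
 */
#define MAX_EXPECTED_TICKS 0x10000u

/*
 * written:  value stored in the VMX-preemption timer field before VM-Enter
 *           (0xffffffff when writing -1)
 * readback: value read from the same field after the VMCALL VM-Exit
 *
 * Returns true when the timer dropped by more than the tolerance, which
 * would suggest the platform shows the same issue.
 */
static inline bool timer_consumed_too_much(uint32_t written, uint32_t readback)
{
	uint32_t consumed = written - readback;	/* the timer counts down */

	return consumed > MAX_EXPECTED_TICKS;
}
```

A healthy platform would be expected to return false for a written value of
0xffffffff, because only a handful of ticks elapse before the VMCALL exit.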
>
>Chao:
>
>1) Should we just assume that all Intel CPUs are affected?
I think that is reasonable unless we have explicit evidence to exclude specific
parts.
>
>2) Is there any compelling reason not to simplify the limit to 2^25?
We can use 2^25 as a conservative bound, but it is much lower than necessary.
The current bound comes from theoretical analysis and was validated on multiple
platforms.
>
>3) Is it just coincidence that 25 + IA32_VMX_MISC[4:0] (on EMR) == 32,
>or should the limit be calculated as 32 - IA32_VMX_MISC[4:0]?
My understanding is that hardware scales the preemption-timer value and
converts it to a 32-bit core crystal clock counter, rather than directly
using a 32-bit TSC delta. IA32_VMX_MISC[4:0] likely participates in that
calculation.
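
To make the arithmetic in question 3 concrete, here is a minimal C sketch. It
only encodes the hypothesis from the question, limit = 2^(32 - IA32_VMX_MISC[4:0]),
which has not been confirmed as the actual hardware behavior. The function
names are hypothetical.

```c
#include <stdint.h>

/*
 * IA32_VMX_MISC[4:0] is the preemption-timer rate: the timer counts down
 * once each time this bit position of the TSC changes.
 */
static inline unsigned int preemption_timer_rate(uint64_t vmx_misc)
{
	return (unsigned int)(vmx_misc & 0x1f);
}

/*
 * Hypothesized bound from question 3: 2^(32 - rate).
 * Unconfirmed; shown only to illustrate the calculation.
 */
static inline uint64_t hypothesized_timer_limit(uint64_t vmx_misc)
{
	return 1ull << (32 - preemption_timer_rate(vmx_misc));
}
```

With IA32_VMX_MISC[4:0] == 7, which is what 25 + IA32_VMX_MISC[4:0] == 32
implies for EMR, this evaluates to 2^25.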