Re: [PATCH v3 01/10] KVM: VMX: Refresh GUEST_PENDING_DBG_EXCEPTIONS.BS on all injected #DBs

From: Hou Wenlong

Date: Thu May 21 2026 - 08:39:17 EST


On Wed, May 20, 2026 at 09:11:20AM -0700, Sean Christopherson wrote:
> On Fri, May 15, 2026, Sean Christopherson wrote:
> > Move KVM's stuffing of GUEST_PENDING_DBG_EXCEPTIONS.BS when RFLAGS.TF=1 and
> > MOV/POP SS or STI blocking is active into the exception injection code so
> > that KVM fixes up the VMCS for all injected #DBs, not only those that are
> > reflected back into the guest after #DB interception. E.g. if KVM queues
> > a #DB in the emulator, or more importantly if userspace does save/restore
> > exactly on the #DB+shadow boundary, then KVM needs to massage the VMCS to
> > avoid the VM-Entry consistency check.
> >
> > Opportunistically update the wording of the comment to describe the
> > behavior as a workaround of flawed CPU behavior/architecture, to make it
> > clear that the *only* thing KVM is doing is fudging around a consistency
> > check. Per the SDM:
> >
> > There are no pending debug exceptions after VM entry if any of the
> > following are true:
> >
> > * The VM entry is vectoring with one of the following interruption
> > types: external interrupt, non-maskable interrupt (NMI), hardware
> > exception, or privileged software exception.
> >
> > I.e. forcing GUEST_PENDING_DBG_EXCEPTIONS.BS does *not* impact guest-
> > visible behavior.
> >
> > Fixes: b9bed78e2fa9 ("KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Reported-by: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
> > Closes: https://lore.kernel.org/all/b1a294bc9ed4dae532474a5dc6c8cb6e5962de7c.1757416809.git.houwenlong.hwl@xxxxxxxxxxxx
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 35 ++++++++++++++++++-----------------
> > 1 file changed, 18 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 1701db1b2e18..a0a0ccf342d3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1909,6 +1909,24 @@ void vmx_inject_exception(struct kvm_vcpu *vcpu)
> >  	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >
> > +	/*
> > +	 * When injecting a #DB, single-stepping is enabled in RFLAGS, and STI
> > +	 * or MOV-SS blocking is active, set vmcs.PENDING_DBG_EXCEPTIONS.BS to
> > +	 * prevent a false positive from VM-Entry consistency check. VM-Entry
> > +	 * asserts that a single-step #DB _must_ be pending in this scenario,
> > +	 * as the previous instruction cannot have toggled RFLAGS.TF 0=>1
> > +	 * (because STI and POP/MOV don't modify RFLAGS), therefore the one
> > +	 * instruction delay when activating single-step breakpoints must have
> > +	 * already expired. However, the CPU isn't smart enough to peek at
> > +	 * vmcs.VM_ENTRY_INTR_INFO_FIELD and so doesn't realize that yes, there
> > +	 * is indeed a #DB pending/imminent.
> > +	 */
> > +	if (ex->vector == DB_VECTOR &&
> > +	    (vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> > +	    vmx_get_interrupt_shadow(vcpu))
> > +		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> > +			    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
>
> Pulling in a Sashiko comment:
>
> : By restricting this workaround to only when a #DB is injected, does this
> : leave the VM vulnerable to a VM-Entry failure regression after live migration?
> :
> : KVM does not export GUEST_PENDING_DBG_EXCEPTIONS to userspace via
> : KVM_GET_VCPU_EVENTS. Therefore, upon migration, the destination KVM
> : initializes the VMCS with GUEST_PENDING_DBG_EXCEPTIONS=0.
> :
> : If a live migration occurs when the guest is in an active interrupt shadow
> : with RFLAGS.TF=1, but a different event is pending (or no event is pending
> : due to a host timer preemption), this DB_VECTOR check is skipped or
> : vmx_inject_exception() is never called.
> :

Nice AI review. This reminds me of something I wondered about before
writing v1: if KVM_GUESTDBG_SINGLESTEP is enabled and no #DB is
injected, then VM-Entry should also fail when single-stepping over STI.
Then I noticed the following code:
```
	/* When single-stepping over STI and MOV SS, we must clear the
	 * corresponding interruptibility bits in the guest state. Otherwise
	 * vmentry fails as it then expects bit 14 (BS) in pending debug
	 * exceptions being set, but that's not correct for the guest debugging
	 * case. */
	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
		vmx_set_interrupt_shadow(vcpu, 0);
```

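However, at the time I overlooked the live-migration case, which that
code does not cover since it only applies when userspace has enabled
KVM_GUESTDBG_SINGLESTEP.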

> : Consequently, KVM will attempt VM-Entry with TF=1, shadow=1, and BS=0.
> : The Intel SDM mandates that if RFLAGS.TF=1 and STI or MOV SS blocking is
> : active, the VM-Entry consistency check requires
> : GUEST_PENDING_DBG_EXCEPTIONS.BS=1. The hardware VM-Entry will fail due to
> : invalid guest state.
> :
> : Since vmx_guest_state_valid() does not check the GUEST_PENDING_DBG_EXCEPTIONS
> : field, KVM's emulation_required flag evaluates to false. KVM then falls
> : into the error path in __vmx_handle_exit(), dumping the VMCS and crashing
> : the guest by returning KVM_EXIT_FAIL_ENTRY to userspace.
> :
> : Does KVM need to handle the BS bit requirement in a broader context to
> : account for live migration when no #DB is being injected?
>
> Yes, but that's a different problem entirely[*], and isn't even solvable on AMD
> because SVM lacks an equivalent for GUEST_PENDING_DBG_EXCEPTIONS. Note, only
> MOV/POP-SS blocking matters, because STI blocking doesn't prevent single-step
> #DBs, and single-step #DBs have higher priority than IRQs.
>

Besides that, if MOV/POP SS is emulated while single-stepping, the
emulator currently injects the single-step #DB right away. As a result,
the guest observes a single-step #DB inside the MOV/POP SS shadow,
because, per Intel SDM 28.7.1:

"If the VM entry is vectoring, there is no blocking by STI or by MOV SS
following the VM entry, regardless of the contents of the
interruptibility-state field."

If we try to fix this by not injecting the #DB, the shadow stays active
with TF=1 and BS=0, so VM-Entry fails the consistency check again.
Since MOV/POP SS is only emulated with forced emulation, perhaps we can
simply document this behavior?

Thanks!

> [*] https://lore.kernel.org/all/agUgeO5QNenQM9pT@xxxxxxxxxx