Re: [PATCH v6 01/16] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2

From: Sean Christopherson

Date: Tue May 26 2026 - 15:14:35 EST


On Fri, May 22, 2026, Yosry Ahmed wrote:
> On Fri, May 22, 2026 at 4:45 PM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
> >
> > On Fri, May 22, 2026 at 4:10 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Wed, May 06, 2026, Yosry Ahmed wrote:
> > > > According to the APM, TF on VMRUN causes a #DB after VMRUN completes on
> > > > the _host_ side. However, KVM injects a #DB in L2 context instead (or
> > > > exits to userspace if KVM_GUESTDBG_SINGLESTEP is set) in
> > > > kvm_skip_emulated_instruction().
> > > >
> > > > Introduce __kvm_skip_emulated_instruction(),
> > >
> > > Eh, just make svm_skip_emulated_instruction() visible via svm.h and call that
> > > directly. No need to bring SVM's mess into common KVM.
> >
> > Yeah I thought about doing that. The only reason I was hesitant is
> > that if [__]kvm_skip_emulated_instruction(),
>
> if [__]kvm_skip_emulated_instruction() gains new logic*

Well, __kvm_skip_emulated_instruction() can't gain new logic if it doesn't exist,
which is why I want to avoid it. IMO, it's _more_ likely that we'd add code to
__kvm_skip_emulated_instruction() that breaks the obscure VMRUN behavior.

And if we add logic to kvm_skip_emulated_instruction(), it's seems like a forgone
conclusion that we'll have to analyzing VMRUN to see what the "right" behavior is.

> > handling it here might be missed. Also, there's only one direct caller of
> > svm/vmx_skip_emulated_instruction() (and it's TASK_SWITCH interception, who
> > cares).

handle_exception_nmi() also calls it for INT1, though that code is broken (the
INT1 should count as an instruction).

> > So I think it's more consistent and future proof to refactor
> > kvm_skip_emulated_instruction() instead.

I actually think the task switch case is the perfect argument for using
svm_skip_emulated_instruction() directly. The only flows crazy enough to warrant
bypassing normal instruction skipping behavior are legacy task switches and VMRUN.
Sounds about right to me :-)