Re: [PATCH v4 4/6] KVM: x86/pmu: Re-evaluate Host-Only/Guest-Only on nested SVM transitions
From: Jim Mattson
Date: Thu Apr 09 2026 - 23:50:35 EST
On Thu, Apr 9, 2026 at 2:21 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Apr 09, 2026, Sean Christopherson wrote:
> > On Thu, Apr 09, 2026, Jim Mattson wrote:
> > > On Thu, Apr 9, 2026 at 10:48 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > On Thu, Apr 09, 2026, Jim Mattson wrote:
> > > > > > > In general, this deferral is misguided. The G/H bits should be
> > > > > > > re-evaluated before we call kvm_pmu_instruction_retired() for an
> > > > > > > emulated instruction.
> > > > > > >
> > > > > > > > ...
> > > > > > > > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > > > > > > > index f1c29ac306917..966e4138308f6 100644
> > > > > > > > --- a/arch/x86/kvm/x86.h
> > > > > > > > +++ b/arch/x86/kvm/x86.h
> > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > #include "kvm_cache_regs.h"
> > > > > > > > #include "kvm_emulate.h"
> > > > > > > > #include "cpuid.h"
> > > > > > > > +#include "pmu.h"
> > > > > > > >
> > > > > > > > #define KVM_MAX_MCE_BANKS 32
> > > > > > > >
> > > > > > > > @@ -152,6 +153,8 @@ static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
> > > > > > > > {
> > > > > > > > 	vcpu->arch.hflags |= HF_GUEST_MASK;
> > > > > > > > 	vcpu->stat.guest_mode = 1;
> > > > > > > > +
> > > > > > > > +	kvm_pmu_handle_nested_transition(vcpu);
> > > > > > > > }
> > > > > > >
> > > > > > > This happens too late for VMRUN, since we have already called
> > > > > > > kvm_pmu_instruction_retired() via kvm_skip_emulated_instruction(), and
> > > > > > > VMRUN counts as a *guest* instruction.
> > > > > >
> > > > > > It's just VMRUN that's problematic though, correct? I.e. the scheme as a whole
> > > > > > is fine, we just need to special case VMRUN due to SVM's erratum^Warchitecture.
> > > > > > Alternatively, maybe we could get AMD to document the silly VMRUN behavior as an
> > > > > > erratum, then we could claim KVM is architecturally superior. :-D
> > > > >
> > > > > Here, it's just VMRUN. Above, it's WRMSR(EFER).
> > > >
> > > > But clearing EFER.SVME while in the guest generates architecturally undefined
> > > > behavior. I don't see any reason to complicate PMU virtualization for that
> > > > scenario, especially now that KVM synthesizes triple fault for L1.
> > >
> > > L1 can clear the virtual EFER.SVME. That is well-defined.
> >
> > Gah, I forgot that the H/G bits are ignored when EFER.SVME=0. That's really
> > annoying.
>
> What do you think about having two flavors of kvm_pmu_handle_nested_transition()?
> One that defers via a request, and a "special" (SVM-only?) version that does
> direct updates.

When would we use the deferred version? As far as the Intel PMU is
concerned, there's nothing special about a nested transition.

> Poking into PMU state in arbitrary contexts makes me nervous. E.g. when calling
> svm_leave_nested(), odds are good EFER isn't even correct, and the update *needs*
> to be deferred.
>
> I definitely don't love having two separate update mechanisms, but it seems like
> the safest option in this case.
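
To make the two-flavor idea concrete, here is a rough userspace sketch of a
direct update next to a request-style deferred update. It is a toy model, not
KVM code: struct toy_vcpu and every toy_* helper are hypothetical names, and
the counting rule (H/G bits ignored when EFER.SVME=0, and both bits set
treated the same as neither) is my assumption about the semantics discussed
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-Only/Guest-Only bit positions (as in Linux's AMD64_EVENTSEL_* masks). */
#define TOY_EVENTSEL_GUESTONLY (1ULL << 40)
#define TOY_EVENTSEL_HOSTONLY  (1ULL << 41)

/* Hypothetical stand-in for the vCPU and a single PMU counter. */
struct toy_vcpu {
	bool efer_svme;       /* virtual EFER.SVME */
	bool guest_mode;      /* true while L2 is active */
	bool update_pending;  /* models a deferred "request" */
	bool counter_active;  /* cached result of the last H/G evaluation */
	uint64_t eventsel;
	uint64_t count;
};

/* Assumed rule: H/G bits only filter when SVME=1 and exactly one is set. */
static bool toy_should_count(const struct toy_vcpu *v)
{
	bool g = (v->eventsel & TOY_EVENTSEL_GUESTONLY) != 0;
	bool h = (v->eventsel & TOY_EVENTSEL_HOSTONLY) != 0;

	if (!v->efer_svme || g == h)
		return true;
	return g ? v->guest_mode : !v->guest_mode;
}

/* "Special" flavor: re-evaluate immediately, in the caller's context. */
static void toy_pmu_update_direct(struct toy_vcpu *v)
{
	v->counter_active = toy_should_count(v);
	v->update_pending = false;
}

/* Deferred flavor: only record that a re-evaluation is needed. */
static void toy_pmu_update_deferred(struct toy_vcpu *v)
{
	v->update_pending = true;
}

/* Models servicing pending requests before the next guest entry. */
static void toy_service_requests(struct toy_vcpu *v)
{
	if (v->update_pending)
		toy_pmu_update_direct(v);
}

/* Models an emulated instruction retiring; uses the cached evaluation. */
static void toy_instruction_retired(struct toy_vcpu *v)
{
	if (v->counter_active)
		v->count++;
}
```

In this model, a Guest-Only counter misses an instruction that retires between
toy_pmu_update_deferred() and toy_service_requests(), which is the VMRUN
problem in miniature; the direct flavor avoids that, but it is only correct if
efer_svme and guest_mode are already up to date when it runs.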