Re: kvm guests crash when running "perf kvm top"

From: Jim Mattson

Date: Thu Mar 19 2026 - 00:06:46 EST


On Tue, Mar 17, 2026 at 9:02 AM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>
> On Wed, Apr 9, 2025 at 10:05 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> > Long story short, masking PEBS_ENABLE with the guest's value (in addition to
> > what perf allows) fixes the issue on my end. Assuming testing goes well, I'll
> > post this as a proper patch.
> >
> > --
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index cdb19e3ba3aa..1d01fb43a337 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -4336,7 +4336,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> >  	arr[pebs_enable] = (struct perf_guest_switch_msr){
> >  		.msr = MSR_IA32_PEBS_ENABLE,
> >  		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
> > -		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
> > +		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
> >  	};
> >
> >  	if (arr[pebs_enable].host) {
>
> Because kvm_pmu->pebs_enable is optimistic, I think we also need to
> swap these two modifications to guest PGC in the code below:
>
> arr[global_ctrl].guest &= ~kvm_pmu->host_cross_mapped_mask;
> /* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
> arr[global_ctrl].guest |= arr[pebs_enable].guest;

Doh! The line above those two lines is:

arr[pebs_enable].guest &= ~kvm_pmu->host_cross_mapped_mask;

So, the ordering of the global_ctrl modifications is irrelevant.
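
To spell out why: once arr[pebs_enable].guest has had the cross-mapped
bits cleared, OR-ing it into global_ctrl before or after the mask gives
the same result. Here is a throwaway userspace check, not kernel code.
The function names and the 4-bit exhaustive search are just my
assumptions for illustration; the three variables stand in for
arr[global_ctrl].guest, arr[pebs_enable].guest and
kvm_pmu->host_cross_mapped_mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if "mask, then OR" and "OR, then mask" agree for every
 * 4-bit combination of inputs, 0 otherwise.
 */
int orderings_agree(void)
{
	for (uint64_t gc = 0; gc < 16; gc++) {
		for (uint64_t pe = 0; pe < 16; pe++) {
			for (uint64_t xm = 0; xm < 16; xm++) {
				/* The line above: pebs_enable.guest &= ~mask */
				uint64_t pebs = pe & ~xm;
				/* Current order: mask global_ctrl, then OR */
				uint64_t a = (gc & ~xm) | pebs;
				/* Swapped order: OR, then mask global_ctrl */
				uint64_t b = (gc | pebs) & ~xm;

				if (a != b)
					return 0;
			}
		}
	}
	return 1;
}

/* Hypothetical self-check helper; aborts if the orderings ever differ. */
void orderings_selftest(void)
{
	assert(orderings_agree());
}
```

Each bit position is independent, so the 4-bit search covers every
per-bit case that a 64-bit mask can hit.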

> By the way, IA32_PEBS_ENABLE can be modified in NMI context by
> handle_pmi_common(), so the host value returned from this function may
> already be stale. We have seen cases where handle_pmi_common() clears
> a bit in IA32_PEBS_ENABLE between here and the next VM-entry. VM-exit
> restores the stale value with the bit set, and that bit persists
> indefinitely. If the next perf event assigned to that PMC is not a
> PEBS event, it magically becomes one. When an NMI arrives for PEBS
> buffer overflow, perf refuses to claim it, because it doesn't think
> any PEBS events are active. So, we get an "Uhhuh. NMI received for
> unknown reason" message on the console. A flood of these is enough to
> trigger the NMI watchdog and cause a panic.
>
> I think we need a fixup after VM-exit to clear any IA32_PEBS_ENABLE
> bits that were cleared by handle_pmi_common() between
> intel_guest_get_msrs() and VM-entry, but I'm not sure what the best
> API might be. Calling intel_guest_get_msrs() again seems too
> heavyweight. Maybe KVM could ask perf nicely to just rewrite the MSR
> with cpuc->pebs_enabled? Note that any erroneous IA32_PEBS_ENABLE bits
> are dormant post VM-exit, since any PMCs throttled by
> handle_pmi_common() will have their enable bits cleared in the
> corresponding event selector.
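
For anyone following along, here is a toy userspace model of the
sequence described above. It is only a sketch: struct toy_cpu and all
of the toy_*() helpers are hypothetical names, the MSR is a plain
variable, and toy_fixup() models the "rewrite the MSR with
cpuc->pebs_enabled" idea rather than any existing perf API.

```c
#include <assert.h>
#include <stdint.h>

struct toy_cpu {
	uint64_t msr_pebs_enable;	/* models hardware IA32_PEBS_ENABLE */
	uint64_t pebs_enabled;		/* models cpuc->pebs_enabled */
};

/* Models the host value captured by intel_guest_get_msrs(). */
uint64_t toy_snapshot_host(const struct toy_cpu *c)
{
	return c->pebs_enabled;
}

/* Models handle_pmi_common() clearing a PEBS bit from NMI context. */
void toy_pmi_clear(struct toy_cpu *c, int bit)
{
	assert(bit >= 0 && bit < 64);
	c->pebs_enabled &= ~(1ULL << bit);
	c->msr_pebs_enable = c->pebs_enabled;
}

/* VM-entry loads the guest value; VM-exit restores the saved host value. */
void toy_vm_round_trip(struct toy_cpu *c, uint64_t guest, uint64_t saved_host)
{
	c->msr_pebs_enable = guest;
	c->msr_pebs_enable = saved_host;
}

/* Hypothetical post-VM-exit fixup: rewrite the MSR from perf's view. */
void toy_fixup(struct toy_cpu *c)
{
	c->msr_pebs_enable = c->pebs_enabled;
}

/* Returns the bits set in the MSR that perf no longer believes are set. */
uint64_t toy_stale_bits(int apply_fixup)
{
	struct toy_cpu c = { .msr_pebs_enable = 0x3, .pebs_enabled = 0x3 };
	uint64_t host = toy_snapshot_host(&c);	/* snapshot: 0x3 */

	toy_pmi_clear(&c, 0);			/* NMI clears bit 0 */
	toy_vm_round_trip(&c, 0, host);		/* exit restores stale 0x3 */
	if (apply_fixup)
		toy_fixup(&c);
	return c.msr_pebs_enable & ~c.pebs_enabled;
}
```

In this model, toy_stale_bits(0) leaves bit 0 set in the MSR even though
perf has cleared it, and toy_stale_bits(1) leaves nothing behind.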