Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling

From: William Roche

Date: Mon Mar 16 2026 - 11:34:42 EST


On 3/13/26 21:10, Borislav Petkov wrote:
On Thu, Mar 12, 2026 at 11:44:04PM +0100, William Roche wrote:
Yazen, could you also please tell us if an existing non-SMCA AMD hardware
could crash on updating an SMCA register ?

So, the situation is this: if software needs to access a MCA_DESTATUS MSR
- which is part of AMD's MCA extensions - then software needs to check the
smca bit.

So your patch is correct. The justification about it is not.

It should talk about how software should touch that MSR *only* *after* having
checked mce_flags.smca.


Ok, I understand your point.

Because, it doesn't matter what KVM does or whoever - we all adhere to the hw
spec.

Because technically speaking, this code should blow up on non-SMCA machines
too because they do support deferred errors (Bulldozer for example) but they
will #GP on access to the MCA_DESTATUS MSRs as those are reserved there.

This is a little more complicated, as Yazen pointed out in his answer. But I agree that SMCA-specific registers are reserved and should not be accessed without first checking that access is allowed.
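
To make the rule concrete, here is a minimal userspace sketch of the "check the smca bit before touching MCA_DESTAT" pattern. It is not the kernel code and not the patch itself: struct model_cpu, model_write_destat() and clear_deferred_status() are hypothetical names invented for this illustration, and the "fault" counter only stands in for the #GP that a reserved-MSR access would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS 4

/* Hypothetical model of a CPU for illustration; not a kernel structure. */
struct model_cpu {
	bool smca;                  /* stands in for the mce_flags.smca bit */
	uint64_t destat[NUM_BANKS]; /* stands in for the per-bank MCA_DESTAT */
	unsigned int faults;        /* counts modeled #GP faults */
};

/*
 * Models a raw write to MCA_DESTAT. On a non-SMCA machine the register
 * is reserved, so the write is modeled as a fault instead of a store.
 */
static void model_write_destat(struct model_cpu *cpu, unsigned int bank,
			       uint64_t val)
{
	assert(bank < NUM_BANKS);

	if (!cpu->smca) {
		cpu->faults++;
		return;
	}
	cpu->destat[bank] = val;
}

/*
 * Guarded variant: the SMCA-only register is touched only after the
 * smca bit has been checked, so non-SMCA machines never reach the write.
 */
static void clear_deferred_status(struct model_cpu *cpu, unsigned int bank)
{
	if (!cpu->smca)
		return;

	model_write_destat(cpu, bank, 0);
}
```

With a model_cpu whose smca field is false, calling clear_deferred_status() leaves the fault counter at zero, while calling model_write_destat() directly increments it. That is the difference the guard is meant to make.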


So please rewrite your commit message to state that. And then you can talk
about what the real-life situation is which caught this.


Sure, I'm going to submit a new version of this patch using this new commit message:

x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines

Access to SMCA-specific registers like MCA_DESTAT should only be done
after having checked the smca bit. This avoids a crash on non-SMCA
machines (like AMD QEMU/KVM VMs) during deferred error handling.

Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Signed-off-by: William Roche <william.roche@xxxxxxxxxx>
Reviewed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx


As to your use case - thanks for explaining it. If this is something which
people run, then it would be wonderful if we had a simple test script in the
kernel which verifies new changes don't break it and so that we can run it
periodically as part of testing.

That would be great!
If there is a framework for creating a simple test script that runs the built kernel in a VM, I'd be happy to know about it and create the test we are talking about, as a separate fix proposal.

Thanks again for your feedback,
William.