[PATCH v3 0/1] AMD VM crashing on deferred memory error injection
From: William Roche
Date: Tue Mar 17 2026 - 06:45:19 EST
From: William Roche <william.roche@xxxxxxxxxx>
v3 changes:
- Commit title and message changed to put the emphasis on SMCA access
correctness, following Borislav Petkov's feedback.
v2 changes:
- Commit title changed to:
x86/mce/amd: Fix VM crash during deferred error handling
- Commit message now capitalizes QEMU and KVM and uses the imperative
statement suggested by Yazen
- "CC stable" tag placed after "Signed-off-by"
(The documentation asks for "the sign-off area" without more details)
- Blank line added to separate the SMCA code block from the update of
MCA_STATUS.
--
After the integration of the following commit:
7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling
an AMD QEMU VM started to crash when handling a deferred memory error
injection, with a stack trace like:
mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
amd_clear_bank+0x6e/0x70
machine_check_poll+0x228/0x2e0
? __pfx_mce_timer_fn+0x10/0x10
mce_timer_fn+0xb1/0x130
? __pfx_mce_timer_fn+0x10/0x10
call_timer_fn+0x26/0x120
__run_timers+0x202/0x290
run_timer_softirq+0x49/0x100
handle_softirqs+0xeb/0x2c0
__irq_exit_rcu+0xda/0x100
sysvec_apic_timer_interrupt+0x71/0x90
[...]
Kernel panic - not syncing: MCA architectural violation!
See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@xxxxxxxxxx/
We identified the problem as an access to SMCA-specific registers on
non-SMCA platforms, such as a QEMU/KVM machine.
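
For illustration only, and not the code of the patch itself, here is a
minimal userspace sketch of the kind of guard the patch title describes.
The names has_smca, fake_wrmsr() and clear_deferred_status() are
hypothetical stand-ins for the kernel's SMCA feature check, its MSR write
helper and the bank-clearing path. The DESTAT address macros are assumed to
match the kernel's MSR definitions. With that layout, bank 9 resolves to
0xc0002098, the MSR reported in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's SMCA MSR layout: 0x10 MSRs per bank. */
#define MSR_AMD64_SMCA_MC0_DESTAT     0xc0002008u
#define MSR_AMD64_SMCA_MCx_DESTAT(x)  (MSR_AMD64_SMCA_MC0_DESTAT + 0x10u * (x))

/* Hypothetical stand-in for the kernel's SMCA feature check. */
static bool has_smca;

/* Bookkeeping so the effect of the guard can be observed. */
static unsigned int wrmsr_count;
static uint32_t last_msr;

/* Hypothetical stand-in for an MSR write: only records the access. */
static void fake_wrmsr(uint32_t msr, uint64_t val)
{
	(void)val;
	last_msr = msr;
	wrmsr_count++;
}

/*
 * Clear the deferred error status of a bank, but only when the platform
 * implements SMCA. On a non-SMCA machine the DESTAT MSR does not exist,
 * so the write must be skipped entirely.
 */
static void clear_deferred_status(unsigned int bank)
{
	assert(bank < 64);	/* illustrative bound, keeps the address in range */

	if (!has_smca)
		return;

	fake_wrmsr(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
}
```

Calling clear_deferred_status(9) with has_smca set to false performs no
MSR write at all, whereas with has_smca set to true it writes to
0xc0002098.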
This patch is checkpatch.pl clean.
The memory error injection unit test works fine with it applied.
William Roche (1):
x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
--
2.47.3