Re: [PATCH 0/7] KVM: x86: APX reg prep work

From: Chang S. Bae

Date: Wed Mar 25 2026 - 14:29:00 EST

On 3/12/2026 10:47 AM, Sean Christopherson wrote:

> On Thu, Mar 12, 2026, Chang S. Bae wrote:
>>
>> However, that is sort of what-if scenarios at best. The host kernel still
>> manages EGPR context switching through XSAVE. Saving EGPRs into regs[] would
>> introduce an oddity to synchronize between two buffers: regs[] and
>> gfpu->fpstate, which looks like unnecessary complexity.

No, this looks ugly. If guest EGPR state is saved in vcpu->arch.regs[], the APX area in the guest fpstate isn't necessary:

When the KVM API exposes state in XSAVE format, the frontend can handle this separately. Alongside uABI <-> guest fpstate copy functions, new copy functions may deal with the state between uABI <-> VCPU cache.
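To make the uABI <-> VCPU cache idea a bit more concrete, here is a rough userspace-style sketch of what such a pair of copy functions might look like. Everything in it is hypothetical: the struct, the function names, and the constants are made up for illustration and are not existing KVM symbols. The APX component offset is passed in by the caller rather than hard-coded, since it would come from CPUID enumeration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of these are existing KVM symbols. */
#define NR_GPRS        32	/* RAX..R15 plus R16..R31 */
#define FIRST_EGPR     16
#define NR_EGPRS       16
#define APX_AREA_SIZE  (NR_EGPRS * sizeof(uint64_t))	/* 128 bytes */

struct vcpu_reg_cache {
	uint64_t regs[NR_GPRS];
};

/*
 * Copy R16-R31 from the APX component of an XSAVE-format uABI buffer into
 * the register cache. apx_offset is the component's byte offset within the
 * buffer and is supplied by the caller (assumption: taken from CPUID).
 * Returns 0 on success, -1 if the component does not fit in the buffer.
 */
static int copy_uabi_to_reg_cache(struct vcpu_reg_cache *cache,
				  const uint8_t *uabi, size_t uabi_size,
				  size_t apx_offset)
{
	if (apx_offset > uabi_size || uabi_size - apx_offset < APX_AREA_SIZE)
		return -1;

	memcpy(&cache->regs[FIRST_EGPR], uabi + apx_offset, APX_AREA_SIZE);
	return 0;
}

/* Reverse direction: fill the APX component of the uABI buffer from the cache. */
static int copy_reg_cache_to_uabi(uint8_t *uabi, size_t uabi_size,
				  size_t apx_offset,
				  const struct vcpu_reg_cache *cache)
{
	if (apx_offset > uabi_size || uabi_size - apx_offset < APX_AREA_SIZE)
		return -1;

	memcpy(uabi + apx_offset, &cache->regs[FIRST_EGPR], APX_AREA_SIZE);
	return 0;
}
```

The point is only that the frontend would touch one buffer for EGPRs (the register cache) and the existing fpstate copy functions would keep handling every other component.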

Further, one could think of excluding APX from the guest default mask like this:

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 76153dfb58c9..5404f9399eea 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -794,9 +794,10 @@ static u64 __init guest_default_mask(void)
 {
 	/*
 	 * Exclude dynamic features, which require userspace opt-in even
-	 * for KVM guests.
+	 * for KVM guests, and APX as extended general-purpose register
+	 * states are saved in the KVM cache separately.
 	 */
-	return ~(u64)XFEATURE_MASK_USER_DYNAMIC;
+	return ~((u64)XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_APX);
 }

But this default bitmask feeds into the permission bits:

	fpu->guest_perm.__state_perm = guest_default_cfg.features;
	fpu->guest_perm.__state_size = guest_default_cfg.size;

This policy looks clear and sensible: permission is granted only if space is reserved to save the state. If there is a strong desire to save memory, I think it should go through a more thorough review to revisit this policy.
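As a simplified illustration of why the mask change leaks into permissions, the following standalone sketch models the relationship. The bit positions (XTILE_DATA as component 18, APX as component 19) are assumptions of this sketch, and guest_state_perm() is a hypothetical stand-in for how the default features end up in __state_perm, not the actual kernel code path.

```c
#include <stdint.h>

/* Assumed bit positions for this sketch: XTILE_DATA = 18, APX = 19. */
#define XFEATURE_MASK_XTILE_DATA	(1ULL << 18)
#define XFEATURE_MASK_APX		(1ULL << 19)
#define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA

/* Mask as it is today: only dynamic features are excluded. */
static uint64_t guest_default_mask_current(void)
{
	return ~(uint64_t)XFEATURE_MASK_USER_DYNAMIC;
}

/* Mask with the change from the diff: APX is excluded as well. */
static uint64_t guest_default_mask_no_apx(void)
{
	return ~((uint64_t)XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_APX);
}

/*
 * Hypothetical model: the guest permission bits are the supported
 * features filtered by the default mask.
 */
static uint64_t guest_state_perm(uint64_t supported, uint64_t default_mask)
{
	return supported & default_mask;
}
```

With APX in the supported set, the current mask leaves the APX bit in the resulting permission, while the modified mask drops it. So excluding APX to save buffer space would also withdraw the guest's permission for it unless the policy itself is changed.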

> Have you measured performance/latency overhead if KVM goes straight to context
> switching R16-R31 at entry/exit? With PUSH2/POP2, it's "only" 8 more instructions
> on each side.

Yup, when I checked a prototype in the lab, the overhead appeared to be in the noise, with less than 1% overall variance.

> If the overhead is in the noise, I'd be very strongly inclined to say KVM should
> swap at entry/exit regardless of kernel behavior so that we don't have to special
> case accesses on the back end.

Note: The hardware request under discussion appears to be ongoing, and I don't know the decision yet. For now, let me add you to the off-list thread for your info.

Right now, I think the entry path can rely on the guest XCR0 in this regard. Since XSETBV is trapped and emulated, the shadow XCR0 remains in sync. The entry function can take an additional flag reflecting guest XCR0.APX and gate EGPR access accordingly.

That appears to keep the behavior aligned with the architecture:
* On initial enable, EGPRs are zeroed on entry following XSETBV exit
* If APX is disabled and later re-enabled, regs[] retains the state
while XCR0.APX=0 and restores it when returning from the re-enabling
XSETBV exit.
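The two cases above can be modeled in a small standalone sketch. This is not the prototype code: struct vcpu_sketch, struct hw_sketch, and all function names are hypothetical, the XCR0.APX bit position (19) is an assumption, and the "hardware" EGPRs are just an array standing in for R16-R31. It assumes regs[] is zero-initialized when the vCPU is created.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XCR0_APX	(1ULL << 19)	/* assumed bit position */
#define FIRST_EGPR	16
#define NR_EGPRS	16

/* Hypothetical vCPU state: register cache plus shadow XCR0. */
struct vcpu_sketch {
	uint64_t regs[32];
	uint64_t xcr0;
};

/* Stand-in for the physical R16-R31. */
struct hw_sketch {
	uint64_t egpr[NR_EGPRS];
};

/* XSETBV is trapped and emulated, so the shadow always tracks the guest value. */
static void emulate_xsetbv(struct vcpu_sketch *v, uint64_t val)
{
	v->xcr0 = val;
}

static bool guest_apx_enabled(const struct vcpu_sketch *v)
{
	return (v->xcr0 & XCR0_APX) != 0;
}

/* Entry: load guest EGPRs only when the flag says guest XCR0.APX=1. */
static void vcpu_enter(const struct vcpu_sketch *v, struct hw_sketch *hw,
		       bool guest_apx)
{
	if (guest_apx)
		memcpy(hw->egpr, &v->regs[FIRST_EGPR], sizeof(hw->egpr));
}

/* Exit: save guest EGPRs only when the flag says guest XCR0.APX=1. */
static void vcpu_exit(struct vcpu_sketch *v, const struct hw_sketch *hw,
		      bool guest_apx)
{
	if (guest_apx)
		memcpy(&v->regs[FIRST_EGPR], hw->egpr, sizeof(hw->egpr));
}
```

In this model, the first entry after the enabling XSETBV loads zeros from the freshly initialized regs[]. While XCR0.APX=0, neither entry nor exit touches regs[16..31], so the last saved values are what gets loaded after the guest re-enables APX.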