Re: [PATCH] KVM: x86: Add support for cmpxchg16b emulation

From: Sairaj Kodilkar

Date: Tue Mar 31 2026 - 05:33:38 EST

On 3/12/2026 5:16 PM, Paolo Bonzini wrote:

On 3/6/26 16:38, Sean Christopherson wrote:

-    if (ctxt->dst.bytes == 16)
+    /* Use of the REX.W prefix promotes operation to 128 bits */
+    if (ctxt->rex_bits & REX_W)

Uh, yeah, and the caller specifically pivoted on this size. I'm a-ok with a
sanity check, but it should be exactly that, e.g.

    if (WARN_ON_ONCE(8 + (ctxt->rex_bits & REX_W) * 8 != ctxt->dst.bytes))
        return X86EMUL_UNHANDLEABLE;

And then emulator_cmpxchg_emulated() should be taught to do cmpxchg16b itself:

    /* guests cmpxchg8b have to be emulated atomically */
    if (bytes > 8 || (bytes & (bytes - 1)))
        goto emul_write;

That might be a big lift, e.g. to get a 16-byte uaccess version. If it's
unreasonably difficult, we should reject emulation of LOCK CMPXCHG16B.

It's actually pretty easy because a kind soul already wrote the
cmpxchg8b version of the macros. Compile-tested only (and checked
that the asm does have the instruction):

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 3a7755c1a4410..f40e815263233 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -388,14 +388,13 @@ do {                                    \
         *_old = __old;                        \
     likely(success);                    })

-#ifdef CONFIG_X86_32
-#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({    \
+#define __try_cmpxchg8b_16b_user_asm(_b, _ptr, _pold, _new, label)({    \
     bool success;                            \
     __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);        \
     __typeof__(*(_ptr)) __old = *_old;                \
     __typeof__(*(_ptr)) __new = (_new);                \
     asm_goto_output("\n"                        \
-             "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n"        \
+             "1: " LOCK_PREFIX "cmpxchg" #_b "b %[ptr]\n"    \

Hi Paolo,

I think its better to keep two implementations separate because this
adds an additional complexity while passing the upper and lower half
of the "_new"
The cmpxchg8b expects ecx as "(u32)((u64)_new >> 32)" where as
cmpxchg16b expects ecx as "(u64)((u128)_new >> 64)"

Thanks
Sairaj