Re: [PATCH 1/3] x86: KVM: VMX: Wrap GUEST_IA32_DEBUGCTL read/write with access functions

From: mlevitsk
Date: Thu May 01 2025 - 16:35:21 EST


On Tue, 2025-04-22 at 16:33 -0700, Sean Christopherson wrote:
> On Tue, Apr 15, 2025, Maxim Levitsky wrote:
> > Instead of reading and writing GUEST_IA32_DEBUGCTL vmcs field directly,
> > wrap the logic with get/set functions.
>
> Why? I know why the "set" helper is being added, but it needs to be called out.
>
> Please omit the getter entirely, it does nothing more than obfuscate a very
> simple line of code.

In this patch, yes. But in the next patch I switch to reading from 'vmx->msr_ia32_debugctl'.
Do you want me to open code this access? I don't mind, if you insist.

>
> > Also move the checks that the guest's supplied value is valid to the new
> > 'set' function.
>
> Please do this in a separate patch. There's no need to mix refactoring and
> functional changes.

I thought it was natural to do this in the same patch. In this patch I introduce
'vmx_set_guest_debugctl', which should be used any time we set the MSR from a
guest-supplied value, and VM entry is one of these cases.

I can split this if you want.
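
For reference, a minimal userspace sketch of the validation flow that the
quoted 'vmx_set_guest_debugctl' implements. The struct and the model_* names
are hypothetical stand-ins, not KVM code, and the nested (vmcs12) and LBR event
handling of the real helper is left out. Only the two DEBUGCTL bit positions
are taken from the architectural layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions match the architectural IA32_DEBUGCTL layout. */
#define DEBUGCTLMSR_LBR (1ULL << 0)
#define DEBUGCTLMSR_BTF (1ULL << 1)

/* Hypothetical stand-in for per-vCPU state; not KVM's real structure. */
struct model_vcpu {
	uint64_t supported_debugctl;	/* bits the guest is allowed to set */
	uint64_t guest_debugctl;	/* models the GUEST_IA32_DEBUGCTL field */
};

/*
 * Mirrors the flow of the quoted helper: if BTF or LBR is requested but
 * unsupported, both bits are dropped from the value (as in the patch);
 * any other unsupported bit rejects the write and leaves the field alone.
 */
static bool model_set_guest_debugctl(struct model_vcpu *vcpu, uint64_t data)
{
	uint64_t invalid = data & ~vcpu->supported_debugctl;

	if (invalid & (DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR)) {
		data &= ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR);
		invalid &= ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR);
	}

	if (invalid)
		return false;

	vcpu->guest_debugctl = data;
	return true;
}
```

Funnelling every guest-supplied value through one such helper is what makes
the VM-entry path pick up the same checks as the WRMSR path.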

>
> > In particular, the above change fixes a minor security issue in which L1
>
> Bug, yes. Not sure it constitutes a meaningful security issue though.

I also think so, but I wanted to mention this just in case.

>
> > hypervisor could set the GUEST_IA32_DEBUGCTL, and eventually the host's
> > MSR_IA32_DEBUGCTL
>
> No, the lack of a consistency check allows the guest to set the MSR in hardware,
> but that is not the host's value.

That's what I meant - the guest can set the real hardware MSR. Yes, after the
guest exits, the OS value is restored. I'll rephrase this in v2.

>
> > to any value by performing a VM entry to L2 with VM_ENTRY_LOAD_DEBUG_CONTROLS
> > set.
>
> Any *legal* value. Setting completely unsupported bits will result in VM-Enter
> failing with a consistency check VM-Exit.

True.

>
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/nested.c | 15 +++++++---
> > arch/x86/kvm/vmx/pmu_intel.c | 9 +++---
> > arch/x86/kvm/vmx/vmx.c | 58 +++++++++++++++++++++++-------------
> > arch/x86/kvm/vmx/vmx.h | 3 ++
> > 4 files changed, 57 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index e073e3008b16..b7686569ee09 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2641,6 +2641,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > struct hv_enlightened_vmcs *evmcs = nested_vmx_evmcs(vmx);
> > bool load_guest_pdptrs_vmcs12 = false;
> > + u64 new_debugctl;
> >
> > if (vmx->nested.dirty_vmcs12 || nested_vmx_is_evmptr12_valid(vmx)) {
> > prepare_vmcs02_rare(vmx, vmcs12);
> > @@ -2653,11 +2654,17 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> > if (vmx->nested.nested_run_pending &&
> > (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
> > kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
> > - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
> > + new_debugctl = vmcs12->guest_ia32_debugctl;
> > } else {
> > kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
> > - vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
> > + new_debugctl = vmx->nested.pre_vmenter_debugctl;
> > }
> > +
> > + if (CC(!vmx_set_guest_debugctl(vcpu, new_debugctl, false))) {
>
> The consistency check belongs in nested_vmx_check_guest_state(), only needs to
> check the VM_ENTRY_LOAD_DEBUG_CONTROLS case, and should be posted as a separate
> patch.

I can move it there. Can you explain why you want this, though? Is it because of the
order of checks specified in the PRM?

Currently GUEST_IA32_DEBUGCTL of the host is *written* in prepare_vmcs02. 
Should I also move this write to nested_vmx_check_guest_state?

Or should I write the value blindly in prepare_vmcs02, then check the value
of 'vmx->msr_ia32_debugctl' in nested_vmx_check_guest_state and fail if it
contains reserved bits?
I don't like that idea much.
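
One possible reading of the suggestion, sketched as a userspace model: the
check runs as a pure function before anything is written, and the write in the
prepare step is only reached once the check has passed. All model_* names and
the reduced structs are hypothetical, the 'supported_debugctl' mask is an
assumption, and the nested_run_pending condition and the BTF/LBR quirk are
omitted for brevity.

```c
#include <errno.h>
#include <stdint.h>

/* VM-entry control bit 2, "load debug controls". */
#define VM_ENTRY_LOAD_DEBUG_CONTROLS 0x00000004u

/* Hypothetical, heavily reduced stand-ins for vmcs12 and vCPU state. */
struct model_vmcs12 {
	uint32_t vm_entry_controls;
	uint64_t guest_ia32_debugctl;
};

struct model_vmx {
	uint64_t supported_debugctl;	/* assumed mask of legal bits */
	uint64_t pre_vmenter_debugctl;	/* value in effect before VM entry */
	uint64_t vmcs02_debugctl;	/* models GUEST_IA32_DEBUGCTL in vmcs02 */
};

/* Phase 1: pure check, no state is modified. */
static int model_check_guest_state(const struct model_vmx *vmx,
				   const struct model_vmcs12 *vmcs12)
{
	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
	    (vmcs12->guest_ia32_debugctl & ~vmx->supported_debugctl))
		return -EINVAL;
	return 0;
}

/* Phase 2: commit; only reached once the check has passed. */
static void model_prepare_vmcs02(struct model_vmx *vmx,
				 const struct model_vmcs12 *vmcs12)
{
	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)
		vmx->vmcs02_debugctl = vmcs12->guest_ia32_debugctl;
	else
		vmx->vmcs02_debugctl = vmx->pre_vmenter_debugctl;
}

/* Check first, then write: a failed check leaves vmcs02 untouched. */
static int model_nested_enter(struct model_vmx *vmx,
			      const struct model_vmcs12 *vmcs12)
{
	int r = model_check_guest_state(vmx, vmcs12);

	if (r)
		return r;
	model_prepare_vmcs02(vmx, vmcs12);
	return 0;
}
```

With this split, the write itself stays in the prepare step; only the
validation moves, and it covers just the VM_ENTRY_LOAD_DEBUG_CONTROLS case.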


>
> > + *entry_failure_code = ENTRY_FAIL_DEFAULT;
> > + return -EINVAL;
> > + }
> > +
> > +static void __vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data)
> > +{
> > + vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> > +}
> > +
> > +bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
> > +{
> > + u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);
> > +
> > + if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
> > + kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
> > + data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
> > + invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
> > + }
> > +
> > + if (invalid)
> > + return false;
> > +
> > + if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
> > + VM_EXIT_SAVE_DEBUG_CONTROLS))
> > + get_vmcs12(vcpu)->guest_ia32_debugctl = data;
> > +
> > + if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
> > + (data & DEBUGCTLMSR_LBR))
> > + intel_pmu_create_guest_lbr_event(vcpu);
> > +
> > + __vmx_set_guest_debugctl(vcpu, data);
> > + return true;
>
> Return 0/-errno, not true/false.

There are plenty of functions in this file and KVM that return boolean.

e.g.:

static bool nested_vmx_check_eptp(struct kvm_vcpu *vcpu, u64 new_eptp)
static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
static bool nested_evmcs_handle_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
						 struct vmcs12 *vmcs12)
static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)

...


I personally think that functions that emulate hardware should return boolean values
or some hardware-specific status code (e.g. a VMX failure code), because the real hardware
never returns -EINVAL and the like.
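
To make the two conventions under discussion concrete, a small userspace sketch
follows. The function names and the supported-bits mask are hypothetical and
exist only for illustration; neither function is taken from KVM.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask of bits the modelled hardware accepts. */
#define MODEL_SUPPORTED_BITS 0x3ULL

/* Boolean style: answers "would the hardware accept this value?" */
static bool model_debugctl_valid(uint64_t data)
{
	return !(data & ~MODEL_SUPPORTED_BITS);
}

/* 0/-errno style: the same check, reported as a kernel error code. */
static int model_debugctl_check(uint64_t data)
{
	return model_debugctl_valid(data) ? 0 : -EINVAL;
}
```

A caller tests the first as 'if (!model_debugctl_valid(x))' and the second as
'if (model_debugctl_check(x))'; the check is identical, only the polarity and
the information carried by the return value differ.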


Best regards,
Maxim Levitsky



