Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root
From: Yan Zhao
Date: Sun May 18 2025 - 23:59:36 EST
On Sat, May 17, 2025 at 01:50:53AM +0800, Edgecombe, Rick P wrote:
> On Fri, 2025-05-16 at 12:01 +0800, Yan Zhao wrote:
> > > Maybe we should rename nx_huge_page_workaround_enabled to something more
> > > generic and do the is_mirror logic in kvm_mmu_do_page_fault() when setting
> > > it. It should shrink the diff and centralize the logic.
> > Hmm, I'm reluctant to rename nx_huge_page_workaround_enabled, because
> >
> > (1) Invoking disallowed_hugepage_adjust() for mirror root is to disable page
> > promotion for TDX private memory, so is only applied to TDP MMU.
> > (2) nx_huge_page_workaround_enabled is used specifically for nx huge pages.
> > fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
>
> Oh, good point.
>
> >
> > if (fault->huge_page_disallowed)
> > account_nx_huge_page(vcpu->kvm, sp,
> > fault->req_level >= it.level);
> >
> > sp->nx_huge_page_disallowed = fault->huge_page_disallowed.
> >
> > Affecting fault->huge_page_disallowed would impact
> > sp->nx_huge_page_disallowed as well and would disable huge pages entirely.
> >
> > So, we still need to keep nx_huge_page_workaround_enabled.
> >
> > If we introduce a new flag fault->disable_hugepage_adjust, and set it in
> > kvm_mmu_do_page_fault(), we would also need to invoke
> > tdp_mmu_get_root_for_fault() there as well.
> >
> > Checking for mirror root for non-TDX VMs is not necessary, and the
> > invocation of tdp_mmu_get_root_for_fault() seems redundant with the one in
> > kvm_tdp_mmu_map().
>
> Also true. What about a wrapper for MMU code to check instead of
> fault->nx_huge_page_workaround_enabled then?
Like below?
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1b2bacde009f..0e4a03f44036 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1275,6 +1275,11 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	return 0;
 }
 
+static inline bool is_fault_disallow_huge_page_adjust(struct kvm_page_fault *fault, bool is_mirror)
+{
+	return fault->nx_huge_page_workaround_enabled || is_mirror;
+}
+
 /*
  * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
  * page tables and SPTEs to translate the faulting guest physical address.
@@ -1297,7 +1302,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled || is_mirror)
+		if (is_fault_disallow_huge_page_adjust(fault, is_mirror))
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level, is_mirror);
 
 		/*
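To make the intent of the wrapper concrete, here is a minimal userspace sketch (not kernel code). The model_* names and the trimmed-down struct are hypothetical stand-ins for struct kvm_page_fault. The point it tries to show is that the wrapper only gates the call to disallowed_hugepage_adjust(), while fault->huge_page_disallowed (and therefore the NX huge page accounting) is still derived from nx_huge_page_workaround_enabled alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct kvm_page_fault. */
struct model_fault {
	bool exec;
	bool nx_huge_page_workaround_enabled;
	bool huge_page_disallowed;
};

/* Same predicate as the proposed wrapper: NX workaround OR mirror root. */
static inline bool model_disallow_huge_page_adjust(const struct model_fault *fault,
						   bool is_mirror)
{
	return fault->nx_huge_page_workaround_enabled || is_mirror;
}

/* Mirrors how the quoted code derives the flag that feeds NX accounting. */
static inline void model_set_huge_page_disallowed(struct model_fault *fault)
{
	fault->huge_page_disallowed = fault->exec &&
				      fault->nx_huge_page_workaround_enabled;
}

/* Mirror root with the NX workaround off: adjust is gated on, accounting is not. */
static inline void model_selfcheck(void)
{
	struct model_fault fault = {
		.exec = true,
		.nx_huge_page_workaround_enabled = false,
	};

	model_set_huge_page_disallowed(&fault);
	assert(model_disallow_huge_page_adjust(&fault, true));
	assert(!fault.huge_page_disallowed);
}
```

In this model, a mirror-root fault still goes through the adjust path, but huge_page_disallowed stays false, so nothing that keys off it would be affected.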
> Also, why not check is_mirror_sp() in disallowed_hugepage_adjust() instead of
> passing in an is_mirror arg?
It's an optimization.
Since is_mirror_sptep(iter->sptep) == is_mirror_sp(root) for every SPTE walked
under a given root, passing in the is_mirror arg avoids re-checking the mirror
status for each sp. The value does not change within a root.
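As a rough illustration of that reasoning, the userspace sketch below (not kernel code) compares re-deriving the mirror status for each sp against reading it once from the root and passing it down. All hoist_* names and types are hypothetical. The only assumption carried over from the thread is that every sp visited in one walk shares the root's mirror status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code uses struct kvm_mmu_page. */
struct hoist_root {
	bool is_mirror;
};

struct hoist_sp {
	const struct hoist_root *root;	/* each sp belongs to exactly one root */
};

/* Per-sp lookup, analogous in spirit to calling is_mirror_sp() on each entry. */
static inline bool hoist_sp_is_mirror(const struct hoist_sp *sp)
{
	return sp->root->is_mirror;
}

/* Variant A: re-derive the mirror status for every sp visited. */
static inline size_t hoist_count_per_sp(const struct hoist_sp *sps, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (hoist_sp_is_mirror(&sps[i]))
			hits++;
	return hits;
}

/* Variant B: read the flag once from the root and reuse it in the loop. */
static inline size_t hoist_count_hoisted(const struct hoist_root *root, size_t n)
{
	const bool is_mirror = root->is_mirror;
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (is_mirror)
			hits++;
	return hits;
}

/* Both variants agree as long as all sps share the same root. */
static inline void hoist_selfcheck(void)
{
	struct hoist_root root = { .is_mirror = true };
	struct hoist_sp sps[3] = { { &root }, { &root }, { &root } };

	assert(hoist_count_per_sp(sps, 3) == hoist_count_hoisted(&root, 3));
}
```

The two variants return the same result under that assumption; variant B simply skips the per-entry pointer chase.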
> There must be a way to have it fit in better with disallowed_hugepage_adjust()
> without adding so much open coded boolean logic.