Re: [PATCH 16/17] KVM: x86: Move error handling inside free_external_spt()
From: Binbin Wu
Date: Wed Apr 08 2026 - 22:09:22 EST
On 3/28/2026 4:14 AM, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Move the logic for TDX’s specific need to leak pages when reclaim
> fails inside the free_external_spt() op, so this can be done in TDX
> specific code and not the generic MMU.
>
> Do this by passing the SP in instead of the external page table
> pointer. This way TDX code can set sp->external_spt to NULL. Since the
> error is now handled internally, change the op to return void. This way
> it also operated like a normal free in that success is guaranteed from
Nit: operated -> operates?
> the callers perspective.
>
> Opportunistically, drop the unused level arg while adjusting the sp arg.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> [re-wrote log and massaged op name]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> ---
> Notable changes since last discussion
> - Since free_external_sp() is dropped in the latter DPAMT patches, don't
> bother renaming free_external_spt().
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 2 +-
> arch/x86/include/asm/kvm_host.h | 3 +--
> arch/x86/kvm/mmu/tdp_mmu.c | 13 ++-----------
> arch/x86/kvm/vmx/tdx.c | 25 +++++++++++--------------
> 4 files changed, 15 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index ed348c6dd445..10ccf6ea9d9a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -96,7 +96,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> KVM_X86_OP(load_mmu_pgd)
> KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
> -KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
> +KVM_X86_OP_OPTIONAL(free_external_spt)
> KVM_X86_OP(has_wbinvd_exit)
> KVM_X86_OP(get_l2_tsc_offset)
> KVM_X86_OP(get_l2_tsc_multiplier)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09588e797e4b..fbc39f0bb491 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1881,8 +1881,7 @@ struct kvm_x86_ops {
> u64 new_spte, enum pg_level level);
>
> /* Update external page tables for page table about to be freed. */
> - int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> - void *external_spt);
> + void (*free_external_spt)(struct kvm *kvm, gfn_t gfn, struct kvm_mmu_page *sp);
>
>
> bool (*has_wbinvd_exit)(void);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 806788bdecce..575033cc7fe4 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -455,17 +455,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
> }
>
> - if (is_mirror_sp(sp) &&
> - WARN_ON(kvm_x86_call(free_external_spt)(kvm, base_gfn, sp->role.level,
> - sp->external_spt))) {
Nit:
It might be worth mentioning in the cover letter that before this change, an
error code returned by tdx_reclaim_page() triggered the WARN_ON() here. After
the change, the warning comes from TDX_BUG_ON_3(), which is deeper in the
stack. So it would be clearer that a tdx_reclaim_page() failure is still not
handled silently.
> - /*
> - * Failed to free page table page in mirror page table and
> - * there is nothing to do further.
> - * Intentionally leak the page to prevent the kernel from
> - * accessing the encrypted page.
> - */
> - sp->external_spt = NULL;
> - }
> + if (is_mirror_sp(sp))
> + kvm_x86_call(free_external_spt)(kvm, base_gfn, sp);
>
> call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index bfbadba8bc08..d064b40a6b31 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1765,27 +1765,24 @@ static void tdx_track(struct kvm *kvm)
> kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
> }
>
> -static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
> - enum pg_level level, void *private_spt)
> +static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
gfn is also unused in this function now.
Also, since sp is passed in now, the gfn can be obtained from sp->gfn. Should
gfn be dropped as well?
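
Something like the following user-space sketch is the shape I have in mind.
All names here (mock_gfn_t, mock_mmu_page, mock_ops, mock_tdx_free_spt(),
last_freed_gfn) are made up for illustration, and the kvm argument is left
out only to keep the mock small. The point is just that a callee which gets
sp can derive the gfn by itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for gfn_t and struct kvm_mmu_page. */
typedef uint64_t mock_gfn_t;

struct mock_mmu_page {
	mock_gfn_t gfn;
	void *external_spt;
};

/* The op takes only the sp; there is no separate gfn argument. */
struct mock_ops {
	void (*free_external_spt)(struct mock_mmu_page *sp);
};

static mock_gfn_t last_freed_gfn;  /* records what the callee saw */

static void mock_tdx_free_spt(struct mock_mmu_page *sp)
{
	assert(sp != NULL);
	/* If the callee ever needs the gfn, it is available from the sp. */
	last_freed_gfn = sp->gfn;
}

static const struct mock_ops ops = {
	.free_external_spt = mock_tdx_free_spt,
};
```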
> + struct kvm_mmu_page *sp)
> {
> - struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> -
> /*
> - * free_external_spt() is only called after hkid is freed when TD is
> - * tearing down.
> * KVM doesn't (yet) zap page table pages in mirror page table while
> * TD is active, though guest pages mapped in mirror page table could be
> * zapped during TD is active, e.g. for shared <-> private conversion
> * and slot move/deletion.
> + *
> + * In other words, KVM should only free mirror page tables after the
> + * TD's hkid is freed, when the TD is being torn down.
> + *
> + * If the S-EPT PTE can't be removed for any reason, intentionally leak
> + * the page to prevent the kernel from accessing the encrypted page.
> */
> - if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm))
> - return -EIO;
> -
> - /*
> - * The HKID assigned to this TD was already freed and cache was
> - * already flushed. We don't have to flush again.
> - */
> - return tdx_reclaim_page(virt_to_page(private_spt));
> + if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
> + tdx_reclaim_page(virt_to_page(sp->external_spt)))
> + sp->external_spt = NULL;
> }
>
> static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,