Re: [RFC PATCH 12/21] KVM: TDX: Determine max mapping level according to vCPU's ACCEPT level

From: Edgecombe, Rick P
Date: Fri May 16 2025 - 18:03:04 EST


On Fri, 2025-05-16 at 14:30 +0800, Yan Zhao wrote:
> > Looking more closely, I don't see why it's too hard to pass in a
> > max_fault_level into the fault struct. Totally untested rough idea, what do
> > you think?
> Thanks for bringing this up and providing the idea below. In the previous TDX
> huge page v8, there's a similar implementation [1] [2].
>
> This series did not adopt that approach because that approach requires
> tdx_handle_ept_violation() to pass in max_fault_level, which is not always
> available at that stage. e.g.
>
> In patch 19, when vCPU 1 faults on a GFN at 2MB level and then vCPU 2 faults
> on the same GFN at 4KB level, TDX wants to ignore the demotion request caused
> by vCPU 2's 4KB level fault. So, patch 19 sets tdx->violation_request_level
> to 2MB in vCPU 2's split callback and fails the split. vCPU 2's
> __vmx_handle_ept_violation() will see RET_PF_RETRY and either do local retry
> (or return to the guest).

I think you mean patch 20 "KVM: x86: Force a prefetch fault's max mapping level
to 4KB for TDX"?

>
> If it retries locally, tdx_gmem_private_max_mapping_level() will return
> tdx->violation_request_level, causing KVM to fault at 2MB level for vCPU 2,
> resulting in a spurious fault, eventually returning to the guest.
>
> As tdx->violation_request_level is per-vCPU and it resets in
> tdx_get_accept_level() in tdx_handle_ept_violation() (meaning it resets after
> each invocation of tdx_handle_ept_violation() and only affects the TDX local
> retry loop), it should not hold any stale value.
>
> Alternatively, instead of having tdx_gmem_private_max_mapping_level() to
> return tdx->violation_request_level, tdx_handle_ept_violation() could grab
> tdx->violation_request_level as the max_fault_level to pass to
> __vmx_handle_ept_violation().
>
> This series chose to use tdx_gmem_private_max_mapping_level() to avoid
> modification to the KVM MMU core.
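
To check my reading of the flow described above, here is a rough userspace
model of it. This is a sketch only, not the code in the series: the struct, the
model_*() helpers and the enum values are made up for illustration, and the
real fault path is reduced to a single comparison against the level that is
already mapped.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; this models the flow, not the kernel's enums. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };
enum ret_pf { RET_PF_RETRY, RET_PF_SPURIOUS };

/* Hypothetical stand-in for the per-vCPU TDX state. */
struct vcpu_tdx_model {
	enum pg_level violation_request_level;
};

/* Models the max-mapping-level hook returning the per-vCPU request level. */
static enum pg_level model_private_max_mapping_level(const struct vcpu_tdx_model *tdx)
{
	return tdx->violation_request_level;
}

/*
 * Models one pass through the fault path for a GFN that is already mapped
 * at mapped_level. A fault below the mapped level would need a demotion;
 * the modeled split callback refuses it, records the mapped level as the
 * new request level, and the fault is retried.
 */
static enum ret_pf model_fault(struct vcpu_tdx_model *tdx, enum pg_level mapped_level)
{
	if (model_private_max_mapping_level(tdx) < mapped_level) {
		tdx->violation_request_level = mapped_level;
		return RET_PF_RETRY;
	}
	/* Nothing left to do at this level: the mapping is already there. */
	return RET_PF_SPURIOUS;
}

/*
 * Models the local retry loop. The request level is reset from the
 * vCPU's accept level on every entry, so no stale value is carried
 * over. Returns the number of fault attempts made.
 */
static int model_handle_ept_violation(struct vcpu_tdx_model *tdx,
				      enum pg_level accept_level,
				      enum pg_level mapped_level)
{
	enum ret_pf ret;
	int attempts = 0;

	assert(tdx != NULL);
	tdx->violation_request_level = accept_level;

	do {
		ret = model_fault(tdx, mapped_level);
		attempts++;
	} while (ret == RET_PF_RETRY && attempts < 4);

	return attempts;
}
```

In this model, a vCPU accepting at 4KB on a GFN already mapped at 2MB takes two
attempts: the first fails the split and retries, the second comes back
spurious. The alternative would drop model_private_max_mapping_level() and hand
the same per-vCPU value to the fault path as an explicit max_fault_level
argument.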

It sounds like Kirill is suggesting we do have to have demotion in the fault
path. IIRC it adds a lock, but the cost to skip fault path demotion seems to be
adding up.

>
> [1] https://lore.kernel.org/all/4d61104bff388a081ff8f6ae4ac71e05a13e53c3.1708933624.git.isaku.yamahata@xxxxxxxxx/
> [2] https://lore.kernel.org/all/3d2a6bfb033ee1b51f7b875360bd295376c32b54.1708933624.git.isaku.yamahata@xxxxxxxxx/