Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
From: Edgecombe, Rick P
Date: Tue Apr 14 2026 - 16:01:43 EST
On Tue, 2026-03-31 at 05:41 -0700, Chao Gao wrote:
> A runtime TDX module update can conflict with TD lifecycle operations that
> are update-sensitive.
>
> Today, update-sensitive operations include:
>
> - TD build: TD measurement is accumulated across multiple
> TDH.MEM.PAGE.ADD, TDH.MR.EXTEND, and TDH.MR.FINALIZE calls.
How exactly does this work? There is a new error I see:
TDX_INCOMPATIBLE_MRTD_CONTEXT. It gets returned by those seamcalls sometimes.
I think(?)... TDX_INCOMPATIBLE_MRTD_CONTEXT gets returned by those seamcalls if
the problematic type of tdx module update happens between TDH.MNG.INIT and
TDH.MR.FINALIZE. So if the compat flag is not passed into shutdown... then KVM
gets some new surprise error codes to handle deep in the MMU.
>
> - TD migration: intermediate crypto state is saved/restored across
> interrupted/resumed TDH.EXPORT.STATE.* and TDH.IMPORT.STATE.* flows.
We don't need to consider migration for this change.
>
> If an update races TD build, for example, TD measurement can become
> incorrect and attestation can fail.
>
> The TDX architecture exposes two approaches:
>
> 1) Avoid updates during update-sensitive operations.
> 2) Detect incompatibility after update and recover.
>
> Post-update detection (option #2) is not a good fit: as discussed in [1],
> future module behavior may expand update-sensitive operations in ways that
> make KVM ABIs unstable and will break userspace.
>
> "Do nothing" is also not preferred: while it keeps kernel code simple, it
> lets the issue leak into the broader stack, where both detection and
> recovery require significantly more effort.
This subject has had a lot of debate (as linked below in the log), but the way
this is written leaves a lot of questions. The log says "do nothing" is not an
option, but the code does just that when UPDATE_COMPAT_SENSITIVE is not
supported.
>
> So, use option #1. Specifically, request "avoid update-sensitive" behavior
> during TDX module shutdown and map the resulting failure to -EBUSY so
> userspace can distinguish an update race from other failures.
>
> When the "avoid update-sensitive" feature isn't supported, proceed with
> updates. If a race occurs between module update and update-sensitive
> operations, failures happen at a later stage (e.g., incorrect TD
> measurements in attestation reports for TD build). Effectively, this
> means "let userspace update at their own risk".
>
Above it says we can't just do nothing, we need the flag. And then this argues
that we can do nothing because we can rely on userspace to deal with the
issue... This log is maybe just trying to put a brave face on an imperfect
compromise?
So, while I don't want to re-open the debate, I'm not sure the patch
justification is going to pass scrutiny as is.
In the link [2], Dan says "Do not make Linux carry short lived one-off
complexity", and also "Do not include logic to disable updates, document the
expectation in the tool."
It seems this does not exclude the option to just always pass the compat
flag. Basically assume that the TDX module will always support
UPDATE_COMPAT_SENSITIVE if it supports TDX module updates, which I guess we
should expect to eventually be true.
In [2] Dan was also against checking the UPDATE_COMPAT_SENSITIVE feature0 bit to
gate the feature.
For the record, I don't like allowing the update without the compat bit set, and
my concern has nothing to do with userspace roles and responsibilities. Instead
it's because we are over budget on complexity for handling SEAMCALL errors
within KVM and this makes things worse to keep track of.
tdh_mem_page_add() does a KVM_BUG_ON() if it sees a non-busy error. Imagine
working on this code and considering whether it is a valid KVM_BUG_ON(). After
this patch, the answer is... well, sometimes. It depends on the previous
module's specific feature0 bits, an understanding of admins' expectations, and
the behavior of some far away code in arch/x86. Gah.
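To make the "sometimes" concrete, here is a rough standalone sketch of the
question a reader of that code would have to answer. It is not kernel code. The
enum, the function name and the update_avoided_sensitive_ops parameter are all
hypothetical, and it assumes my reading above of when
TDX_INCOMPATIBLE_MRTD_CONTEXT gets returned:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SEAMCALL outcomes, for illustration only. */
enum seamcall_err {
	ERR_NONE,
	ERR_BUSY,
	ERR_INCOMPATIBLE_MRTD_CONTEXT,
	ERR_OTHER,
};

/*
 * Would a KVM_BUG_ON() on this error be justified?
 *
 * update_avoided_sensitive_ops: true if every module update since boot was
 * done with the "avoid update-sensitive" request honored (assumption: the
 * MRTD context error then cannot legitimately happen).
 */
static bool page_add_err_is_bug(enum seamcall_err err,
				bool update_avoided_sensitive_ops)
{
	if (err == ERR_NONE || err == ERR_BUSY)
		return false;

	/* Legitimate after a racy update, a bug otherwise. */
	if (err == ERR_INCOMPATIBLE_MRTD_CONTEXT)
		return update_avoided_sensitive_ops;

	return true;
}
```

The second parameter is the part that bothers me: it is state that lives nowhere
near the KVM MMU code that has to reason about it.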
Actually, the diff Dan objected to was checking and printing a specific helpful
error. Maybe he would not mind a much simpler check of an extra bit in
tdx_supports_runtime_update()? Otherwise, I'd think to just unconditionally pass
UPDATE_COMPAT_SENSITIVE without checking for support. Essentially mandate that
it is always supported if TDX module update is supported.
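A rough, untested sketch of the two alternatives, written as standalone C rather
than against the actual tree. The bit positions, the status values and the
helper names are made up for illustration, and the -EIO fallback is my
assumption (the log only specifies -EBUSY):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, NOT the real TDX feature0 layout. */
#define FEAT0_RUNTIME_UPDATE		(1ULL << 0)
#define FEAT0_UPDATE_COMPAT_SENSITIVE	(1ULL << 1)

/* Hypothetical shutdown status values, for illustration only. */
#define STATUS_OK			0
#define STATUS_UPDATE_SENSITIVE		1	/* sensitive operation in flight */
#define STATUS_OTHER			2

/*
 * Alternative 1: fold the compat bit into the "is runtime update supported"
 * check, so an update is never attempted without it.
 */
static bool supports_runtime_update(uint64_t features0)
{
	const uint64_t need = FEAT0_RUNTIME_UPDATE |
			      FEAT0_UPDATE_COMPAT_SENSITIVE;

	return (features0 & need) == need;
}

/*
 * Alternative 2: always pass the flag at shutdown without checking for
 * support, and only translate the result. A module that rejects the flag
 * simply fails the update.
 */
static int shutdown_status_to_errno(int status)
{
	switch (status) {
	case STATUS_OK:
		return 0;
	case STATUS_UPDATE_SENSITIVE:
		return -EBUSY;	/* userspace can retry later */
	default:
		return -EIO;	/* assumed generic failure code */
	}
}
```

Either way, the update path never proceeds without the flag, so the SEAMCALL
error handling in KVM does not have to account for a racy update.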
> Userspace can check if
> the feature is supported or not. The alternative of blocking updates
> entirely is rejected [2] as it introduces permanent kernel complexity to
> accommodate limitations in early TDX module releases that userspace can
> handle.
>
> Note: this implementation is based on a reference patch by Vishal [3].
> Note2: moving "NO_RBP_MOD" is just to centralize bit definitions.
>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> Reviewed-by: Tony Lindgren <tony.lindgren@xxxxxxxxxxxxxxx>
> Link: https://lore.kernel.org/linux-coco/aQIbM5m09G0FYTzE@xxxxxxxxxx/ # [1]
> Link: https://lore.kernel.org/kvm/699fe97dc212f_2f4a100b@dwillia2-mobl4.notmuch/ # [2]
> Link: https://lore.kernel.org/linux-coco/CAGtprH_oR44Vx9Z0cfxvq5-QbyLmy_+Gn3tWm3wzHPmC1nC0eg@xxxxxxxxxxxxxx/ # [3]
> ---