Re: [PATCH v8 17/21] x86/virt/seamldr: Abort updates on failure

From: Chao Gao

Date: Fri May 08 2026 - 05:19:32 EST


On Thu, Apr 30, 2026 at 01:06:38PM -0700, Dave Hansen wrote:
>I don't like how this is being done.
>
> 1. Introduce this do{}while() loop
> 2. Do 20 other patches
> 3. Introduce a thing that can make it change
> 4. Change the fundamental flow of the loop, to fix #3
>
>I'd much rather have:
>
> 1. Introduce this do{}while() loop
> 2. Tweak fundamental flow of the loop from the last patch when I can
> remember it. Allude to future failures.
> 3. Do 20 other patches
> 4. Introduce a thing that uses #2

OK, that makes sense. I'll reorder the series so this patch comes immediately
after the skeleton patch.

>
>
>> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
>> index c81b26c4bac1..9b8f571eb03f 100644
>> --- a/arch/x86/virt/vmx/tdx/seamldr.c
>> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
>> @@ -220,6 +220,7 @@ enum module_update_state {
>> static struct {
>> enum module_update_state state;
>> int thread_ack;
>> + bool failed;
>> /*
>> * Protect update_data. Raw spinlock as it will be acquired from
>> * interrupt-disabled contexts.
>> @@ -284,12 +285,15 @@ static int do_seamldr_install_module(void *seamldr_params)
>> break;
>> }
>>
>> - ack_state();
>> + if (ret)
>> + WRITE_ONCE(update_data.failed, true);
>> + else
>> + ack_state();
>> } else {
>> touch_nmi_watchdog();
>> rcu_momentary_eqs();
>> }
>
>I don't like how this is turning out either. I don't like all the nested
>conditions or ack_state() that hides its mucking with update data while
>its caller mucks with it directly. It's just all hacked together.
>
>Defer all of the acking, and *failed* acking to the ack_state() helper.

OK. I'll fold both normal and failed acking into ack_state().


>
>Also, I'm kinda peeved that you copied and pasted the
>touch_nmi_watchdog()/rcu_momentary_eqs() bits and none of the comments.
>This is a rather subtle use of both. If you want this to be a normal
>"spinning in stop machine" idiom, then create a helper and put the
>comments there.

Those two calls were added in stop_machine() to improve debuggability.

The issue they address is that a stop_machine() callback can hang on one
CPU. Without touch_nmi_watchdog() and rcu_momentary_eqs(), the other CPUs
that are merely spinning in the wait loop can also report hard lockup and
RCU stall warnings, which obscures the actual stuck CPU.

I agree that this behavior makes sense in stop_machine() as common
infrastructure. But this update path does not take an arbitrary callback
function, so that that debuggability is not strictly necessary here. I'll
drop those calls from this path unless there is an objection.

>
>Also, this is a case where:
>
> do {
> cpu_relax();
> newstate = READ_ONCE(update_data.state);
>
> if (newstate == curstate) {
> // can cpu_relax() just go in here??
> touch_nmi_watchdog();
> rcu_momentary_eqs();
> continue;
> }
>
> switch() {
> // state changing here
> }
> } while (...);
>
>is a much more sane setup. You're not paying the if() indentation cost
>for the entire state transition block. You're also putting the "shut up
>the warnings" code out of the way where you can forget about it.
>

Agreed. Will do.