Re: [PATCH] vfio/type1: Retry follow_pfnmap_start() when PFNMAP is zapped

From: David Hildenbrand (Arm)

Date: Thu Mar 19 2026 - 09:19:12 EST

On 3/19/26 09:36, Boone, Max wrote:
>
>
>> On Mar 18, 2026, at 10:22 PM, Alex Williamson <alex@xxxxxxxxxxx> wrote:
>>
>> […]
>>
>>> + /*
>>> + * follow_pfnmap_start() returns -EINVAL for
>>> + * invalid parameters and non-present entries.
>>> + * If that happens here after a successful
>>> + * fixup_user_fault(), it is likely that the
>>> + * pfnmap has been zapped. Retry instead of
>>> + * failing.
>>> + */
>>
>> It's a little stronger than that, right? We're betting that the only
>> remaining non-zero return is due to a race and we can introduce what
>> appears to be potential for an infinite loop here because -EAGAIN will
>> get kicked out to redo the vma_lookup() and fixup_user_fault() should
>> return a genuine error if we're completely in the weeds. Should we
>> make this a little stronger and more specific? Thanks,
>
> I’d say that the best case would be to have follow_pfnmap_start() return
> -EINVAL or -ENOENT w.r.t. which of the two return values it is. But then
> again, we could theoretically run into an infinite loop I guess - as the zap
> and faulting could run in lockstep (the race window is extremely small
> though).

Well, in theory :) To hit that race repeatedly, you'd really have to be
quite lucky I guess.

But the real question is: if user space triggered the pinning, and user
space keeps hurting itself to make progress, is that a real problem?

I guess the crucial part would be to

a) Have some cond_resched(() in there?
b) Checking for fatal signals somewhere?
c) Possibly drop locks (mmap lock?) every now and then?

For GUP, a) and b) are in place in __get_user_pages().

c) might be done, but I think it's less deterministic.

>
> We could make the retry above bounded, and bubble up a -EBUSY such
> that users of the ioctl can decide to retry instead of fail?

Would that be a possible ABI break? You'd really have to only do that in
a case where user space does stupid things, I guess.

>
> David, you mentioned that gup already has retry logic that we don’t have
> with follow_fault_pfn() -> follow_pfnmap_start(). Would we potentially run
> into an infinite loop with this change?

GUP triggers page faults through faultin_page(). If handle_mm_fault()
returns

* VM_FAULT_COMPLETED we return -EAGAIN
* VM_FAULT_ERROR we return the error
* VM_FAULT_RETRY we return -EBUSY
* Otherwise 0

In the caller __get_user_pages(), we
* Retry immediately with ret == 0
* Return to the GUP caller (letting it retry) with -EBUSY/-EAGAIN

Having at least a) and b) sounds reasonable. Not sure about having c),
might be tricky if we are not allowed to drop the lock.

--
Cheers,

David