Re: [PATCH] vfio/type1: Retry follow_pfnmap_start() when PFNMAP is zapped

From: Alex Williamson

Date: Wed Mar 18 2026 - 17:23:20 EST

On Tue, 17 Mar 2026 16:07:45 +0100
Max Boone via B4 Relay <devnull+mboone.akamai.com@xxxxxxxxxx> wrote:

> From: Max Boone <mboone@xxxxxxxxxx>
>
> A race between page table walking (e.g. via procfs numa_maps) and VFIO DMA
> pinning can lead to temporary failures in follow_pfnmap_start(). When a
> PUD entry is split and concurrently refaulted, the PFNMAP mapping may be
> temporarily zapped, causing follow_pfnmap_start() to return an error.
>
> Although follow_pfnmap_start() returns -EINVAL, this is not due to
> invalid parameters but to the pfnmap being non-present.
> Treat it as such, and retry by returning -EAGAIN, similar to how GUP
> handles such races.
>
> This avoids propagating an unexpected -EINVAL to userspace, as follows:
> [dma_map]
> dma_map iova=0x000000000000 size=0x000004000000 vaddr=0x00007f7800000000
> dma_map FAILED iova=0x020000000000: [Errno 22] Invalid argument
> dma_map iova=0x040000000000 size=0x000002000000 vaddr=0x00007f5780000000
>
> The failed mapping would have succeeded on a retry.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: a77f9489f1d7 ("vfio: use the new follow_pfnmap API")
> Signed-off-by: Max Boone <mboone@xxxxxxxxxx>
> ---
> drivers/vfio/vfio_iommu_type1.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5167bec14..3a0d0bbb9 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -559,9 +559,17 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>  		if (ret)
>  			return ret;
>
> +		/*
> +		 * follow_pfnmap_start() returns -EINVAL for
> +		 * invalid parameters and non-present entries.
> +		 * If that happens here after a successful
> +		 * fixup_user_fault(), it is likely that the
> +		 * pfnmap has been zapped. Retry instead of
> +		 * failing.
> +		 */

It's a little stronger than that, right? We're betting that the only
remaining non-zero return is due to a race. We can introduce what
appears to be potential for an infinite loop here because -EAGAIN will
get kicked out to redo the vma_lookup(), and fixup_user_fault() should
return a genuine error if we're completely in the weeds. Should we
make this a little stronger and more specific? Thanks,

Alex

>  		ret = follow_pfnmap_start(&args);
>  		if (ret)
> -			return ret;
> +			return -EAGAIN;
>  	}
>
>  	if (write_fault && !args.writable) {
>
> ---
> base-commit: 96ca4caf9066f5ebd35b561a521af588a8eb0215
> change-id: 20260317-retry-pin-on-reclaimed-pud-dfb9e26eb8cf
>
> Best regards,
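
For readers following the thread, the control flow under discussion can be
sketched in plain userspace C. This is only an illustration of the reasoning
in the reply above, not kernel code: every fake_* name and the struct below
are hypothetical stand-ins, and the caller loop is an assumption based on
the reply's description of -EAGAIN being kicked out to redo the lookup. The
point it tries to show is that the loop ends either when the transient
non-present state clears or when the fixup step reports a genuine error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a mapping; not a kernel structure. */
struct fake_mapping {
	int zapped_for;	/* lookups that will still see the entry as non-present */
	int broken;	/* non-zero: the fixup step reports a genuine error */
};

/* Stand-in for the lookup: -EINVAL while the entry is non-present. */
static int fake_lookup(struct fake_mapping *m)
{
	if (m->zapped_for > 0) {
		m->zapped_for--;
		return -EINVAL;
	}
	return 0;
}

/* Stand-in for the fault fixup: the only source of genuine errors here. */
static int fake_fixup(struct fake_mapping *m)
{
	return m->broken ? -EFAULT : 0;
}

/* Mirrors the patched flow: a failed second lookup becomes -EAGAIN. */
static int fake_follow_fault(struct fake_mapping *m)
{
	int ret = fake_lookup(m);

	if (ret) {
		ret = fake_fixup(m);
		if (ret)
			return ret;	/* genuine error, no retry */

		ret = fake_lookup(m);
		if (ret)
			return -EAGAIN;	/* assumed to be a race, retry */
	}
	return 0;
}

/*
 * Assumed caller: redo the whole sequence on -EAGAIN. If the lookup kept
 * failing while the fixup kept succeeding, this would spin forever; that
 * is the bet being discussed.
 */
static int fake_pin(struct fake_mapping *m, int *attempts)
{
	int ret;

	assert(m && attempts);
	do {
		ret = fake_follow_fault(m);
		(*attempts)++;
	} while (ret == -EAGAIN);

	return ret;
}
```

In this model a mapping with zapped_for = 3 succeeds on the second attempt,
while a mapping with broken set fails on the first attempt with the fixup's
own error rather than looping.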