Re: [PATCH 12/14] iommufd: Add APIs to preserve/unpreserve a vfio cdev

From: Pranjal Shrivastava

Date: Wed Mar 25 2026 - 17:25:12 EST


On Wed, Mar 25, 2026 at 08:41:46PM +0000, Samiullah Khawaja wrote:
> On Wed, Mar 25, 2026 at 08:24:24PM +0000, Pranjal Shrivastava wrote:
> > On Tue, Feb 03, 2026 at 10:09:46PM +0000, Samiullah Khawaja wrote:
> > > Add APIs that can be used to preserve and unpreserve a vfio cdev. Use
> > > the APIs exported by the IOMMU core to preserve/unpreserve device. Pass
> > > the LUO preservation token of the attached iommufd into IOMMU preserve
> > > device API. This establishes the ownership of the device with the
> > > preserved iommufd.
> > >
> > > Signed-off-by: Samiullah Khawaja <skhawaja@xxxxxxxxxx>
> > > ---
> > > drivers/iommu/iommufd/device.c | 69 ++++++++++++++++++++++++++++++++++
> > > include/linux/iommufd.h | 23 ++++++++++++
> > > 2 files changed, 92 insertions(+)
> > >
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 4c842368289f..30cb5218093b 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -2,6 +2,7 @@
> > > /* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
> > > */
> > > #include <linux/iommu.h>
> > > +#include <linux/iommu-lu.h>
> > > #include <linux/iommufd.h>
> > > #include <linux/pci-ats.h>
> > > #include <linux/slab.h>
> > > @@ -1661,3 +1662,71 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > > iommufd_put_object(ucmd->ictx, &idev->obj);
> > > return rc;
> > > }
> > > +
> > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
> > > +int iommufd_device_preserve(struct liveupdate_session *s,
> > > + struct iommufd_device *idev,
> > > + u64 *tokenp)
> > > +{
> > > + struct iommufd_group *igroup = idev->igroup;
> > > + struct iommufd_hwpt_paging *hwpt_paging;
> > > + struct iommufd_hw_pagetable *hwpt;
> > > + struct iommufd_attach *attach;
> > > + int ret;
> > > +
> > > + mutex_lock(&igroup->lock);
> > > + attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
> >
> > By explicitly looking up IOMMU_NO_PASID, we skip any PASID attachments
> > the device might have. Since PASID live update is NOT supported in this
> > series, should we check if the pasid_attach xarray contains anything
> > other than IOMMU_NO_PASID and return -EOPNOTSUPP?
> >
> > Otherwise, we silently fail to preserve those domains without informing
> > the VMM?
>
> VMM should be able to preserve the NO_PASID domains even if it has PASID
> attachments. This is the intended behaviour, I will document it in the
> uAPI docs.

I think I'm miscommunicating here. My concern isn't about whether the
kernel can mechanically preserve the NO_PASID domain when PASID
attachments exist. I agree that part works fine.

My concern is purely about silent state loss. If a VMM asks the kernel
to preserve a device, it expects the entire IOMMU state for that device
to be safely handed over. If the kernel silently skips the PASID
attachments and returns success (0), the VMM on the new kernel will wake
up assuming those PASIDs are still perfectly intact. When the guest
attempts a PASID-tagged DMA, it will unexpectedly fault.

So the question is: how strictly should the kernel protect userspace
from this footgun? A few options that I can see:

1. Rely on uAPI docs
2. Fail the preserve ioctl (-EOPNOTSUPP) if active PASID attachments
are detected.
3. Add an opt-in flag: we could add a flag to the ioctl
(IOMMU_LU_FLAG_IGNORE_PASID) so userspace has to explicitly
acknowledge the state drop.
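
To make options 2 and 3 concrete, here is a rough userspace model of the
check I have in mind. It is only a sketch, not kernel code: the mock_*
names and MOCK_NO_PASID are hypothetical stand-ins for the igroup
pasid_attach xarray and IOMMU_NO_PASID, and the bit value chosen for
IOMMU_LU_FLAG_IGNORE_PASID is an assumption. With flags == 0 it behaves
like option 2; passing the flag gives option 3.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMMU_NO_PASID; not the iommufd definition. */
#define MOCK_NO_PASID 0u
/* Flag name taken from option 3; the bit value is an assumption. */
#define IOMMU_LU_FLAG_IGNORE_PASID (1u << 0)

/* Simplified model of a device's attachments, keyed by PASID. */
struct mock_attach_table {
	const uint32_t *pasids;	/* PASIDs that currently have an attachment */
	size_t count;
};

/* True if any attachment other than the NO_PASID one exists. */
static bool mock_has_pasid_attach(const struct mock_attach_table *t)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->pasids[i] != MOCK_NO_PASID)
			return true;
	return false;
}

/*
 * Option 2: refuse to preserve when PASID attachments exist.
 * Option 3: same, unless the caller passes IOMMU_LU_FLAG_IGNORE_PASID.
 * Unknown flag bits are rejected so they stay available for later use.
 */
static int mock_preserve_check(const struct mock_attach_table *t,
			       uint32_t flags)
{
	if (flags & ~IOMMU_LU_FLAG_IGNORE_PASID)
		return -EINVAL;
	if (mock_has_pasid_attach(t) && !(flags & IOMMU_LU_FLAG_IGNORE_PASID))
		return -EOPNOTSUPP;
	return 0;
}
```

Either way, a VMM that still has PASID attachments gets an explicit
error unless it opts in, instead of a success that hides the drop.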

Option 2 or 3 matters especially for backwards compatibility. If this
series is merged in 7.2 with the "silent drop" behavior, then when full
PASID live update support is eventually added in a future kernel,
userspace will have no robust way to know whether it's running on a
kernel that preserves PASIDs or one that silently drops them. By
returning an error or requiring a flag now, we reserve the right to
cleanly implement the feature later without breaking the uAPI contract.

This is an open question from me. I'm okay with any of the 3 options,
and I'd like to know what the maintainers think about this as well.

[ ---- >8 ----- ]

Thanks,
Praan