Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy

From: Xu Yilun
Date: Tue Jul 01 2025 - 22:31:56 EST


On Tue, Jul 01, 2025 at 09:13:15AM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote:
> > On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote:
> > >
> > > > I need to reconsider this, seems we need a dedicated vdev lock to
> > > > synchronize concurrent vdev abort/destroy.
> > >
> > > It is not possible to be concurrent
> > >
> > > destroy is only called once after it is no longer possible to call
> > > abort.
> >
> > I'm almost about to drop the "abort twice" idea. [1]
> >
> > [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@xxxxxxxxxx/
> >
> > See from the flow below,
> >
> > T1. iommufd_device_unbind(idev)
> >       iommufd_device_destroy(obj)
> >         mutex_lock(&idev->igroup->lock)
> >         iommufd_vdevice_abort(idev->vdev.obj)
> >         mutex_unlock(&idev->igroup->lock)
> >       kfree(obj)
> >
> > T2. iommufd_destroy(vdev_id)
> >       iommufd_vdevice_destroy(obj)
> >         mutex_lock(&vdev->idev->igroup->lock)
> >         iommufd_vdevice_abort(obj);
> >         mutex_unlock(&vdev->idev->igroup->lock)
> >       kfree(obj)
> >
> > iommufd_vdevice_destroy() will access idev->igroup->lock, but it is
> > possible the idev is already freed at that time:
> >
> >   iommufd_destroy(vdev_id)
> >   iommufd_vdevice_destroy(obj)
> >                                               iommufd_device_unbind(idev)
> >                                               iommufd_device_destroy(obj)
> >                                               mutex_lock(&idev->igroup->lock)
> >   mutex_lock(&vdev->idev->igroup->lock) (wait)
> >                                               iommufd_vdevice_abort(idev->vdev.obj)
> >                                               mutex_unlock(&idev->igroup->lock)
> >                                               kfree(obj)
> >   mutex_lock(&vdev->idev->igroup->lock) (PANIC)
> >   iommufd_vdevice_abort(obj)
> >   ...
>
> Yes, you can't touch idev inside the destroy function at all, under
> any version. idev is only valid if you have a refcount on vdev.
>
> But why are you touching this lock? Arrange things so abort doesn't
> touch the idev??

idev has a pointer idev->vdev to track the vdev's lifecycle.
idev->igroup->lock protects the pointer. At the end of
iommufd_vdevice_destroy() this pointer should be NULLed so that idev
knows vdev is really destroyed.

I haven't found a safer way for vdev to sync its validity with idev
without touching idev.
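
To make the tracking scheme above concrete, here is a minimal userspace
sketch. It is only a model, not the iommufd code: every model_* name is
hypothetical, and a plain flag stands in for the kernel mutex so the
example stays single-threaded and self-contained. It shows idev->vdev
being cleared under the igroup lock, and abort being safe to call a
second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mutex; single-threaded model only. */
struct model_lock {
	bool held;
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct model_vdev;

struct model_igroup {
	struct model_lock lock;     /* protects model_idev.vdev */
};

struct model_idev {
	struct model_igroup *igroup;
	struct model_vdev *vdev;    /* non-NULL while the vdev is alive */
};

struct model_vdev {
	struct model_idev *idev;
	int teardown_count;         /* how many times the real teardown ran */
};

/* Caller holds idev->igroup->lock. A second call is a no-op. */
static void model_vdevice_abort(struct model_vdev *vdev)
{
	struct model_idev *idev = vdev->idev;

	if (idev->vdev != vdev)
		return;             /* already aborted */
	vdev->teardown_count++;
	idev->vdev = NULL;          /* idev now knows the vdev is gone */
}

/* vdev destroy path: must dereference vdev->idev to reach the lock. */
static void model_vdevice_destroy(struct model_vdev *vdev)
{
	struct model_idev *idev = vdev->idev;

	model_lock_acquire(&idev->igroup->lock);
	model_vdevice_abort(vdev);
	model_lock_release(&idev->igroup->lock);
}

/* idev destroy path: aborts the vdev it still tracks, if any. */
static void model_device_destroy(struct model_idev *idev)
{
	model_lock_acquire(&idev->igroup->lock);
	if (idev->vdev)
		model_vdevice_abort(idev->vdev);
	model_lock_release(&idev->igroup->lock);
}
```

Note that model_vdevice_destroy() only works here because the idev is
still allocated when it runs. The model does not solve the lifetime
problem; it only shows why the destroy path ends up touching idev.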

I was thinking of tracking through vdev->idev and some vdev lock
instead, so that iommufd_vdevice_abort() doesn't touch idev. But the
problem stays the same; it just moves the risk to the idev side, which
now has to touch a vdev that may already be freed:


  iommufd_destroy(vdev_id)
  iommufd_vdevice_destroy(obj)
                                      iommufd_device_unbind(idev)
                                      iommufd_device_destroy(obj)
  mutex_lock(&vdev->some_lock)
                                      mutex_lock(&idev->vdev->some_lock) (wait)
  iommufd_vdevice_abort(obj)
  mutex_unlock(&vdev->some_lock)
  kfree(obj)
                                      mutex_lock(&idev->vdev->some_lock) (PANIC)
                                      iommufd_vdevice_abort(idev->vdev.obj)
                                      ...
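
The interleaving above can be replayed deterministically in a small
userspace sketch. This is a hypothetical model, not kernel code: the
replay_* names are made up, a "freed" flag marks the point where
kfree() would run, and the lock is a plain flag. It only illustrates
that the lock lives inside the object that gets freed, so the late
locker has nothing valid to lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object with an embedded lock, like vdev->some_lock. */
struct replay_obj {
	bool freed;         /* set where kfree() would have run */
	bool lock_held;
};

/* Returns false where the kernel would dereference freed memory. */
static bool replay_lock(struct replay_obj *owner)
{
	if (owner->freed)
		return false;
	owner->lock_held = true;
	return true;
}

static void replay_unlock(struct replay_obj *owner)
{
	owner->lock_held = false;
}

static void replay_free(struct replay_obj *obj)
{
	obj->freed = true;
}

/* Replays the flow; returns true if the late locker hits a freed vdev. */
static bool replay_vdev_lock_race(void)
{
	struct replay_obj vdev = { false, false };

	/* vdev destroy path: takes vdev->some_lock first */
	if (!replay_lock(&vdev))
		return true;

	/* idev destroy path would block on idev->vdev->some_lock here */

	/* vdev destroy path: abort, unlock, then kfree(vdev) */
	replay_unlock(&vdev);
	replay_free(&vdev);

	/* idev destroy path resumes and locks through the freed vdev */
	return !replay_lock(&vdev);
}
```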

Thanks,
Yilun