RE: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()

From: Tian, Kevin

Date: Fri Mar 27 2026 - 04:28:12 EST


> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Friday, March 20, 2026 5:35 AM
>
> On Thu, Mar 19, 2026 at 07:14:21PM +0800, Shuai Xue wrote:
> > On 3/19/26 12:31 PM, Nicolin Chen wrote:
> > > @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct
> pci_dev *pdev)
> > > guard(mutex)(&group->mutex);
> > > - /* Re-entry is not allowed */
> > > - if (WARN_ON(group->resetting_domain))
> > > - return -EBUSY;
> > > + if (group->resetting_domain) {
> > > + group->reset_cnt++;
> > > + return 0;
> > > + }
> > > ret = __iommu_group_alloc_blocking_domain(group);
> >
> > pci_dev_reset_iommu_prepare/done() have NO singleton group check.
> > They operate on the specific pdev passed in, but use group-wide
> > state (resetting_domain, reset_cnt) to track the reset lifecycle.
> >
> > Interestingly, the broken_worker in patch 3 of the ATC timeout
> > series DOES have an explicit singleton check:
> >
> > if (list_is_singular(&group->devices)) {
> > /* Note: only support group with a single device */
> >
> > This reveals an implicit assumption: the entire prepare/done
> > mechanism works correctly only for singleton groups. For
> > multi-device groups:
> >
> > - prepare() only detaches the specific pdev, leaving other
> > devices in the group still attached to the original domain

no, attach is per-group. all devices are changed together.

> > - The group-wide resetting_domain/reset_cnt state can be
> > corrupted by concurrent resets on different devices (as
> > discussed above)
>
> That's a phenomenal catch!
>
> I think we shall have a reset_ndevs in the group and a reset_depth
> in the gdev.
>

I didn't get how the state might be corrupted.

If there are two devices in a group both being reset, you only
want to switch back to the original domain after both devices
have finished reset.

so a reset_cnt is sufficient to cover both scenarios:

- nested prepare/done in a single device resetting path
- concurrent prepare/done in multiple devices resetting paths

v4 is way too complex. Probably it's required in the following
ATS timeout series (which I haven't thought about carefully).
but for the fix here seems a simple logic like v2 does is ok?