Re: [PATCH v2 1/7] iommu: Do not call pci_dev_reset_iommu_done() unless reset succeeds

From: Nicolin Chen

Date: Wed Mar 18 2026 - 16:28:16 EST


On Wed, Mar 18, 2026 at 04:02:01PM +0800, Shuai Xue wrote:
> On 3/18/26 3:15 AM, Nicolin Chen wrote:
> > - * Caller must use pci_dev_reset_iommu_prepare() with pci_dev_reset_iommu_done()
> > - * before/after the core-level reset routine, to unset the resetting_domain.
> > + * Caller must use pci_dev_reset_iommu_done() after a successful PCI-level reset
> > + * to unset the resetting_domain. If the reset fails, caller can choose to keep
> > + * the device in the resetting_domain to protect system memory using IOMMU from
> > + * any bad ATS.
>
> I think you mean ...to protect system memory from DMA via stale ATC entries.

"the DMA via stale ATC entries" sounds too specific to this issue.

I intended to generalize the message in the kdocs. Maybe should've
drop "from any bad ATS" as well?

> > *
> > * Return: 0 on success or negative error code if the preparation failed.
> > *
> > @@ -3961,9 +3963,9 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
> > guard(mutex)(&group->mutex);
> > - /* Re-entry is not allowed */
> > - if (WARN_ON(group->resetting_domain))
> > - return -EBUSY;
> > + /* Already prepared */
> > + if (group->resetting_domain)
> > + return 0;
>
> Could you elaborate more on why Re-entry is allowed now? For example:
>
> /*
> * Allow re-entry: if a previous reset failed, the device remains in
> * resetting_domain. A subsequent reset attempt must be able to call
> * prepare() again without failing.
> */

Sure. I will add this.

> > @@ -977,7 +978,15 @@ int pci_dev_acpi_reset(struct pci_dev *dev, bool probe)
> > ret = -ENOTTY;
> > }
> > - pci_dev_reset_iommu_done(dev);
> > + /*
> > + * The reset might be invoked to recover a serious error. E.g. when the
> > + * ATC failed to invalidate its stale entries, which can result in data
> > + * corruption. Thus, do not unblock ATS until a successful reset.
> > + */
> > + if (!ret || !pci_ats_supported(dev))
> > + pci_dev_reset_iommu_done(dev);
>
> pci_dev_reset_iommu_prepare() already does an early return for non-ATS
> devices, and pci_dev_reset_iommu_done() also early-returns
> when !pci_ats_supported(). So for non-ATS devices, done()
> is always a no-op regardless of whether it's called or not.
>
> Do we really need check pci_ats_supported(dev) here?

You are right. It was for a precaution as pci_dev_reset_iommu_done()
isn't maintained in the pci.c file.

Kevin suggested to move this part into pci_dev_reset_iommu_done() so
these conditions can be merged. And pci callers will be simply:
pci_dev_reset_iommu_done(dev, !ret);

> > @@ -4978,7 +5010,15 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
> > pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
> > reg);
> > - pci_dev_reset_iommu_done(dev);
> > + /*
> > + * The reset might be invoked to recover a serious error. E.g. when the
> > + * ATC failed to invalidate its stale entries, which can result in data
> > + * corruption. Thus, do not unblock ATS until a successful reset.
> > + */
> > + if (!rc || !pci_ats_supported(dev))
> > + pci_dev_reset_iommu_done(dev);
> > + else
> > + pci_warn(dev, "Reset failed. Blocking ATS to protect memory\n");
> > return rc;
> > }
>
> cxl_reset_bus_function() calls pci_reset_bus_function() which already
> contains the conditional pci_dev_reset_iommu_done() logic. Then the
> outer function calls done() again. The second call is a no-op because
> resetting_domain has already been cleared, but this is confusing.
>
> Additionally, cxl_reset_bus_function() does its own
> pci_dev_reset_iommu_prepare(), while pci_reset_bus_function() also
> calls prepare() internally. With the re-entry change in this patch,
> the nested prepare() silently succeeds.
>
> Is it a expected behavior?

No.. This is a bug prior to this patch. The nested prepare/done()
is wrong. I will submit a patch fixing this.

Thanks!
Nicolin