Re: [PATCH v6 6/6] iommu/amd: Fail probe on ATS configuration failure
From: Pranjal Shrivastava
Date: Mon Jun 01 2026 - 02:22:53 EST
On Mon, Jun 01, 2026 at 06:00:15AM +0000, Ankit Soni wrote:
> > @@ -2502,10 +2508,22 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
> > else
> > dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
> >
> > - if (dev_is_pci(dev))
> > - pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
> > + if (dev_is_pci(dev)) {
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > + if (pci_ats_supported(pdev)) {
> > + ret = pci_prepare_ats(pdev, PAGE_SHIFT);
> > + if (ret) {
> > + iommu_dev = ERR_PTR(ret);
> > + goto out_err;
> > + }
> > + }
> > + }
> >
> > out_err:
> > + if (IS_ERR(iommu_dev))
> > + iommu_ignore_device(iommu, dev);
> > +
> > return iommu_dev;
> > }
> >
>
> Hi,
> This regresses IRQ remapping in the PD_MODE_NONE branch. By design
> rlookup_table[devid] must stay valid for IR - init.c:2257 documents
> this: "Do not return an error to enable IRQ remapping ...". Pre-patch
> the PD_MODE_NONE branch returned ERR_PTR(-ENODEV) without nulling
> rlookup, precisely so irq_remapping_alloc() / __rlookup_amd_iommu()
> keep working; this unconditional cleanup violates that.
> The new pci_prepare_ats() failure path has the same shape:
> amd_iommu_set_pci_msi_domain() ran earlier and parented dev->msi_domain
> on iommu->ir_domain, but on this new out_err that's not unwound. So
> nulling rlookup_table[devid] makes irq_remapping_alloc() return -EINVAL
> on the first MSI alloc for the device. Sashiko also flagged this in [1];
>
> Also if iommu_init_device() branch fails, iommu_ignore_device() will be
> called twice.
>
Hi Ankit,
Ack. Sashiko made me realize that this regresses IRQ mapping for AMD,
and I agree that the call to iommu_ignore_device() is a bit too
aggressive as it wipes the rlookup_table entry required for IRQ
remapping, particularly in PD_MODE_NONE.
I was thinkig to address this in the next version as follows:
1. Split the probe error paths:
- Proper init failures (like iommu_init_device) will continue to call
iommu_ignore_device(). I will fix the double invocation here.
- Config failures (like ATS mismatch or PD_MODE_NONE) will return an
an error but skip caling iommu_ignore_device(), preserving the
rlookup_table entry for IRQ remapping.
2. Resolve the Use-After-Free (UAF):
To prevent the UAF on the "DMA-only" failure path, I will ensure that
the hardware Device Table Entry (DTE) is set to a safe state (like
blocked or bypass) and the dev_data->dev pointer is cleared, as the
IOMMU core does not invoke release_device() after a probe failure.
3. Fix iommu_ignore_device() infrastructure:
I will address the pre-existing bugs identified by Sashiko:
- Fix clearing order (calling setup_aliases before clearing the
rlookup_table).
- Replace the non-atomic memset() on the hardware dev_table with an
atomic DTE update.
That said, I'm investigating the safest way to revert the MSI domain
assignment on probe failure to avoid the dangling domain issue pointed
out by Sashiko. Maybe we can add an amd_iommu_restore_msi_domain() helper
to revert the assignment made in amd_iommu_set_pci_msi_domain() on probe
failure?
Please, let me know if that sounds okay?
Also, I'm wondering if I should send this as a separate series specific
to AMD which is unrelated to this one? Or maybe handle AMD IOMMU in a
separate series altogether. Let me know if you (and Vasanth / Suravee)
would prefer that?
Thanks,
Pranjal