RE: [PATCH v2 3/7] iommu: Add iommu_report_device_broken() to quarantine a broken device
From: Tian, Kevin
Date: Wed Mar 18 2026 - 22:35:56 EST
> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Thursday, March 19, 2026 9:31 AM
>
> On Wed, Mar 18, 2026 at 07:31:18AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > > Sent: Wednesday, March 18, 2026 3:16 AM
> > >
> > > +static void iommu_group_broken_worker(struct work_struct *work)
> > > +{
> > > + struct iommu_group *group =
> > > + container_of(work, struct iommu_group, broken_work);
> > > + struct pci_dev *pdev = NULL;
> > > + struct device *dev;
> > > +
> > > + scoped_guard(mutex, &group->mutex) {
> > > + /* Do not block the device again if it has been recovered */
> > > + if (!READ_ONCE(group->requires_reset))
> > > + goto out_put;
> > > + if (list_is_singular(&group->devices)) {
> > > + /* Note: only support group with a single device */
> >
> > this series is about fixing a vulnerability. Then it sounds incomplete to
> > leave certain configuration still under risk. Probably we should first
> > ensure ATS can be enabled only in singleton group, just like how we
> > did for pci_enable_pasid()?
>
> I understand your concern. But I am not very sure about applying
> limitation to ATS support. Would this block some existing cases?
what about just throwing out a message to warn that enabling ATS
in a non-singleton iommu group may suffer from unquarantined
situation if any device in the group triggers a ATC invalidation timeout?
>
> > > + dev = iommu_group_first_dev(group);
> > > + if (dev_is_pci(dev)) {
> > > + pdev = to_pci_dev(dev);
> > > + pci_dev_get(pdev);
> > > + }
> > > + }
> > > + }
> > > +
> > > + if (pdev) {
> > > + /*
> > > + * Quarantine the device completely. This will be cleared
> > > upon
> > > + * a pci_dev_reset_iommu_done() call indicating the recovery.
> > > + */
> > > + pci_dev_lock(pdev);
> > > + pci_dev_reset_iommu_prepare(pdev);
> >
> > let's rename it to iommu_quarantine_device() to be called here. then
> > have another wrapper pci_dev_reset_iommu_prepare() to call it too.
> >
> > this path has nothing to do with reset.
>
> But the implementation of that iommu_quarantine_device would be
> still to shift to resetting_domain. Perhaps, we can rename that
> to quarantine_domain or so.
>
yes let's rename that too. the purpose of this function is clearly about
quarantine. reset is just one user of it.