RE: [PATCH rc] PCI: Fix nested pci_dev_reset_iommu_prepare/done()
From: Tian, Kevin
Date: Wed Mar 18 2026 - 23:01:20 EST
> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Thursday, March 19, 2026 10:43 AM
>
> On Thu, Mar 19, 2026 at 02:22:39AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > > Sent: Thursday, March 19, 2026 6:00 AM
> > >
> > > Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
> > > internally while both are calling pci_dev_reset_iommu_prepare/done().
> > >
> > > This results in a nested pci_dev_reset_iommu_prepare/done() call:
> > > cxl_reset_bus_function():
> > > pci_dev_reset_iommu_prepare(); // outer
> > > pci_reset_bus_function():
> > > pci_dev_reset_iommu_prepare(); // inner (re-entry)
> > > ...
> > > pci_dev_reset_iommu_done(); // inner
> > > pci_dev_reset_iommu_done(); // outer
> > >
> > > However, pci_dev_reset_iommu_prepare() doesn't allow a re-entry for
> now.
> > >
> > > Digging further, I found that most functions in pci_dev_specific_reset()
> > > except reset_ivb_igd() are calling pcie_flr() that has nested those two
> > > functions as well.
> > >
> > > Drop the outer callbacks in:
> > > - cxl_reset_bus_function()
> > > - pci_dev_specific_reset()
> > >
> > > Replace with adding the callbacks in:
> > > + reset_ivb_igd()
> > >
> >
> > I wonder whether it's cleaner to just make pci_dev_reset_iommu_prepare()
> > reentrant here.
> >
> > what this patch does requires additional attention on specific reset
> functions.
> >
> > e.g. reset_hinic_vf_dev() has a special logic waiting for firmware reset
> > done after calling pcie_flr(). With this patch the _done() notification will
> > be triggered earlier than expectation.
>
> !
>
> > the original way is more maintainable with prepare/done equipped in
> > the highest level reset helpers in pci_reset_fn_methods[].
>
> Yes. I agree. I will add a reset_cnt and allow re-entry.
>
> > btw the AI review tool gave a comment:
> >
> > https://sashiko.dev/#/patchset/20260318220028.1146905-1-
> nicolinc%40nvidia.com
>
> My browsers on both Linux and Windows blocked this site :-/
>
> > it won't hold if my proposal is agreed. But it still applies to your quarantine
> > series which relies on accurate report of success reset.
>
> Would you mind posting it here, if it's not too long?
>
Here it is:
--
By removing __pci_dev_specific_reset(), the IOMMU preparation error checking
is now delegated to the individual device-specific reset handlers. Will this
cause IOMMU preparation failures to be silently ignored?
Looking at handlers like reset_intel_82599_sfp_virtfn(), the return value of
pcie_flr() is not checked:
static int reset_intel_82599_sfp_virtfn(struct pci_dev *dev, bool probe)
{
if (!probe)
pcie_flr(dev);
return 0;
}
Similarly, delay_250ms_after_flr() ignores the return value when actually
performing the reset:
static int delay_250ms_after_flr(struct pci_dev *dev, bool probe)
{
if (probe)
return pcie_reset_flr(dev, PCI_RESET_PROBE);
pcie_reset_flr(dev, PCI_RESET_DO_RESET);
msleep(250);
return 0;
}
If pci_dev_reset_iommu_prepare() fails inside pcie_flr(), the function aborts
the FLR and returns an error. However, since the quirk handlers swallow this
error, the PCI subsystem is falsely told that the device reset succeeded.
Could this falsely reported success leave sensitive device state uncleared
between host and guest environments? Can we ensure the return value from the
inner reset functions is properly propagated to prevent this?