Re: [PATCH v11 4/9] PCI: Add additional checks for flr reset

From: Alex Williamson

Date: Wed Mar 25 2026 - 12:32:08 EST


On Tue, 24 Mar 2026 17:49:36 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
> > If a device is in an error state, then any reads of device registers can
> > return error value. Add additional checks to validate if a device is in an
> > error state before doing an flr reset.
>
> s/flr/FLR/ (also in subject)
>
> Also please extend the subject to say something specific about the
> "additional checks". E.g.,
>
> PCI: Fail FLR when config space inaccessible
>
> > Cc: stable@xxxxxxxxxxxxxxx
> > Reviewed-by: Benjamin Block <bblock@xxxxxxxxxxxxx>
> > Reviewed-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> > Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
> > ---
> > drivers/pci/pci.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 373421f4b9d8..8e4d924f4e88 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
> > */
> > int pcie_reset_flr(struct pci_dev *dev, bool probe)
> > {
> > + u32 reg;
> > +
> > if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
> > return -ENOTTY;
> >
> > if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> > return -ENOTTY;
> >
> > + if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
> > + pci_warn(dev, "Device unable to do an FLR\n");
> > + return -ENOTTY;
> > + }
>
> I guess the point of this is to detect devices that are inaccessible?
> The same sort of thing as in pci_dev_save_and_disable() from patch
> 3/9? But we use "dev->error_state == pci_channel_io_perm_failure"
> instead?
>
> No matter what we do, this has the same race as in patch 3/9. And I
> think using dev->error_state also depends on AER being enabled, which
> cuts out many PCIe devices.
>
> I think using the same exact code as in pci_dev_save_and_disable()
> would be more straightforward. And also the same sort of wording in
> the message, e.g., "Device config space inaccessible; unable to FLR"
> or similar. I foresee many of these messages in my future, and it
> will be helpful to have a specific clue about why FLR failed :)

I think pci_af_flr() and pci_pm_reset() are all subject to this as
well, some of the device specific resets too. Maybe we need a common
helper with a string provided so we can differentiate the callers, even
if we don't make all the conversions in this series. Thanks,

Alex