Re: [PATCH v3] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
From: Bjorn Helgaas
Date: Wed Mar 25 2026 - 19:21:40 EST
On Wed, Mar 25, 2026 at 06:56:46AM +0100, Lukas Wunner wrote:
> On Tue, Mar 24, 2026 at 02:45:25PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On 3/23/2026 4:24 PM, Bjorn Helgaas wrote:
> > > eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend")
> > > cleared PCI_EXP_SLTCTL_HPIE so that when the link goes down, we
> > > wouldn't get a PCI_EXP_SLTSTA_DLLSC interrupt and wake the system.
> > >
> > > I don't know the details of why the PCI_EXP_SLTSTA_DLLSC would cause
> > > that wakeup. I would think pciehp should field that, and it should be
> > > able to figure out whether to bring the port out of D3hot.
> ...
> The problem is, PME not only shares the interrupt with hotplug
> (PCIe r7.0 sec 6.7.3.4), but if INTx is used it also shares the
> interrupt with link bandwidth management, AER and DPC. So there's
> lots of potential for spurious PME interrupts and I fear waking up
> the entire hierarchy below the Root Port on every interrupt may
> result in much worse power consumption.
This interrupt sharing is critical information that belongs in commit
logs and in comments in the hotplug and PME paths, which are especially
sensitive to it. I'm embarrassed at how much time I wasted before
remembering that.
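For the archives, here is a rough user-space sketch of why the sharing
matters. It is a model, not kernel code: struct model_port, the
model_*() functions and the MODEL_IRQ_* values are made up for
illustration, and only the two status bit definitions mirror
include/uapi/linux/pci_regs.h. Each service on the shared line has to
look at its own status bit and decline the interrupt if that bit is
clear; otherwise a hotplug event would look like a PME.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_DLLSC	0x0100		/* Data Link Layer State Changed */
#define PCI_EXP_RTSTA_PME	0x00010000	/* PME status */

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED */
enum model_irq { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Hypothetical model of a Root Port whose services share one interrupt */
struct model_port {
	uint16_t slot_status;		/* models Slot Status */
	uint32_t root_status;		/* models Root Status */
	unsigned int pme_wakeups;	/* times the PME path woke the hierarchy */
	unsigned int hotplug_events;	/* times the hotplug path saw DLLSC */
};

static enum model_irq model_pme_irq(struct model_port *p)
{
	/* Not ours: the shared line fired for some other service */
	if (!(p->root_status & PCI_EXP_RTSTA_PME))
		return MODEL_IRQ_NONE;

	p->root_status &= ~PCI_EXP_RTSTA_PME;	/* models the RW1C clear */
	p->pme_wakeups++;
	return MODEL_IRQ_HANDLED;
}

static enum model_irq model_hotplug_irq(struct model_port *p)
{
	if (!(p->slot_status & PCI_EXP_SLTSTA_DLLSC))
		return MODEL_IRQ_NONE;

	p->slot_status &= ~PCI_EXP_SLTSTA_DLLSC;	/* models the RW1C clear */
	p->hotplug_events++;
	return MODEL_IRQ_HANDLED;
}

/* Every handler on the shared line runs; returns how many claimed it */
static unsigned int model_shared_irq(struct model_port *p)
{
	unsigned int handled = 0;

	handled += model_pme_irq(p) == MODEL_IRQ_HANDLED;
	handled += model_hotplug_irq(p) == MODEL_IRQ_HANDLED;
	return handled;
}
```

In this model a link-down event alone reaches the PME handler too, and
the status check is the only thing that keeps it from waking anything.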
> At least Switch Upstream and Downstream Ports below the Root Port
> need to be woken to access config space of Endpoints. With Thunderbolt,
> these may be in D3cold and waking them up consumes a non-trivial amount
> of time and energy.
>
> As an aside, I note that the code in drivers/pci/pcie/pme.c doesn't
> take into account that there may be Switch Upstream and Downstream
> Ports between the Root Port and the wakeup-signaling device and
> those switch ports may be in D3hot or D3cold. Which means config space
> of the wakeup-signaling device is inaccessible. pci_check_pme_status()
> happens to be written in such a way that if it reads fabricated
> "all ones" responses from the device, it assumes that the device
> is signaling wakeup. The final pci_write_config_word() in
> pci_check_pme_status() will be lost but there's a call to
> pci_enable_wake(pci_dev, PCI_D0, false) upon runtime resume
> which makes up for the lost write, so the code happens to work.
> Just be aware of pitfalls there...
Oh, my. Sigh.
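To make the pitfall concrete for anyone reading this later, below is a
small user-space model of the behavior Lukas describes. It is only a
sketch under stated assumptions: struct model_dev and the model_*()
helpers are invented for illustration, the control flow is my paraphrase
of pci_check_pme_status() rather than a copy of it, and an unreachable
device is modeled simply as "reads return all ones, writes are dropped".
The two PMCSR bit values match include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_PM_CTRL_PME_ENABLE	0x0100
#define PCI_PM_CTRL_PME_STATUS	0x8000

/* Hypothetical model of a device behind a switch port in D3hot/D3cold */
struct model_dev {
	bool reachable;			/* false: config space inaccessible */
	uint16_t pmcsr;			/* the device's real PMCSR contents */
	unsigned int lost_writes;	/* writes that never reached the device */
};

static uint16_t model_read_pmcsr(const struct model_dev *dev)
{
	/* Reads from an inaccessible device come back as all ones */
	return dev->reachable ? dev->pmcsr : 0xffff;
}

static void model_write_pmcsr(struct model_dev *dev, uint16_t val)
{
	uint16_t status;

	if (!dev->reachable) {
		dev->lost_writes++;
		return;
	}

	/* PME_Status is write-1-to-clear; the other bits are stored as-is */
	status = dev->pmcsr & PCI_PM_CTRL_PME_STATUS;
	if (val & PCI_PM_CTRL_PME_STATUS)
		status = 0;
	dev->pmcsr = (uint16_t)((val & (uint16_t)~PCI_PM_CTRL_PME_STATUS) | status);
}

/* Paraphrase of the pci_check_pme_status() logic, on the model device */
static bool model_check_pme_status(struct model_dev *dev)
{
	uint16_t pmcsr = model_read_pmcsr(dev);
	bool ret = false;

	if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
		return false;

	/* Clear PME status; disable PME if it was enabled */
	pmcsr |= PCI_PM_CTRL_PME_STATUS;
	if (pmcsr & PCI_PM_CTRL_PME_ENABLE) {
		pmcsr &= (uint16_t)~PCI_PM_CTRL_PME_ENABLE;
		ret = true;
	}

	model_write_pmcsr(dev, pmcsr);
	return ret;
}
```

With reachable == false, the all-ones read has both PME_Status and
PME_En set, so the model reports a wakeup even if the real PMCSR is
zero, and the final write is counted in lost_writes instead of reaching
the device.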