Re: [PATCH v3] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
From: Mika Westerberg
Date: Fri Mar 27 2026 - 07:16:42 EST
Hey,
On Thu, Mar 26, 2026 at 02:23:50PM -0700, Kuppuswamy Sathyanarayanan wrote:
> Hi Mika,
>
> On 3/25/2026 11:12 PM, Mika Westerberg wrote:
> > On Wed, Mar 25, 2026 at 02:12:48PM -0700, Kuppuswamy Sathyanarayanan wrote:
> >>
> >>
> >> On 3/24/2026 11:11 PM, Mika Westerberg wrote:
> >>> On Tue, Mar 24, 2026 at 02:45:25PM -0700, Kuppuswamy Sathyanarayanan wrote:
> >>>>> eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend")
> >>>>> cleared PCI_EXP_SLTCTL_HPIE so that when the link goes down, we
> >>>>> wouldn't get a PCI_EXP_SLTSTA_DLLSC interrupt and wake the system.
> >>>>>
> >>>>> I don't know the details of why the PCI_EXP_SLTSTA_DLLSC would cause
> >>>>> that wakeup. I would think pciehp should field that, and it should be
> >>>>> able to figure out whether to bring the port out of D3hot.
> >>>>>
> >>>>> Anyway, with this patch it looks like we'll leave PCI_EXP_SLTCTL_HPIE
> >>>>> set, and potentially get that PCI_EXP_SLTSTA_DLLSC interrupt again?
> >>>>
> >>>> I have tested this patch on Catlow Lake. Enabling HPIE does not result in
> >>>> spurious wakeups as mentioned in Mika's patch.
> >>>>
> >>>> Mika, any comments?
> >>>
> >>> What do you have connected to the slot?
> >>
> >> A network card.
> >
> > Okay.
> >
> > Out of interest how do you hotplug it? :)
>
> We physically remove and insert the card.

Got it.

> >>> IIRC the interrupt triggers when presence change toggles (due to the link
> >>> going down).
> >>>
> >>
> >> I have tested the s3 mode. I was able to see message related to system entering
> >> suspend and then coming back again after (after user intervention). I also noted
> >> pcie_disable_interrupt() called before suspend and pcie_enable_interrupt() called
> >> after resume.
> >
> > In case of S3 the BIOS also configures the hardware before entering
> > suspend. On client at least it's suspend-to-idle and any interrupt will
> > bring the CPU and the system out of it. It could be that that's the reason
> > you don't see any issue if this is server system and it goes into full S3?
> >
>
> Looking at the kernel logs, the system is actually using suspend-to-idle
> (s2idle), not full S3:
>
> PM: suspend entry (s2idle)
>
> So this is the same suspend mode where you observed the spurious wakeup issue.
> Interestingly, we're not seeing the problem on Catlow Lake with HPIE enabled.
>
> I am trying to understand the wakeup sequence in your case. IIUC, before the
> system enters suspend, it will put the device and port in D3hot, right? So link
> down should happen before the system goes to sleep or idle. At what point does
> the spurious DLLSC interrupt occur that causes the unwanted wakeup?

I think in the case of tunneled PCIe it is presence detect that toggles and
triggers the interrupt if it is left enabled.

The flow is something like this (from memory):

  1. User enters s2idle.
  2. PM core suspends devices.
  3. PCI core suspends the devices behind the root port and then the root
     port itself. This puts the root port in D3hot while the link below it
     is still in L1.
  4. PCI/ACPI turns off the power resource attached to the root port. This
     puts the link into L2/L3 Ready, and then PERST# is asserted, at which
     point the tunnels are gone, presence detect changes, the link enters
     L2, and the root port enters D3cold.

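
As a rough illustration of the flow above, here is a toy model in C. It is not
kernel code: every type and function name is hypothetical, and it bakes in two
assumptions that go beyond what is stated above, namely that presence detect
only toggles for tunneled devices and that a port without a power resource
simply stays in D3hot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum model_d_state { MODEL_D0, MODEL_D3HOT, MODEL_D3COLD };
enum model_link { MODEL_L0, MODEL_L1, MODEL_L23_READY, MODEL_L2 };

struct model_root_port {
	enum model_d_state d_state;
	enum model_link link;
	bool presence;       /* presence detect state of the slot */
	bool hpie;           /* models PCI_EXP_SLTCTL_HPIE being left set */
	bool tunneled;       /* device is reached through a PCIe tunnel */
	bool has_power_res;  /* a power resource is attached to the port */
	bool wake_irq;       /* set when a hotplug interrupt would fire */
};

/* Step 3: root port goes to D3hot, link below it sits in L1. */
static void model_pci_suspend(struct model_root_port *p)
{
	p->d_state = MODEL_D3HOT;
	p->link = MODEL_L1;
}

/* Step 4: power resource off, L2/L3 Ready, PERST#, then L2 and D3cold. */
static void model_power_resource_off(struct model_root_port *p)
{
	if (!p->has_power_res)
		return; /* assumption: nothing to turn off, port stays in D3hot */

	p->link = MODEL_L23_READY;

	/* PERST# asserted: assumed to tear down tunnels and flip presence. */
	if (p->tunneled && p->presence) {
		p->presence = false;
		if (p->hpie)
			p->wake_irq = true; /* interrupt would end s2idle */
	}

	p->link = MODEL_L2;
	p->d_state = MODEL_D3COLD;
}

/* Returns true if the model predicts a spurious wakeup during s2idle. */
static bool model_s2idle_wakes(struct model_root_port *p)
{
	assert(p != NULL);
	p->wake_irq = false;
	model_pci_suspend(p);
	model_power_resource_off(p);
	return p->wake_irq;
}
```

In this model only the combination of a tunneled device, an attached power
resource, and HPIE left set yields a wakeup. That would be consistent with a
physically inserted card behind a port that never reaches D3cold showing no
issue, but the model cannot confirm that is what happens on real hardware.
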
In your case, does the root port enter D3cold? Does it have a power resource?
Or does it stay in D3hot? We should not put any hotplug ports into low power
states if they don't have the HotPlugSupportInD3 property, as described here:

  https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3

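
The rule above can be written down as a small decision function. This is a
simplified sketch with hypothetical names, not the kernel's actual logic (the
PCI core's real check, pci_bridge_d3_possible(), considers more conditions).
The D3cold-versus-D3hot choice based on the power resource is an assumption
added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; this only mirrors the rule stated in the text. */
enum target_state { TARGET_D0, TARGET_D3HOT, TARGET_D3COLD };

struct port_info {
	bool is_hotplug_port;
	bool hotplug_support_in_d3; /* models the HotPlugSupportInD3 property */
	bool has_power_resource;    /* power resource attached to the port */
};

static enum target_state pick_suspend_state(const struct port_info *p)
{
	assert(p != NULL);

	/* Hotplug ports without the property must not enter low power. */
	if (p->is_hotplug_port && !p->hotplug_support_in_d3)
		return TARGET_D0;

	/* Assumption: D3cold is only reachable with a power resource. */
	return p->has_power_resource ? TARGET_D3COLD : TARGET_D3HOT;
}
```
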
Some BIOS support is needed for D3cold. We have that on client, but I have
not been dealing with servers, so I'm not familiar with how things are done
there. I would think power savings are not that important in big iron.