Re: [PATCH] nvme-pci: Fix system hang when ASPM L1 is enabled during suspend
From: Bjorn Helgaas
Date: Fri May 02 2025 - 11:00:46 EST
On Fri, May 02, 2025 at 11:20:51AM +0800, hans.zhang@xxxxxxxxxxx wrote:
> From: Hans Zhang <hans.zhang@xxxxxxxxxxx>
>
> When PCIe ASPM L1 is enabled (CONFIG_PCIEASPM_POWERSAVE=y), certain

CONFIG_PCIEASPM_POWERSAVE=y only sets the default. L1 can be enabled
dynamically regardless of the config.

> NVMe controllers fail to release LPI MSI-X interrupts during system
> suspend, leading to a system hang. This occurs because the driver's
> existing power management path does not fully disable the device
> when ASPM is active.

I have no idea what this has to do with ASPM L1. I do see that
nvme_suspend() tests pcie_aspm_enabled(pdev) (which seems kind of
janky and racy). But this doesn't explain anything about what would
cause a system hang.

> The fix adds an explicit device disable and reset preparation step
> in the suspend path after successfully setting the power state.
> This ensures proper cleanup of interrupt resources even when ASPM
> L1 is enabled, preventing the system from hanging during suspend.

Maybe there's a clue in the 600 lines of debug output that I trimmed,
but without some interpretation, I have no idea how to find it.
Unless you see similar problems on other systems, I would suspect an
issue with the SoC or the SoC driver where you do see problems.

Bjorn