Re: [PATCH v16 04/10] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
From: Dan Williams
Date: Sun Mar 29 2026 - 17:58:12 EST
Terry Bowman wrote:
> The CXL driver's uncorrectable (UCE) protocol error handling will be updated
> in the future. One required change is for the CXL error handlers to force a
> system panic when a UCE is detected.
Similar comment as the last patch: "future" has no meaning and
"required" comes with no context.
> Introduce PCI_ERS_RESULT_PANIC as a 'enum pci_ers_result' type. This will
> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
> PCIe recovery documentation with details of PCI_ERS_RESULT_PANIC.
>
> To clarify, PCI's merge_result() implemented in err.c is not to be changed.
> merge_result() is not aware of PCI_ERS_RESULT_PANIC and will not return
> PCI_ERS_RESULT_PANIC.
>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@xxxxxxx>
> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>
> ---
>
> Changes in v15 -> v16:
> - None
>
> Changes in v14 -> v15:
> - None
>
> Changes in v13 -> v14:
> - Add review-by for Dan
> - Update Title prefix (Bjorn)
> - Removed merge_result. Only logging error for device reporting the
> error (Dan)
>
> Changes in v12->v13:
> - Add Dave Jiang's, Jonathan's, Ben's review-by
> - Typo fix (Ben)
>
> Changes v11 -> v12:
> - Documentation requested (Lukas)
> ---
> Documentation/PCI/pci-error-recovery.rst | 2 ++
> include/linux/pci.h | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 43838723fde9..55be63f1a649 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -102,6 +102,8 @@ Possible return values are::
> PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> + PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
> + PCI_ERS_RESULT_PANIC, /* System is unstable, panic. Is CXL specific */
> };
>
> A driver does not have to implement all of these callbacks; however,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 1c270f1d5123..0d6ad11e3422 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -933,6 +933,9 @@ enum pci_ers_result {
>
> /* No AER capabilities registered for the driver */
> PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> + /* System is unstable, panic. Is CXL specific */
I will note that panic from error reporting is not a CXL-specific
phenomenon. One of the future topics to resolve is indeed the
discrepancy between ACPI/GHES and PCI/AER: PCI/AER does not panic in
situations where ACPI/GHES does (see ghes_panic()).
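To make the point concrete, here is a minimal userspace sketch, not
kernel code. Every name in it is hypothetical, and the position of the
PANIC value right after NO_AER_DRIVER is only an assumption based on
the hunk above. It shows that a "panic" result can be acted on by a
generic caller without knowing which kind of driver returned it:

```c
#include <stdbool.h>

/*
 * Userspace stand-in for enum pci_ers_result. Names and values are
 * illustrative assumptions, not the definitions in include/linux/pci.h.
 */
enum mock_ers_result {
	MOCK_ERS_RESULT_NONE = 1,
	MOCK_ERS_RESULT_CAN_RECOVER,
	MOCK_ERS_RESULT_NEED_RESET,
	MOCK_ERS_RESULT_DISCONNECT,
	MOCK_ERS_RESULT_RECOVERED,
	MOCK_ERS_RESULT_NO_AER_DRIVER,
	MOCK_ERS_RESULT_PANIC,	/* assumed to follow NO_AER_DRIVER */
};

/*
 * Hypothetical error handler: any driver (CXL or otherwise) that sees
 * an uncorrectable error it cannot contain asks for a panic.
 */
static enum mock_ers_result mock_error_detected(bool uncorrectable)
{
	return uncorrectable ? MOCK_ERS_RESULT_PANIC
			     : MOCK_ERS_RESULT_CAN_RECOVER;
}

/*
 * Hypothetical caller policy: returns true where a real implementation
 * would panic. Nothing here depends on the bus or protocol of the
 * device that reported the error.
 */
static bool mock_should_panic(enum mock_ers_result res)
{
	return res == MOCK_ERS_RESULT_PANIC;
}
```

With these definitions, mock_should_panic(mock_error_detected(true))
is true, while every other result falls through to whatever recovery
path the caller already has.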
I also notice that a helpful TODO in drivers/pci/pcie/err.c was removed
by:
b06d125e6280 ("PCI/ERR: Remove misleading TODO regarding kernel panic")
...which did not make any comment on that mismatch.