Re: [PATCH v16 02/10] PCI/CXL: Update unregistration for AER-CXL and CPER-CXL kfifos

From: Dan Williams

Date: Sun Mar 29 2026 - 17:28:04 EST


Terry Bowman wrote:
> The current AER-CXL kfifo unregistration does not cancel pending work after
> clearing the work function pointer. In addition, cancel_work_sync() is
> called on behalf of the CPER-CXL kfifo in cxl_ras_exit() and should be
> moved into the kfifo deregistration function.
>
> Add logic to cancel the AER-CXL kfifo's pending work in
> cxl_unregister_proto_err_work().
>
> Move the CPER-CXL kfifo cancel call from cxl_ras_exit() to
> cxl_cper_unregister_prot_err_work(). Release the CPER-CXL spinlock
> before calling cancel_work_sync() to avoid deadlock.
>
> In both kfifo unregistration cases, add the necessary synchronization
> to enforce proper lock ordering: protect pointer updates under the
> lock, and clear the work pointer, then cancel any outstanding work
> after the lock is released.
>
> Link: https://lore.kernel.org/linux-cxl/6982ca54e094b_55fa1005@dwillia2-mobl4.notmuch/
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> Assisted-by: Azure:gtp-4.1-nano-key
>
> ----
>
> Changes in v16:
> - New commit
> ---
> drivers/acpi/apei/ghes.c | 6 +++++-
> drivers/cxl/core/ras.c | 1 -
> drivers/pci/pcie/aer_cxl_vh.c | 9 ++++++++-
> 3 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 8acd2742bb27..de935e0e1dcf 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -776,8 +776,12 @@ int cxl_cper_unregister_prot_err_work(struct work_struct *work)
> if (cxl_cper_prot_err_work != work)
> return -EINVAL;
>
> - guard(spinlock)(&cxl_cper_prot_err_work_lock);
> + spin_lock(&cxl_cper_prot_err_work_lock);
> cxl_cper_prot_err_work = NULL;
> + spin_unlock(&cxl_cper_prot_err_work_lock);

In this case for a single statement getting re-indented a scoped_guard()
conversion is appropriate, but not significant enough to respin the
patch.

> +
> + cancel_work_sync(work);
> +
> return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 006c6ffc2f56..949d8c8ecdfe 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -124,7 +124,6 @@ int cxl_ras_init(void)
> void cxl_ras_exit(void)
> {
> cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> - cancel_work_sync(&cxl_cper_prot_err_work);

Looks good.

> }
>
> static void cxl_dport_map_ras(struct cxl_dport *dport)
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index 7e2bc1894395..ebca1112652a 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -74,8 +74,15 @@ EXPORT_SYMBOL_NS_GPL(cxl_register_proto_err_work, "CXL");
>
> void cxl_unregister_proto_err_work(void)
> {
> - guard(rwsem_write)(&cxl_proto_err_kfifo.rwsema);
> + struct work_struct *work;
> +
> + down_write(&cxl_proto_err_kfifo.rwsema);
> + work = cxl_proto_err_kfifo.work;
> cxl_proto_err_kfifo.work = NULL;
> + up_write(&cxl_proto_err_kfifo.rwsema);
> +
> + if (work)
> + cancel_work_sync(work);

This really should have been its own fix or folded, but at this point
given there are also reference counting issues to fix we can do that
work incrementally.