RFC: CXL: How to handle trace event ABI break from CXL_HEADERLOG_SIZE fix?

From: Bowman, Terry

Date: Wed May 20 2026 - 15:55:56 EST


Hi Mauro and Everyone,

During development of the CXL protocol series [1], the Sashiko tool
identified an issue in the CXL RAS uncorrectable error (UCE) trace logging
for Ports and Endpoints. Specifically, the UCE header log size is
incorrectly defined in drivers/cxl/cxl.h.

The UCE header log size is currently defined as 128 u32s (512 bytes),
whereas it should be 16 u32s (64 bytes) per CXL r4.0 8.2.4.17.7. Correcting
this shrinks the header_log field of the trace events, which changes the
trace format and breaks the existing ABI contract with rasdaemon.
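
As a minimal sketch of the size arithmetic above, the two definitions can
be compared at compile time. The OLD_/NEW_ macro names and the helper are
hypothetical stand-ins for illustration, not kernel identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the old and corrected kernel definitions. */
#define OLD_HEADERLOG_SIZE      512
#define OLD_HEADERLOG_SIZE_U32  (OLD_HEADERLOG_SIZE / sizeof(uint32_t))
#define NEW_HEADERLOG_SIZE      64
#define NEW_HEADERLOG_SIZE_U32  (NEW_HEADERLOG_SIZE / sizeof(uint32_t))

/* 512 bytes is 128 u32s; 64 bytes is 16 u32s. */
static_assert(OLD_HEADERLOG_SIZE_U32 == 128, "old size is 128 u32s");
static_assert(NEW_HEADERLOG_SIZE_U32 == 16, "new size is 16 u32s");

/* Bytes the old definition covers beyond the 64-byte header log. */
static inline size_t headerlog_excess_bytes(void)
{
	return OLD_HEADERLOG_SIZE - NEW_HEADERLOG_SIZE;
}
```

The 448-byte difference returned by the helper is the amount by which the
header_log field shrinks in the trace record.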

How would you recommend proceeding to resolve this? I’ve included the Sashiko
review comments here:

commit 07869896095a6c3e7f10bf70e3c8c7b3f11530e1
Author: Terry Bowman <terry.bowman@xxxxxxx>
cxl: Fix CXL_HEADERLOG_SIZE to match RAS Capability size
This patch corrects CXL_HEADERLOG_SIZE from SZ_512 (512 bytes) to 64
bytes to match the actual size of the Header Log field in the CXL RAS
Capability Structure. The old value caused header_log_copy() to overrun
its iomap mapping, and caused the cxl_aer_uncorrectable_error trace
event to memcpy 448 bytes of kernel stack into the ring buffer.

> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..3e69bcdbbbed 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -158,8 +158,8 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> #define CXL_RAS_CAP_CONTROL_FE_MASK GENMASK(5, 0)
> #define CXL_RAS_HEADER_LOG_OFFSET 0x18
> #define CXL_RAS_CAPABILITY_LENGTH 0x58
> -#define CXL_HEADERLOG_SIZE SZ_512
> -#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
> +#define CXL_HEADERLOG_SIZE 64
> +#define CXL_HEADERLOG_SIZE_U32 (CXL_HEADERLOG_SIZE / sizeof(u32))

[ ... ]
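
The diff above also parenthesizes CXL_HEADERLOG_SIZE_U32. The sketch below
shows why that matters in general. All names in it are hypothetical, and
the thread does not say whether any existing kernel call site was actually
affected by the unparenthesized form:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical demo macros mirroring the shape of the old and new defines. */
#define DEMO_SIZE          512
#define DEMO_U32_UNPAREN   DEMO_SIZE / sizeof(uint32_t)
#define DEMO_U32_PAREN     (DEMO_SIZE / sizeof(uint32_t))

/* Expands to 1024 / 512 / 4, which evaluates left to right: (1024 / 512) / 4. */
static inline size_t demo_divide_unparenthesized(void)
{
	return 1024 / DEMO_U32_UNPAREN;
}

/* Expands to 1024 / (512 / 4), i.e. 1024 / 128. */
static inline size_t demo_divide_parenthesized(void)
{
	return 1024 / DEMO_U32_PAREN;
}
```

Used on its own, for example as an array bound, the unparenthesized macro
still yields 128. The difference appears only when the macro is embedded in
a larger expression, as in the two helpers.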

The commit message acknowledges that this shrinks the header_log array
in both cxl_aer_uncorrectable_error and cxl_port_aer_uncorrectable_error
from 128 u32s to 16 u32s, changing the binary trace event ABI.

The event format exposed at:
/sys/kernel/tracing/events/cxl/cxl_aer_uncorrectable_error/format

will reflect the new smaller layout after this patch. Userspace tools
such as rasdaemon, perf, and trace-cmd that were built against the old
format file and hardcode 128 u32s (512 bytes) for header_log will
silently misparse events from a patched kernel -- reading the 448 bytes
of unrelated ring-buffer data that follow as though they were part of
the header_log field, and computing wrong offsets for any field after
it.

The old 512-byte size was always wrong per the spec, so no correctly
written tool could have fully relied on it. But does the Cc: stable tag
risk propagating this ABI change to stable kernels where production CXL
RAS monitoring tools may have been written against what was, however
incorrectly, a previously stable layout?

Is there a way to accompany this with a tools/ or Documentation/ update
to help userspace consumers detect and adapt to the new format, rather
than silently misparsing it?
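
One way a userspace consumer could adapt, sketched under assumptions: read
the field's offset and size from the event's format file at startup rather
than hardcoding 512 bytes. The helper below parses a single format line.
The struct and function names are hypothetical, the offsets in the comments
are made up, and this is not a description of how rasdaemon, perf, or
trace-cmd actually parse events:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one field's position in the trace record. */
struct field_layout {
	unsigned int offset;
	unsigned int size;
};

/*
 * Parse one line of a tracefs "format" file, for example:
 *   "\tfield:u32 header_log[16];\toffset:76;\tsize:64;\tsigned:0;"
 * (the offset value is illustrative only).
 * Returns 0 and fills *out if the line declares `name`, -1 otherwise.
 */
static int parse_field_line(const char *line, const char *name,
			    struct field_layout *out)
{
	const char *p = strstr(line, "field:");
	const char *decl_end, *q;
	size_t name_len = strlen(name);

	if (!p)
		return -1;
	decl_end = strchr(p, ';');
	if (!decl_end)
		return -1;

	/* Look for `name` as a whole identifier inside the declaration. */
	for (q = strstr(p + 6, name); q && q < decl_end;
	     q = strstr(q + 1, name)) {
		char before = q[-1], after = q[name_len];

		if ((before == ' ' || before == '\t' || before == '*') &&
		    (after == '[' || after == ';'))
			break;
	}
	if (!q || q >= decl_end)
		return -1;

	/* A space in the scanf format matches any run of blanks or tabs. */
	if (sscanf(decl_end, "; offset:%u; size:%u;",
		   &out->offset, &out->size) != 2)
		return -1;
	return 0;
}
```

A consumer would call this on each line of
/sys/kernel/tracing/events/cxl/cxl_aer_uncorrectable_error/format and use
the reported size (64 or 512) to decide how many u32s of header_log to
decode, instead of assuming either layout.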

[1] - https://lore.kernel.org/linux-cxl/20260505173029.2718246-1-terry.bowman@xxxxxxx/

Regards,
Terry