Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER reporting paths

From: Karolina Stolarek
Date: Mon May 05 2025 - 05:59:39 EST


On 29/04/2025 17:54, Jonathan Cameron wrote:
On Fri, 25 Apr 2025 16:12:26 +0200
Karolina Stolarek <karolina.stolarek@xxxxxxxxxx> wrote:

OK, that means even if we manage to inject a PCIe error, AER wouldn't be
able to look up the Source ID and other values it needs to report an
error, which is not quite the solution I was looking for.

Isn't the source ID in the CPER record? (Device ID field) or do
you mean something else?

Ah, sorry, I got confused along the way. I meant that even if the Device ID is set in the CPER record, the specific device has no data in aer_regs when we inject an error using the GHES error injection script. We would probably end up with !info->status in aer_print_error(), which prints only a line about an "Inaccessible" agent and returns early.
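To illustrate the early return I have in mind, here is a minimal user-space sketch. It is not the kernel code: struct mock_aer_err_info and mock_aer_print_error() are hypothetical stand-ins, the printed strings are only illustrative, and the sketch assumes that a zero status means no register data was captured for the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct aer_err_info. */
struct mock_aer_err_info {
	uint32_t status;      /* error status bits; 0 if no register data was captured */
	const char *severity; /* human-readable severity, e.g. "Corrected" */
};

/*
 * Mimics the early-return shape described above.
 * Returns false when only the "Inaccessible" line is printed,
 * true when there is status data to decode.
 */
static bool mock_aer_print_error(const struct mock_aer_err_info *info)
{
	assert(info != NULL);

	if (!info->status) {
		/* Nothing to decode: report the agent as inaccessible and bail out. */
		printf("PCIe Bus Error: severity=%s, type=Inaccessible\n",
		       info->severity);
		return false;
	}

	/* Status bits are present, so the detailed report would follow here. */
	printf("PCIe Bus Error: severity=%s, status=0x%08x\n",
	       info->severity, (unsigned int)info->status);
	return true;
}
```

With status left at 0, as I would expect after an injection that carries no AER register data, the helper takes the first branch and nothing beyond the "Inaccessible" line is reported.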

The aim is specifically to allow exercising FW first error handling
paths because it's a pain to get real systems that have firmware to inject
the full range of what the kernel etc need to handle.

Does this include PCIe errors? If so, it probably doesn't make sense
to try to test my patch on an actual system?

Ideally test it on a real system as well, but indeed the intent is to
allow testing of PCI errors on emulation.

I understand. Do you have pointers on how to inject such errors on a real system? All the info I could find about FW error injection pointed to the qemu scripts I mentioned.

x86 support for emulated injection is a work in progress (more of a mess wrt
the different ways the event signaling is handled than it is on arm64).

I did have an earlier version of that work wired up to the same
hooks as the native CXL error injection but I dropped it from my QEMU
CXL staging tree for now as it was a pain to rebase whilst Mauro was rapidly
revising the infrastructure. I'll bring it back when I get time.

I understand, I saw some of your series while looking for ways to test
my patch. Thank you very much for your work. As you can see, there are
people actually looking forward to it :)

Great! I'll try and get back to wiring it all up again sometime soon.

Awesome, thanks.

Bjorn, is this patch blocking the ratelimiting series? Would it be acceptable to use public logs in the commit message? I'm asking because it looks like there's no easy way to trigger the GHES path, or it would take some time, further delaying the ratelimiting work.

All the best,
Karolina


Jonathan