Re: [v8 PATCH 0/2] Add L1 and L2 error detection for A53, A57 and A72
From: Borislav Petkov
Date: Mon May 05 2025 - 05:11:25 EST
On Sun, May 04, 2025 at 05:27:38PM -0700, Vijay Balakrishna wrote:
> Hello,
>
> This is an attempt to revive [v5] series. I have attempted to address comments
> and suggestions from Marc Zyngier since [v5]. Additionally, I have extended

I'd like to hear from ARM folks here on whether this still makes sense to have.

> support for A72 processors. Testing the driver on a problematic A72 SoC
> has led to the detection of Correctable Errors (CEs). Below are logs captured
> from the problematic SoC during various boot instances.
>
> [ 876.896022] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
>
> [ 3700.978086] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
>
> [ 976.956158] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
>
> [ 1427.933606] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
>
> [ 192.959911] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
>
> Our primary focus is on A72. We have a significant number of A72-based systems

Then zap the support for the other CPUs as supporting those is futile.

cortex_arm64_l1_l2.c - I don't want an EDAC driver per RAS functional unit.
Call this edac_a72 or whatever, which will contain all A72 RAS functionality
support. ARM folks will give you a good idea here if you don't have one.

Also, I'd need at least a reviewer entry in MAINTAINERS for patches to this
driver because you'll be the only ones testing this, as you have a vested
interest in this working.

The dt patch needs a Reviewed-by too.

Once that is addressed, I'll take a look.

Thx.

--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette