[PATCH 1/2] panic: add taint flag for recoverable hardware errors

From: Breno Leitao
Date: Fri Jul 04 2025 - 06:56:09 EST


This change introduces a new taint flag, bit 20 ('H'), to indicate when
the kernel has identified recoverable hardware failures during runtime.

The flag is documented in tainted-kernels.rst, defined in panic.h, added
to the taint_flags array in panic.c, and supported in the
kernel-chktaint debugging tool.

Marking kernels that have encountered recoverable hardware errors helps
correlate later issues with those hardware events, improving diagnostics
and support for affected systems.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
Documentation/admin-guide/tainted-kernels.rst | 7 ++++++-
include/linux/panic.h | 3 ++-
kernel/panic.c | 1 +
tools/debugging/kernel-chktaint | 8 ++++++++
4 files changed, 17 insertions(+), 2 deletions(-)
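
For readers who only want to test the new bit rather than run the full
kernel-chktaint script, a minimal sketch follows. It assumes the taint
value is read from /proc/sys/kernel/tainted and that bit 20 carries the
meaning defined by this patch. The names HW_ERROR_BIT,
has_hw_error_taint and hw_error_mask are hypothetical helpers used for
illustration only; they are not part of the patch.

```shell
# Bit number of the new flag, matching TAINT_HW_ERROR_RECOVERED in panic.h.
HW_ERROR_BIT=20

# has_hw_error_taint VALUE
# Hypothetical helper: exit status 0 when bit 20 is set in VALUE,
# non-zero otherwise. VALUE is a decimal taint mask.
has_hw_error_taint() {
	[ $(( ($1 >> HW_ERROR_BIT) & 1 )) -eq 1 ]
}

# hw_error_mask
# Hypothetical helper: print the decimal value of the flag on its own,
# i.e. 1 << 20, the number listed in the tainted-kernels.rst table.
hw_error_mask() {
	echo $(( 1 << HW_ERROR_BIT ))
}
```

On a running system this could be used as
has_hw_error_taint "$(cat /proc/sys/kernel/tainted)" && echo "H is set".
Shifting and masking a single bit avoids the chain of divisions by two
that kernel-chktaint needs because it walks every flag in order.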

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index a0cc017e44246..28185e9c0e039 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -102,7 +102,8 @@ Bit Log Number Reason that got the kernel tainted
17 _/T 131072 kernel was built with the struct randomization plugin
18 _/N 262144 an in-kernel test has been run
19 _/J 524288 userspace used a mutating debug operation in fwctl
-=== === ====== ========================================================
+ 20 _/H 1048576 recoverable hardware failures identified
+=== === ======= ========================================================

Note: The character ``_`` is representing a blank in this table to make reading
easier.
@@ -189,3 +190,7 @@ More detailed explanation for tainting
19) ``J`` if userpace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
to use the devices debugging features. Device debugging features could
cause the device to malfunction in undefined ways.
+
+ 20) ``H`` if the kernel identified a recoverable hardware failure earlier
+ during its operation. This helps to correlate later issues with the
+ fact that the hardware reported a recoverable error.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 4adc657669354..d8241a052d69a 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -73,7 +73,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
#define TAINT_RANDSTRUCT 17
#define TAINT_TEST 18
#define TAINT_FWCTL 19
-#define TAINT_FLAGS_COUNT 20
+#define TAINT_HW_ERROR_RECOVERED 20
+#define TAINT_FLAGS_COUNT 21
#define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)

struct taint_flag {
diff --git a/kernel/panic.c b/kernel/panic.c
index b0b9a8bf4560d..fd13baf5d94bc 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -540,6 +540,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
TAINT_FLAG(RANDSTRUCT, 'T', ' ', true),
TAINT_FLAG(TEST, 'N', ' ', true),
TAINT_FLAG(FWCTL, 'J', ' ', true),
+ TAINT_FLAG(HW_ERROR_RECOVERED, 'H', ' ', false),
};

#undef TAINT_FLAG
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index e7da0909d0970..b2099155a820c 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -212,6 +212,14 @@ else
echo " * fwctl's mutating debug interface was used (#19)"
fi

+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+ addout " "
+else
+ addout "H"
+ echo " * the kernel identified recoverable hardware errors (#20)"
+fi
+
echo "For a more detailed explanation of the various taint flags see"
echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources"
echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html";

--
2.47.1