Re: [PATCH v8 0/6] mm/memory-failure: add panic option for unrecoverable pages

From: Andrew Morton

Date: Wed May 27 2026 - 15:39:54 EST


On Wed, 27 May 2026 07:06:13 -0700 Breno Leitao <leitao@xxxxxxxxxx> wrote:

> A multi-bit ECC error on a kernel-owned page that the memory failure
> handler cannot recover is currently swallowed: PG_hwpoison is set, the
> event is logged, and the kernel keeps running. The corrupted memory
> remains accessible to the kernel and either drives silent data
> corruption or surfaces seconds-to-minutes later as an apparently
> unrelated crash. In a large fleet that delayed, unattributable crash
> turns into significant engineering effort to root-cause; in a kdump
> configuration, by the time the crash happens the original error
> context (faulting PFN, MCE/GHES record, page state) is long gone.
>
> This series adds an opt-in sysctl,
> vm.panic_on_unrecoverable_memory_failure, that converts an
> unrecoverable kernel-page hwpoison event into an immediate panic with
> a clean dmesg/vmcore that still contains the original failure
> context. The default is disabled so existing workloads see no
> change.

Thanks. That does seem useful.

I'll pass on this for now, since we're at -rc5 and the series is not very reviewed.

The AI review flagged a few things. It claims to have found one
pre-existing issue.

https://sashiko.dev/#/patchset/20260527-ecc_panic-v8-0-9ea0cfa16bb0@xxxxxxxxxx