Re: [PATCH 1/2] mm/memory-failure: add panic_on_unrecoverable_memory_failure sysctl

From: Breno Leitao

Date: Tue Mar 31 2026 - 06:32:54 EST


Hi Miaohe,

On Tue, Mar 31, 2026 at 10:27:33AM +0800, Miaohe Lin wrote:
> On 2026/3/30 21:45, Breno Leitao wrote:
> > On Mon, Mar 30, 2026 at 03:55:00PM +0800, Miaohe Lin wrote:
> >> On 2026/3/23 23:29, Breno Leitao wrote:
> >>
> >>> @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
> >>>  	pr_err("%#lx: recovery action for %s: %s\n",
> >>>  		pfn, action_page_types[type], action_name[result]);
> >>>
> >>> +	if (sysctl_panic_on_unrecoverable_mf &&
> >>> +	    type == MF_MSG_GET_HWPOISON && result == MF_IGNORED)
> >>> +		panic("Memory failure: %#lx: unrecoverable page", pfn);
> >>
> >> MF_MSG_GET_HWPOISON covers other scenarios as well. For example, an isolated folio makes
> >> get_hwpoison_page() return -EIO, so we will see MF_MSG_GET_HWPOISON and MF_IGNORED in
> >> action_result(). But that case is recoverable if the folio is used by userspace, so a panic
> >> would be unacceptable.
> >> Would it be better to check type against MF_MSG_KERNEL_HIGH_ORDER?
> >
> > Yes, I was discussing this with akpm, and maybe the better
> > approach would be to panic for types MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_KERNEL.
> >
> > In both cases, it seems the page cannot be migrated. What do
> > you think about a change like this:
> >
> >
> > @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
> >  	pr_err("%#lx: recovery action for %s: %s\n",
> >  		pfn, action_page_types[type], action_name[result]);
> >
> > +	if (sysctl_panic_on_unrecoverable_mf && result == MF_IGNORED &&
> > +	    (type == MF_MSG_KERNEL || type == MF_MSG_KERNEL_HIGH_ORDER))
> > +		panic("Memory failure: %#lx: unrecoverable page", pfn);
> > +
> >  	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
> >  }
> >
>
> Maybe MF_MSG_UNKNOWN can also be considered? The kernel can't do anything further
> for those folios.

Agreed, I'll incorporate that change.
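
Roughly, the condition would become something like the sketch below. This is
an untested userspace mock-up, not the actual patch: the helper name
mf_should_panic() is hypothetical, the enums are trimmed stand-ins for the
kernel definitions, and the sysctl is modelled as a plain static variable.

```c
#include <stdbool.h>

/* Trimmed stand-ins for the kernel enums; values are illustrative only. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/* Stand-in for the sysctl added by the patch (assumed enabled here). */
static int sysctl_panic_on_unrecoverable_mf = 1;

/*
 * Hypothetical helper: returns true when action_result() would panic,
 * i.e. the sysctl is set, the page was ignored, and the page type is one
 * the kernel cannot do anything further about.
 */
static bool mf_should_panic(enum mf_action_page_type type,
			    enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
		return false;

	return type == MF_MSG_KERNEL ||
	       type == MF_MSG_KERNEL_HIGH_ORDER ||
	       type == MF_MSG_UNKNOWN;
}
```

With this shape, MF_MSG_GET_HWPOISON combined with MF_IGNORED no longer
triggers a panic, which covers the isolated-folio case you raised earlier.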

> BTW I think the current code can't reach the MF_MSG_KERNEL and MF_MSG_UNKNOWN cases
> because there is always a (PageHuge() || HWPoisonHandlable()) check before calling
> identify_page_state().

You're absolutely right. I'd like to address this observation as well in the
updated patch.

Thanks,
--breno