Re: [PATCH resend] mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
From: Muchun Song
Date: Fri May 22 2026 - 05:08:00 EST
> On May 22, 2026, at 09:03, Wupeng Ma <mawupeng1@xxxxxxxxxx> wrote:
>
> Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page
> can trigger a recursive spinlock self-deadlock (AA deadlock) on
> hugetlb_lock when racing with a concurrent unmap:
>
> thread#0                                  thread#1
> --------                                  --------
> madvise(folio, MADV_HWPOISON)
>   -> poisons the folio successfully
> madvise(folio, MADV_HWPOISON)             unmap(folio)
>   try_memory_failure_hugetlb
>     get_huge_page_for_hwpoison
>       spin_lock_irq(&hugetlb_lock)  <- held
>       __get_huge_page_for_hwpoison
>         hugetlb_update_hwpoison()
>           -> MF_HUGETLB_FOLIO_PRE_POISONED
>         goto out:
>           folio_put()
>             refcount: 1 -> 0
>             free_huge_folio()
>               spin_lock_irqsave(&hugetlb_lock)
>               -> AA DEADLOCK!
>
> The out: path in __get_huge_page_for_hwpoison() calls folio_put() to
> drop the GUP reference while the hugetlb_lock is still held by the
> hugetlb.c wrapper get_huge_page_for_hwpoison(). If concurrent unmap
> has released the page table mapping reference, folio_put() drops the
> folio refcount to zero, triggering free_huge_folio() which attempts
> to re-acquire the non-recursive hugetlb_lock.
>
> Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper
> into __get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the
> folio_put() at the out: label so the folio is always released outside
> the lock.
>
> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
> Signed-off-by: Wupeng Ma <mawupeng1@xxxxxxxxxx>
Acked-by: Muchun Song <muchun.song@xxxxxxxxx>
Thanks.