Re: [PATCH resend] mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
From: Kefeng Wang
Date: Fri May 22 2026 - 05:20:48 EST
On 5/22/2026 9:03 AM, Wupeng Ma wrote:
Repeated madvise(MADV_HWPOISON) calls on the same hugetlb page can
trigger a recursive spinlock self-deadlock (AA deadlock) on
hugetlb_lock when they race with a concurrent unmap:
  thread#0                                  thread#1
  --------                                  --------
  madvise(folio, MADV_HWPOISON)
    -> poisons the folio successfully
  madvise(folio, MADV_HWPOISON)             unmap(folio)
  try_memory_failure_hugetlb
    get_huge_page_for_hwpoison
      spin_lock_irq(&hugetlb_lock)  <- held
      __get_huge_page_for_hwpoison
        hugetlb_update_hwpoison()
          -> MF_HUGETLB_FOLIO_PRE_POISONED
        goto out:
        folio_put()
          refcount: 1 -> 0
          free_huge_folio()
            spin_lock_irqsave(&hugetlb_lock)
            -> AA DEADLOCK!
The out: path in __get_huge_page_for_hwpoison() calls folio_put() to
drop the GUP reference while hugetlb_lock is still held by the
hugetlb.c wrapper get_huge_page_for_hwpoison(). If a concurrent unmap
has already released the page table mapping reference, this
folio_put() drops the folio refcount to zero and calls
free_huge_folio(), which tries to re-acquire the non-recursive
hugetlb_lock that the same context already holds.
Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper
into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the
folio_put() at the out: label so the folio is always released outside
the lock.
Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Wupeng Ma <mawupeng1@xxxxxxxxxx>
Reviewed-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>