Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one

From: David Hildenbrand (Arm)

Date: Mon May 11 2026 - 03:10:20 EST


On 5/6/26 11:44, Dev Jain wrote:
> Simplify the code by refactoring the folio_test_hugetlb() branch into
> a new function.
>
> While at it, convert BUG helpers to WARN helpers.
>
> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
> ---
> mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 69 insertions(+), 48 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a5f067a09de0f..a98acdea0530a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
> }
>
> +/* Returns false if unmap needs to be aborted */
> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,

I'm wondering whether we should make it clearer that this belongs to the
try_to_unmap family by calling it

ttu_hugetlb_folio

> + struct folio *folio, struct page_vma_mapped_walk *pvmw,
> + struct page *page, enum ttu_flags flags, pte_t *pteval,
> + struct mmu_notifier_range *range, bool *exit_walk)
> +{
> + /*
> + * The try_to_unmap() is only passed a hugetlb page
> + * in the case where the hugetlb page is poisoned.
> + */
> + VM_WARN_ON_PAGE(!PageHWPoison(page), page);

IIRC, we will never actually get a tail page here.

Can we avoid passing a page by checking instead whether the hugetlb folio is
marked as having a poisoned page?

See the folio_test_set_hwpoison() in hugetlb_update_hwpoison().

So you can simply use folio_test_hwpoison here instead.


> + /*
> + * huge_pmd_unshare may unmap an entire PMD page.
> + * There is no way of knowing exactly which PMDs may
> + * be cached for this mm, so we must flush them all.
> + * start/end were already adjusted above to cover this
> + * range.
> + */
> + flush_cache_range(vma, range->start, range->end);
> +
> + /*
> + * To call huge_pmd_unshare, i_mmap_rwsem must be
> + * held in write mode. Caller needs to explicitly
> + * do this outside rmap routines.
> + *
> + * We also must hold hugetlb vma_lock in write mode.
> + * Lock order dictates acquiring vma_lock BEFORE
> + * i_mmap_rwsem. We can only try lock here and fail
> + * if unsuccessful.
> + */
> + if (!folio_test_anon(folio)) {
> + struct mmu_gather tlb;
> +
> + VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
> + if (!hugetlb_vma_trylock_write(vma)) {
> + *exit_walk = true;
> + return false;
> + }
> +
> + tlb_gather_mmu_vma(&tlb, vma);
> + if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
> + hugetlb_vma_unlock_write(vma);
> + huge_pmd_unshare_flush(&tlb, vma);
> + tlb_finish_mmu(&tlb);
> + /*
> + * The PMD table was unmapped,
> + * consequently unmapping the folio.
> + */
> + *exit_walk = true;
> + return true;
> + }
> + hugetlb_vma_unlock_write(vma);
> + tlb_finish_mmu(&tlb);
> + }
> + *pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
> + if (pte_dirty(*pteval))
> + folio_mark_dirty(folio);
> +
> + *exit_walk = false;
> + return true;


Can we instead introduce some enum that tells the caller how to proceed?

I assume we have

(a) Abort walk (ret = false + page_vma_mapped_walk_done())

(b) Walk done (ret = true + page_vma_mapped_walk_done())

(c) Continue walk (call page_vma_mapped_walk())

enum ttu_walk_result {
	TTU_WALK_CONTINUE,
	TTU_WALK_ABORT,
	TTU_WALK_DONE,
};

> +}
> +
> /*
> * @arg: enum ttu_flags will be passed to this argument
> */
> @@ -2115,56 +2177,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> PageAnonExclusive(subpage);
>
> if (folio_test_hugetlb(folio)) {
> - bool anon = folio_test_anon(folio);
> -
> - /*
> - * The try_to_unmap() is only passed a hugetlb page
> - * in the case where the hugetlb page is poisoned.
> - */
> - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
> - /*
> - * huge_pmd_unshare may unmap an entire PMD page.
> - * There is no way of knowing exactly which PMDs may
> - * be cached for this mm, so we must flush them all.
> - * start/end were already adjusted above to cover this
> - * range.
> - */
> - flush_cache_range(vma, range.start, range.end);
> + bool exit_walk;
>
> - /*
> - * To call huge_pmd_unshare, i_mmap_rwsem must be
> - * held in write mode. Caller needs to explicitly
> - * do this outside rmap routines.
> - *
> - * We also must hold hugetlb vma_lock in write mode.
> - * Lock order dictates acquiring vma_lock BEFORE
> - * i_mmap_rwsem. We can only try lock here and fail
> - * if unsuccessful.
> - */
> - if (!anon) {
> - struct mmu_gather tlb;
> -
> - VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> - if (!hugetlb_vma_trylock_write(vma))
> - goto walk_abort;
> -
> - tlb_gather_mmu_vma(&tlb, vma);
> - if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
> - hugetlb_vma_unlock_write(vma);
> - huge_pmd_unshare_flush(&tlb, vma);
> - tlb_finish_mmu(&tlb);
> - /*
> - * The PMD table was unmapped,
> - * consequently unmapping the folio.
> - */
> - goto walk_done;
> - }
> - hugetlb_vma_unlock_write(vma);
> - tlb_finish_mmu(&tlb);
> + ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
> + flags, &pteval, &range,
> + &exit_walk);
> + if (exit_walk) {
> + page_vma_mapped_walk_done(&pvmw);
> + break;

In the old walk_abort case you wouldn't set ret = false?

When returning the enum you could simply do something like

switch (ret) {
case TTU_WALK_ABORT:
	goto walk_abort;
case TTU_WALK_DONE:
	goto walk_done;
default:
	break;
}
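
To make the three outcomes concrete, here is a rough userspace model of that
dispatch. It is only a sketch: the *_model functions, the boolean inputs and
struct walk_outcome are hypothetical stand-ins invented for illustration, not
kernel code, and the real helper would of course take the vma/folio/pvmw
arguments.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of this is the actual kernel code. */
enum ttu_walk_result {
	TTU_WALK_CONTINUE,	/* PTE cleared, caller keeps processing */
	TTU_WALK_ABORT,		/* trylock failed: ret = false, walk ends */
	TTU_WALK_DONE,		/* PMD table unshared: ret = true, walk ends */
};

/*
 * Stand-in for the helper. The booleans model the two decision points
 * in the quoted hunk: hugetlb_vma_trylock_write() and huge_pmd_unshare().
 */
static enum ttu_walk_result ttu_hugetlb_folio_model(bool anon,
						    bool trylock_ok,
						    bool pmd_unshared)
{
	if (!anon) {
		if (!trylock_ok)
			return TTU_WALK_ABORT;
		if (pmd_unshared)
			return TTU_WALK_DONE;
	}
	return TTU_WALK_CONTINUE;
}

/* What the caller ends up with (assumed shape, for illustration only). */
struct walk_outcome {
	bool ret;		/* return value of the unmap attempt */
	bool walk_ended;	/* models page_vma_mapped_walk_done() */
};

static struct walk_outcome try_to_unmap_one_model(bool anon, bool trylock_ok,
						  bool pmd_unshared)
{
	struct walk_outcome out = { .ret = true, .walk_ended = false };

	switch (ttu_hugetlb_folio_model(anon, trylock_ok, pmd_unshared)) {
	case TTU_WALK_ABORT:
		goto walk_abort;
	case TTU_WALK_DONE:
		goto walk_done;
	default:
		break;
	}

	/* The remainder of the per-PTE processing would run here. */
	return out;

walk_abort:
	out.ret = false;
walk_done:
	out.walk_ended = true;
	return out;
}
```

With that shape, the exit_walk out-parameter goes away and the existing
walk_abort/walk_done labels keep handling ret and the walk teardown.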


While I like this patch, can we please just move all the hugetlb shite into this
helper function?

Essentially, get rid of hugetlb special casing in the remainder of the function.

That also makes the function name clearer (right now it's only doing a part of
hugetlb folio unmapping).

--
Cheers,

David