Re: [PATCH mm-unstable v3 1/5] mm: consolidate anonymous folio PTE mapping into helpers

From: Nico Pache

Date: Wed Mar 18 2026 - 12:57:26 EST




On 3/16/26 12:17 PM, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 11, 2026 at 03:13:11PM -0600, Nico Pache wrote:
>> The anonymous page fault handler in do_anonymous_page() open-codes the
>> sequence to map a newly allocated anonymous folio at the PTE level:
>> - construct the PTE entry
>> - add rmap
>> - add to LRU
>> - set the PTEs
>> - update the MMU cache.
>
> Yikes yeah this all needs work. Thanks for looking at this!

np! I believe it looks much cleaner now, and it has the added benefit of
cleaning up some of the mTHP patchset.

I also believe you suggested this, so I'll add your Suggested-by when I add your Reviewed-by tag :)

>
>>
>> Introduce a two helpers to consolidate this duplicated logic, mirroring the
>
> NIT: 'Introduce a two helpers' -> "introduce two helpers'

ack!

>
>> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>>
>> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
>> references, adds anon rmap and LRU. This function also handles the
>> uffd_wp that can occur in the pf variant.
>>
>> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
>> counter updates, and mTHP fault allocation statistics for the page fault
>> path.
>
> MEGA nit, not sure why you're not just putting this in a bullet list, just
> weird to see not-code indented here :)

I'll drop the indentation!

>
>>
>> The zero-page read path in do_anonymous_page() is also untangled from the
>> shared setpte label, since it does not allocate a folio and should not
>> share the same mapping sequence as the write path. Make nr_pages = 1
>> rather than relying on the variable. This makes it more clear that we
>> are operating on the zero page only.
>>
>> This refactoring will also help reduce code duplication between mm/memory.c
>> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
>> mapping that can be reused by future callers.
>
> Maybe worth mentioning subseqent patches that will use what you set up
> here?

OK, I'll add something like "... that will be used when adding mTHP support to
khugepaged, and may find future reuse by other callers." Speaking of which, I
tried to leverage this elsewhere, and I believe that will take extra focus and
time, but it may be doable.

>
> Also you split things out into _nopf() and _pf() variants, it might be
> worth saying exactly why you're doing that or what you are preparing to do?

OK, I'll expand on this in the "bullet list" you referenced above.

>
>>
>> Reviewed-by: Dev Jain <dev.jain@xxxxxxx>
>> Reviewed-by: Lance Yang <lance.yang@xxxxxxxxx>
>> Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
>> Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
>
> There's nits above and below, but overall the logic looks good, so with
> nits addressed/reasonably responded to:
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@xxxxxxxxxx>

Thanks :) I'll take care of those.

>
>> ---
>> include/linux/mm.h | 4 ++++
>> mm/memory.c | 60 +++++++++++++++++++++++++++++++---------------
>> 2 files changed, 45 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 4c4fd55fc823..9fea354bd17f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -4903,4 +4903,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>>
>> void snapshot_page(struct page_snapshot *ps, const struct page *page);
>>
>> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
>> + struct vm_area_struct *vma, unsigned long addr,
>> + bool uffd_wp);
>
> How I hate how uffd infiltrates all our code like this.
>
> Not your fault :)
>
>> +
>> #endif /* _LINUX_MM_H */
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 6aa0ea4af1fc..5c8bf1eb55f5 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -5197,6 +5197,37 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>> return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
>> }
>>
>> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
>> + struct vm_area_struct *vma, unsigned long addr,
>> + bool uffd_wp)
>> +{
>> + unsigned int nr_pages = folio_nr_pages(folio);
>
> const would be good
>
>> + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
>> +
>> + entry = pte_sw_mkyoung(entry);
>> +
>> + if (vma->vm_flags & VM_WRITE)
>> + entry = pte_mkwrite(pte_mkdirty(entry), vma);
>> + if (uffd_wp)
>> + entry = pte_mkuffd_wp(entry);
>> +
>> + folio_ref_add(folio, nr_pages - 1);
>> + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
>> + folio_add_lru_vma(folio, vma);
>> + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
>> + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
>> +}
>> +
>> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
>> + struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
>> +{
>> + unsigned int order = folio_order(folio);
>
> const would be good here also!

Ack on the consts.

>
>> +
>> + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
>> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
>
> Is 1 << order strictly right here? This field is a long value, so 1L <<
> order maybe? I get nervous about these shifts...
>
> Note that folio_large_nr_pages() uses 1L << order so that does seem
> preferable.

OK, sounds good, thanks!

>
>> + count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
>> +}
>> +
>> /*
>> * We enter with non-exclusive mmap_lock (to exclude vma changes,
>> * but allow concurrent faults), and pte mapped but not yet locked.
>> @@ -5243,7 +5274,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>> return handle_userfault(vmf, VM_UFFD_MISSING);
>> }
>> - goto setpte;
>> + if (vmf_orig_pte_uffd_wp(vmf))
>> + entry = pte_mkuffd_wp(entry);
>> + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>
> How I _despise_ how uffd is implemented in mm. Feels like open coded
> nonsense spills out everywhere.
>
> Not your fault obviously :)
>
>> +
>> + /* No need to invalidate - it was non-present before */
>> + update_mmu_cache_range(vmf, vma, addr, vmf->pte,
>> + /*nr_pages=*/ 1);
>
> Is there any point in passing vmf here given you pass NULL above, and it
> appears that nobody actually uses this? I guess it doesn't matter but
> seeing this immediately made we question why you set it in one, and not the
> other?
>
> Maybe I'm mistaken and some arch uses it? Don't think so though.
>
> Also can't we then just use update_mmu_cache() which is the single-page
> wrapper of this AFAICT? That'd make it even simpler.
>
>
> Having done this, is there any reason to keep the annoying and confusing
> initial assignment of nr_pages = 1 at declaration time?
>
> It seems that nr_pages is unconditionally assigned before it's used
> anywhere now at line 5298:
>
> nr_pages = folio_nr_pages(folio);
> addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> ...
>
> It's kinda weird to use nr_pages again after you go out of your way to
> avoid it using folio_nr_pages() in map_anon_folio_pte_nopf() and
> folio_order() in map_anon_folio_pte_pf().

Yes, I believe so; thank you, that is cleaner! In the past I had nr_pages as a
variable, but David suggested being explicit with the "1" so it would be obvious
it's a single page... update_mmu_cache() should indicate the same thing.

>
> But yeah, ok we align the address and it's yucky maybe leave for now (but
> we can definitely stop defaulting nr_pages to 1 :)
>
>> + goto unlock;
>> }
>>
>> /* Allocate our own private page. */
>> @@ -5267,11 +5305,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> */
>> __folio_mark_uptodate(folio);
>>
>> - entry = folio_mk_pte(folio, vma->vm_page_prot);
>> - entry = pte_sw_mkyoung(entry);
>> - if (vma->vm_flags & VM_WRITE)
>> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
>> -
>> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>> if (!vmf->pte)
>> goto release;
>> @@ -5293,19 +5326,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> folio_put(folio);
>> return handle_userfault(vmf, VM_UFFD_MISSING);
>> }
>> -
>> - folio_ref_add(folio, nr_pages - 1);
>> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
>> - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>> - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
>> - folio_add_lru_vma(folio, vma);
>> -setpte:
>> - if (vmf_orig_pte_uffd_wp(vmf))
>> - entry = pte_mkuffd_wp(entry);
>> - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
>> -
>> - /* No need to invalidate - it was non-present before */
>> - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
>> + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr,
>> + vmf_orig_pte_uffd_wp(vmf));
>
> So we're going from:
>
> entry = folio_mk_pte(...)
> entry = pte_sw_mkyoung(...)
> if (write)
> entry = pte_mkwrite(also dirty...)
> folio_ref_add(nr_pages - 1)
> add_mm_counter(... MM_ANON_PAGES, nr_pages)
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC)
> folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
> folio_add_lru_vma(folio, vma)
> if (vmf_orig_pte_uffd_wp(vmf))
> entry = pte_mkuffd_wp(entry)
> set_ptes(mm, addr, vmf->pte, entry, nr_pages)
> update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages)
>
> To:
>
> <map_anon_folio_pte_nopf>
> entry = folio_mk_pte(...)
> entry = pte_sw_mkyoung(...)
> if (write)
> entry = pte_mkwrite(also dirty...)
> if (vmf_orig_pte_uffd_wp(vmf) <passed in via uffd_wp>) <-- reordered
> entry = pte_mkuffd_wp(entry)
> folio_ref_add(nr_pages - 1)
> folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
> folio_add_lru_vma(folio, vma)
> set_ptes(mm, addr, pte, entry, nr_pages)
> update_mmu_cache_range(NULL, vma, addr, pte, nr_pages)
>
> <map_anon_folio_pte_pf>
> add_mm_counter(... MM_ANON_PAGES, nr_pages) <-- reordered
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC) <-- reordered
>
> But the reorderings seem fine, and it is achieving the same thing.
>
> All the parameters being passed seem correct too.

Thanks for the review and for verifying the logic :) I was particularly scared
of changing the page fault handler, so I'm glad multiple people have confirmed
this seems fine.

Cheers,
-- Nico

>
>> unlock:
>> if (vmf->pte)
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>> --
>> 2.53.0
>>
>
> Cheers, Lorenzo
>