Re: [PATCH 2/4] mm: add a template-based fast path for zone-device page init

From: Li Zhe

Date: Mon May 18 2026 - 05:56:28 EST


On Mon, 18 May 2026 09:51:34 +0300, rppt@xxxxxxxxxx wrote:

> Hi,
>
> On Fri, May 15, 2026 at 04:20:43PM +0800, Li Zhe wrote:
> > On 64-bit builds, memmap_init_zone_device() spends most of its time
> > repeating the same struct page initialization for every PFN. Prepare a
> > template page through the existing slow path once, then copy that
> > template into each destination page and fix up the PFN-dependent state
> > afterwards.
> >
> > Keep the optimized path disabled when the page_ref_set tracepoint is
> > active, because the template-copy path bypasses set_page_count() and
> > would otherwise hide the corresponding trace event.
> >
> > Non-64-bit builds continue to use the existing slow path.
>
> ZONE_DEVICE depends on MEMORY_HOTPLUG and MEMORY_HOTPLUG is only supported
> for 64 bits, so there can't be 32-bit builds for ZONE_DEVICE functionality.

Thanks for the clarification.
Indeed ZONE_DEVICE depends on MEMORY_HOTPLUG which is 64-bit only. I
will refine the description accordingly in v2.

> > Tested in a VM with a 100 GB fsdax namespace device configured with
> > map=dev on Intel Ice Lake server. This test exercises the nd_pmem rebind
> > path (pfns_per_compound == 1).
> >
> > Test procedure:
> > Rebind the nd_pmem driver 30 times and collect the memmap initialization
> > time from the pr_debug() output of memmap_init_zone_device().
> >
> > Base(v7.1-rc3):
> > First binding: 1486 ms
> > Average of subsequent rebinds: 273.52 ms
> >
> > With this patch:
> > First binding: 1421 ms
> > Average of subsequent rebinds: 246.14 ms
> >
> > This reduces the average rebind time from 273.52 ms to 246.14 ms, or
> > about 10%.
> >
> > Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> > ---
> > mm/mm_init.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 96 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 5244acb96dbb..4c475c71a9d6 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -1013,7 +1013,7 @@ static inline int zone_device_page_init_refcount(
> > }
> > }
> >
> > -static void __ref generic_init_zone_device_page(struct page *page,
> > +static void __ref generic_init_zone_device_page_slow(struct page *page,
> > unsigned long pfn, unsigned long zone_idx, int nid,
> > struct dev_pagemap *pgmap)
> > {
> > @@ -1040,12 +1040,9 @@ static void __ref generic_init_zone_device_page(struct page *page,
> > set_page_count(page, 0);
> > }
> >
> > -static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> > - unsigned long zone_idx, int nid,
> > - struct dev_pagemap *pgmap)
> > +static void __ref zone_device_page_init_pageblock(struct page *page,
> > + unsigned long pfn)
>
> Please move splitting _pageblock helper into the first patch, so that the
> first patch would contain all code movement.

Thanks, I will move the _pageblock helper split into the first patch
in v2.

> > {
> > - generic_init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> > -
> > /*
> > * Mark the block movable so that blocks are reserved for
> > * movable at startup. This will force kernel allocations
> > @@ -1062,6 +1059,88 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> > }
> > }
> >
> > +static inline void __init_zone_device_page(struct page *page, unsigned long pfn,
> > + unsigned long zone_idx, int nid,
> > + struct dev_pagemap *pgmap)
> > +{
> > + generic_init_zone_device_page_slow(page, pfn, zone_idx, nid, pgmap);
> > + zone_device_page_init_pageblock(page, pfn);
> > +}
> > +
> > +#if BITS_PER_LONG == 64
> > +static inline bool zone_device_page_init_optimization_enabled(void)
> > +{
> > + /*
> > + * We use template pages and assign page->_refcount via memory copy.
> > + * This means the optimized path bypasses set_page_count(), so the
> > + * page_ref_set tracepoint cannot observe this initialization.
> > + * Skip the optimized path when the tracepoint is enabled.
> > + */
> > + return !page_ref_tracepoint_active(page_ref_set);
> > +}
> > +
> > +static inline void struct_page_layout_check(void)
> > +{
> > + BUILD_BUG_ON(sizeof(struct page) & (sizeof(u64) - 1));
>
> Does it have to be a BUILD_BUG()? Can't we fallback to slow path if struct
> page has a weird size?
> Just do the check in zone_device_page_init_optimization_enabled().

Thanks, I'll replace the BUILD_BUG_ON() with a runtime check and fall
back to the slow path accordingly.
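
To make the intended shape concrete, here is a minimal userspace sketch of
such a runtime check. It is not the v2 code: struct sketch_page,
size_is_u64_multiple(), sketch_tracepoint_active and
sketch_fast_path_enabled() are hypothetical names used only for
illustration, and the boolean stands in for
page_ref_tracepoint_active(page_ref_set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: 8 x 8 = 64 bytes. */
struct sketch_page {
	uint64_t words[8];
};

/* Assumption: models page_ref_tracepoint_active(page_ref_set). */
static bool sketch_tracepoint_active;

/* True when an object of this size is a whole number of 64-bit words. */
static inline bool size_is_u64_multiple(size_t size)
{
	return (size & (sizeof(uint64_t) - 1)) == 0;
}

/*
 * Fall back to the slow path when the layout is unsuitable for a
 * word-wise copy or when the tracepoint must observe every
 * set_page_count() call.
 */
static inline bool sketch_fast_path_enabled(void)
{
	if (!size_is_u64_multiple(sizeof(struct sketch_page)))
		return false;

	return !sketch_tracepoint_active;
}
```

Since sizeof() is a compile-time constant, the compiler can fold the size
test away, so the runtime form should cost nothing compared with
BUILD_BUG_ON().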

> > +}
> > +
> > +static inline void init_template_page(struct page *template,
> > + unsigned long pfn,
> > + unsigned long zone_idx,
> > + int nid,
> > + struct dev_pagemap *pgmap)
>
> The name should include zone_device to avoid confusion with regular pages.

Thanks, I will rename it to include zone_device in v2.

> > +{
> > + generic_init_zone_device_page_slow(template, pfn, zone_idx, nid, pgmap);
> > +}
> > +
> > +/*
> > + * Initialize parts that differ from the template
> > + */
> > +static inline void generic_init_zone_device_page_finish(struct page *page,
> > + unsigned long pfn)
> > +{
> > +#ifdef SECTION_IN_PAGE_FLAGS
> > + set_page_section(page, pfn_to_section_nr(pfn));
>
> Can we add a stub for set_page_section() for !SECTION_IN_PAGE_FLAGS case
> and drop the #ifdef here and in set_page_links()?

Thanks, I will add the stub and remove the #ifdef in the next version.
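
For reference, the stub pattern would look roughly like the userspace
sketch below. It is only an illustration: struct sketch_page,
SKETCH_SECTIONS_PGSHIFT and sketch_finish() are made-up names, and the
shift value is arbitrary rather than the kernel's real page-flags layout.

```c
#include <assert.h>

/* Hypothetical stand-in for struct page; only the flags word matters here. */
struct sketch_page {
	unsigned long flags;
};

#ifdef SECTION_IN_PAGE_FLAGS
/* Illustrative shift only; the kernel derives it from the flags layout. */
#define SKETCH_SECTIONS_PGSHIFT 16
static inline void set_page_section(struct sketch_page *page,
				    unsigned long section)
{
	page->flags |= section << SKETCH_SECTIONS_PGSHIFT;
}
#else
/* Stub: nothing to record when the section is not kept in page flags. */
static inline void set_page_section(struct sketch_page *page,
				    unsigned long section)
{
	(void)page;
	(void)section;
}
#endif

/* The caller no longer needs its own #ifdef. */
static inline void sketch_finish(struct sketch_page *page,
				 unsigned long section)
{
	set_page_section(page, section);
}
```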

> > +#endif
> > +#ifdef WANT_PAGE_VIRTUAL
> > + if (!is_highmem_idx(ZONE_DEVICE))
> > + set_page_address(page, __va(pfn << PAGE_SHIFT));
>
> set_page_address() is a nop when !WANT_PAGE_VIRTUAL, you can drop the ifdef.

Upon checking the implementation, set_page_address() also has another
implementation for HASHED_PAGE_VIRTUAL, so it is not a no-op in every
!WANT_PAGE_VIRTUAL configuration.

Following the style of __init_single_page(), we only want to call
set_page_address() under WANT_PAGE_VIRTUAL for ZONE_DEVICE initialization,
so would it be acceptable to keep the #ifdef guard here?

> > +#endif
> > +}
> > +
> > +static void init_zone_device_page_from_template(struct page *page,
> > + unsigned long pfn, const struct page *template)
>
> zone_device_page_init_from_template() please.

Thanks, I will rename it to zone_device_page_init_from_template in v2.
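
As a reminder of what this helper is for, here is a simplified userspace
sketch of the template-copy idea from the commit message. It is not the
patch code: struct sketch_page and all sketch_* functions are hypothetical,
and a single pfn_state field stands in for the PFN-dependent state that the
real code fixes up after the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct page (four 64-bit words, no padding). */
struct sketch_page {
	uint64_t flags;
	uint64_t pgmap;     /* stand-in for the dev_pagemap pointer */
	uint64_t refcount;
	uint64_t pfn_state; /* stand-in for PFN-dependent fields */
};

/* Slow path: initialize every field individually. */
static void sketch_init_slow(struct sketch_page *page, uint64_t pfn,
			     uint64_t pgmap)
{
	memset(page, 0, sizeof(*page));
	page->flags = 0x1;
	page->pgmap = pgmap;
	page->refcount = 0;
	page->pfn_state = pfn;
}

/* Fast path: copy the template, then fix up what depends on the PFN. */
static void sketch_init_from_template(struct sketch_page *page, uint64_t pfn,
				      const struct sketch_page *tmpl)
{
	*page = *tmpl;
	page->pfn_state = pfn;
}

/* Build the template once through the slow path, then reuse it. */
static void sketch_init_range(struct sketch_page *pages, uint64_t start_pfn,
			      size_t nr, uint64_t pgmap)
{
	struct sketch_page tmpl;
	size_t i;

	sketch_init_slow(&tmpl, start_pfn, pgmap);
	for (i = 0; i < nr; i++)
		sketch_init_from_template(&pages[i], start_pfn + i, &tmpl);
}
```

In this sketch, each page ends up identical to what the slow path would
have produced for the same PFN.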

Thanks,
Zhe