Re: [PATCH] mm/memory_hotplug: maintain N_NORMAL_MEMORY during hotplug

From: Joshua Hahn

Date: Fri Mar 27 2026 - 10:39:16 EST


On Fri, 27 Mar 2026 20:42:47 +0800 Hao Li <hao.li@xxxxxxxxx> wrote:

Hello Hao,

I hope you are doing well, thank you for the patch!

> N_NORMAL_MEMORY is initialized from zone population at boot, but memory
> hotplug currently only updates N_MEMORY. As a result, a node that gains
> normal memory via hotplug can remain invisible to users iterating over
> N_NORMAL_MEMORY, while a node that loses its last normal memory can stay
> incorrectly marked as such.

The second part feels more important than the second part, doing a quick
glance through the code I can see a few N_NORMAL_MEMORY iterators that
are in some hot paths like shrink_memcg. Iterating over nodes that don't
contain any NORMAL memory seems like an inefficiency rather than a bug
though.

> Restore N_NORMAL_MEMORY maintenance directly in online_pages() and
> offline_pages(). Set the bit when a node that currently lacks normal
> memory onlines pages into a zone <= ZONE_NORMAL, and clear it when
> offlining removes the last present pages from zones <= ZONE_NORMAL.
>
> This restores the intended semantics without bringing back the old
> status_change_nid_normal notifier plumbing which was removed in
> 8d2882a8edb8.
>
> Current users that benefit include list_lru, zswap, nfsd filecache,
> hugetlb_cgroup, and has_normal_memory sysfs reporting.
>
> Signed-off-by: Hao Li <hao.li@xxxxxxxxx>
> ---
>
> This patch also prepares for a subsequent SLUB change that makes
> can_free_to_pcs() rely on N_NORMAL_MEMORY to decide whether an object can be
> freed to the sheaf.
>
> ---
> mm/memory_hotplug.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index bc805029da51..5498744aa1f1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1155,6 +1155,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
> int need_zonelists_rebuild = 0;
> unsigned long flags;
> int ret;
> + bool need_set_normal_memory = false;
>
> /*
> * {on,off}lining is constrained to full memory sections (or more
> @@ -1180,6 +1181,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
> if (ret)
> goto failed_addition;
> }
> + /* Adding normal memory to the node for the first time */
> + if (!node_state(nid, N_NORMAL_MEMORY) && zone_idx(zone) <= ZONE_NORMAL)
> + need_set_normal_memory = true;
>
> ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
> ret = notifier_to_errno(ret);
> @@ -1209,6 +1213,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>
> if (node_arg.nid >= 0)
> node_set_state(nid, N_MEMORY);
> + if (need_set_normal_memory)
> + node_set_state(nid, N_NORMAL_MEMORY);
> if (need_zonelists_rebuild)
> build_all_zonelists(NULL);

Do we need the flag here? As far as I can tell, we can just skip this and just
directly check whether this is the first normal memory we are adding to the
node here and set the bit. Then we can remove the flag and the extraneous
check. We won't do any notifier work so I think it should be OK.

> @@ -1908,6 +1914,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> unsigned long flags;
> char *reason;
> int ret;
> + bool need_clear_normal_memory = false;
> + unsigned long node_normal_pages = 0;
> + enum zone_type zt;
>
> /*
> * {on,off}lining is constrained to full memory sections (or more
> @@ -1977,6 +1986,13 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> goto failed_removal_isolated;
> }
> }
> + /*
> + * Check whether this operation removes the node's last normal memory.
> + */
> + for (zt = 0; zt <= ZONE_NORMAL; zt++)
> + node_normal_pages += pgdat->node_zones[zt].present_pages;
> + if (nr_pages >= node_normal_pages && zone_idx(zone) <= ZONE_NORMAL)
> + need_clear_normal_memory = true;
>
> ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
> ret = notifier_to_errno(ret);
> @@ -2055,6 +2071,12 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> /* reinitialise watermarks and update pcp limits */
> init_per_zone_wmark_min();

Same here, couldn't we just iterate through the paegs here and accumulate
node_normal_pages and clear the memory here? We can get rid of the bool and
also keep node_normal_pages defined inside an if (zone_idx(zone) <= ZONE_NORMAL)
check as well.

> + /*
> + * Clear N_NORMAL_MEMORY first to avoid the transient state
> + * "!N_MEMORY && N_NORMAL_MEMORY".
> + */
> + if (need_clear_normal_memory)
> + node_clear_state(node, N_NORMAL_MEMORY);
> /*
> * Make sure to mark the node as memory-less before rebuilding the zone
> * list. Otherwise this node would still appear in the fallback lists.
> --
> 2.50.1
>
>

Thank you for the patch. The rest looks good to me! Have a great day : -)
Joshua