Re: [PATCH v6 30/33] mm: memcontrol: prepare for reparenting non-hierarchical stats

From: Harry Yoo (Oracle)

Date: Mon Mar 23 2026 - 03:53:21 EST


On Thu, Mar 05, 2026 at 07:52:48PM +0800, Qi Zheng wrote:
> From: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
>
> To resolve the dying memcg issue, we need to reparent LRU folios of child
> memcg to its parent memcg. This could cause problems for non-hierarchical
> stats.
>
> As Yosry Ahmed pointed out:
>
> ```
> In short, if memory is charged to a dying cgroup at the time of
> reparenting, when the memory gets uncharged the stats updates will occur
> at the parent. This will update both hierarchical and non-hierarchical
> stats of the parent, which would corrupt the parent's non-hierarchical
> stats (because those counters were never incremented when the memory was
> charged).
> ```
>
> Now we have the following two types of non-hierarchical stats, and they
> are only used in CONFIG_MEMCG_V1:
>
> a. memcg->vmstats->state_local[i]
> b. pn->lruvec_stats->state_local[i]
>
> To ensure that these non-hierarchical stats work properly, we need to
> reparent these non-hierarchical stats after reparenting LRU folios. To
> this end, this commit makes the following preparations:
>
> 1. implement reparent_state_local() to reparent non-hierarchical stats
> 2. make css_killed_work_fn() to be called in rcu work, and implement
> get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid race
> between mod_memcg_state()/mod_memcg_lruvec_state()
> and reparent_state_local()
>
> Co-developed-by: Yosry Ahmed <yosry@xxxxxxxxxx>
> Signed-off-by: Yosry Ahmed <yosry@xxxxxxxxxx>
> Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> ---
> kernel/cgroup/cgroup.c | 9 ++--
> mm/memcontrol-v1.c | 16 +++++++
> mm/memcontrol-v1.h | 7 +++
> mm/memcontrol.c | 97 ++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 125 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 23b70bd80ddc9..b0519a16f5684 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -473,6 +501,30 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
> return x;
> }
>
> +#ifdef CONFIG_MEMCG_V1
> +static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
> + enum node_stat_item idx, int val);
> +
> +void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
> + struct mem_cgroup *parent, int idx)
> +{
> + int nid;
> +
> + for_each_node(nid) {
> + struct lruvec *child_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
> + struct lruvec *parent_lruvec = mem_cgroup_lruvec(parent, NODE_DATA(nid));
> + unsigned long value = lruvec_page_state_local(child_lruvec, idx);
> + struct mem_cgroup_per_node *child_pn, *parent_pn;
> +
> + child_pn = container_of(child_lruvec, struct mem_cgroup_per_node, lruvec);
> + parent_pn = container_of(parent_lruvec, struct mem_cgroup_per_node, lruvec);
> +
> + __mod_memcg_lruvec_state(child_pn, idx, -value);
> + __mod_memcg_lruvec_state(parent_pn, idx, value);

We should probably change the type of `@val` from int to long, so that an
unsigned long value larger than INT_MAX is not truncated and non-hierarchical
stats are not lost during reparenting?
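
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. The helper names are made up for this example, and it assumes a 64-bit
long and the usual GCC/Clang wraparound when narrowing to int:

```c
/* Assumption of this sketch: long is 64 bits wide (LP64). */
_Static_assert(sizeof(long) == 8, "sketch assumes a 64-bit long");

/* Hypothetical stand-in for a stat helper that takes the delta as int. */
static long apply_delta_int(long counter, int val)
{
	return counter + val;
}

/* The same helper with the delta widened to long. */
static long apply_delta_long(long counter, long val)
{
	return counter + val;
}

/*
 * Move `value` from child to parent through the int helper. The casts
 * spell out the narrowing that otherwise happens implicitly at the call.
 */
static void reparent_int(long *child, long *parent, unsigned long value)
{
	*child = apply_delta_int(*child, (int)-value);
	*parent = apply_delta_int(*parent, (int)value);
}

/* Move `value` from child to parent through the long helper. */
static void reparent_long(long *child, long *parent, unsigned long value)
{
	*child = apply_delta_long(*child, (long)-value);
	*parent = apply_delta_long(*parent, (long)value);
}
```

With child = 2^32 + 5 and parent = 0, reparent_int() moves only 5 to the
parent and leaves 2^32 behind on the child, while reparent_long() moves the
whole value.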

> #ifdef CONFIG_MEMCG_V1
> static void __mod_memcg_state(struct mem_cgroup *memcg,
> enum memcg_stat_item idx, int val)
>
> @@ -769,6 +857,15 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
> #endif
> return x;
> }
> +
> +void reparent_memcg_state_local(struct mem_cgroup *memcg,
> + struct mem_cgroup *parent, int idx)
> +{
> + unsigned long value = memcg_page_state_local(memcg, idx);
> +
> + __mod_memcg_state(memcg, idx, -value);
> + __mod_memcg_state(parent, idx, value);
> +}

Same here.

Otherwise LGTM.

> #endif
>
> static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,

--
Cheers,
Harry / Hyeonggon