Re: [PATCH v6 30/33] mm: memcontrol: prepare for reparenting non-hierarchical stats

From: Qi Zheng

Date: Mon Mar 23 2026 - 05:50:38 EST


Hi Harry,

On 3/23/26 3:53 PM, Harry Yoo (Oracle) wrote:
On Thu, Mar 05, 2026 at 07:52:48PM +0800, Qi Zheng wrote:
From: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>

To resolve the dying memcg issue, we need to reparent the LRU folios of a
child memcg to its parent memcg. This could cause problems for
non-hierarchical stats.

As Yosry Ahmed pointed out:

```
In short, if memory is charged to a dying cgroup at the time of
reparenting, when the memory gets uncharged the stats updates will occur
at the parent. This will update both hierarchical and non-hierarchical
stats of the parent, which would corrupt the parent's non-hierarchical
stats (because those counters were never incremented when the memory was
charged).
```
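
To make the failure mode above concrete, here is a minimal userspace sketch. It is only a toy model: `struct toy_memcg` and the `toy_*` helpers are hypothetical names, not kernel code, and hierarchical stats are propagated eagerly here, whereas the kernel uses per-CPU counters and rstat flushing.

```c
/* Toy model only: hypothetical types and helpers, not the kernel's. */
struct toy_memcg {
	struct toy_memcg *parent;
	long state;		/* hierarchical: this group plus descendants */
	long state_local;	/* non-hierarchical: this group only */
};

/* Update the local counter of @memcg and the hierarchical counters up the tree. */
void toy_mod_state(struct toy_memcg *memcg, long val)
{
	memcg->state_local += val;
	for (; memcg; memcg = memcg->parent)
		memcg->state += val;
}

/* Move the child's non-hierarchical counter to its parent. */
void toy_reparent_state_local(struct toy_memcg *child)
{
	child->parent->state_local += child->state_local;
	child->state_local = 0;
}

/*
 * Charge @nr_pages to a child, optionally reparent its local stats, then
 * uncharge at the parent (where the reparented folios now live).
 * Returns the parent's non-hierarchical counter afterwards.
 */
long toy_parent_local_after_uncharge(int reparent_local, long nr_pages)
{
	struct toy_memcg parent = { 0 };
	struct toy_memcg child = { .parent = &parent };

	toy_mod_state(&child, nr_pages);
	if (reparent_local)
		toy_reparent_state_local(&child);
	toy_mod_state(&parent, -nr_pages);

	return parent.state_local;
}
```

In this model, skipping the local-stat reparenting leaves the parent's non-hierarchical counter negative after the uncharge, while reparenting it first brings the counter back to zero.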

Now we have the following two types of non-hierarchical stats, and they
are only used in CONFIG_MEMCG_V1:

a. memcg->vmstats->state_local[i]
b. pn->lruvec_stats->state_local[i]

To ensure that these non-hierarchical stats work properly, we need to
reparent these non-hierarchical stats after reparenting LRU folios. To
this end, this commit makes the following preparations:

1. implement reparent_state_local() to reparent non-hierarchical stats
2. run css_killed_work_fn() from RCU work, and implement
get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid a race
between mod_memcg_state()/mod_memcg_lruvec_state()
and reparent_state_local()

Co-developed-by: Yosry Ahmed <yosry@xxxxxxxxxx>
Signed-off-by: Yosry Ahmed <yosry@xxxxxxxxxx>
Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
---
kernel/cgroup/cgroup.c | 9 ++--
mm/memcontrol-v1.c | 16 +++++++
mm/memcontrol-v1.h | 7 +++
mm/memcontrol.c | 97 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 23b70bd80ddc9..b0519a16f5684 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -473,6 +501,30 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 	return x;
 }
+#ifdef CONFIG_MEMCG_V1
+static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
+				     enum node_stat_item idx, int val);
+
+void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
+				       struct mem_cgroup *parent, int idx)
+{
+	int nid;
+
+	for_each_node(nid) {
+		struct lruvec *child_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
+		struct lruvec *parent_lruvec = mem_cgroup_lruvec(parent, NODE_DATA(nid));
+		unsigned long value = lruvec_page_state_local(child_lruvec, idx);
+		struct mem_cgroup_per_node *child_pn, *parent_pn;
+
+		child_pn = container_of(child_lruvec, struct mem_cgroup_per_node, lruvec);
+		parent_pn = container_of(parent_lruvec, struct mem_cgroup_per_node, lruvec);
+
+		__mod_memcg_lruvec_state(child_pn, idx, -value);
+		__mod_memcg_lruvec_state(parent_pn, idx, value);

We should probably change the type of `@val` from int to long to avoid
losing non-hierarchical stats during reparenting?

The parameter and return value of memcg_state_val_in_pages() are both
of type int, so perhaps we need a cleanup patch to do this?

I will send a cleanup patchset to do this, which includes the following:

https://lore.kernel.org/all/5e178b4e-a9e0-44dc-a18d-8c014365ee2f@xxxxxxxxx/
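
For reference, the truncation concern can be sketched in userspace as below. This is an illustration under assumptions, not the kernel code: the helper names are hypothetical, a 64-bit `long` (LP64) is assumed, and the out-of-range conversion to `int` is implementation-defined in C (GCC and Clang wrap modulo 2^32).

```c
/* Hypothetical stand-ins for a stat update helper taking int vs. long. */
long toy_mod_int(long counter, int val)   { return counter + val; }
long toy_mod_long(long counter, long val) { return counter + val; }

struct toy_counters {
	long child;
	long parent;
};

/* Reparent @value through an int parameter, as in the quoted hunk. */
struct toy_counters toy_reparent_with_int(unsigned long value)
{
	struct toy_counters c = { .child = (long)value, .parent = 0 };

	/* The casts make the narrowing that the int parameter implies explicit. */
	c.child = toy_mod_int(c.child, (int)-value);
	c.parent = toy_mod_int(c.parent, (int)value);
	return c;
}

/* Same transfer, but with a long parameter so nothing is narrowed. */
struct toy_counters toy_reparent_with_long(unsigned long value)
{
	struct toy_counters c = { .child = (long)value, .parent = 0 };

	c.child = toy_mod_long(c.child, -(long)value);
	c.parent = toy_mod_long(c.parent, (long)value);
	return c;
}
```

With a value above INT_MAX, for example `3UL << 30`, the long variant moves the full amount from child to parent, while the int variant leaves the child nonzero and gives the parent a different value on the toolchains assumed above.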

Thanks,
Qi