Re: [PATCH v6 30/33] mm: memcontrol: prepare for reparenting non-hierarchical stats

From: Harry Yoo (Oracle)

Date: Mon Mar 23 2026 - 08:32:43 EST


On Mon, Mar 23, 2026 at 05:47:27PM +0800, Qi Zheng wrote:
> Hi Harry,
>
> On 3/23/26 3:53 PM, Harry Yoo (Oracle) wrote:
> > On Thu, Mar 05, 2026 at 07:52:48PM +0800, Qi Zheng wrote:
> > > From: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> > >
> > > To resolve the dying memcg issue, we need to reparent LRU folios of child
> > > memcg to its parent memcg. This could cause problems for non-hierarchical
> > > stats.
> > >
> > > As Yosry Ahmed pointed out:
> > >
> > > ```
> > > In short, if memory is charged to a dying cgroup at the time of
> > > reparenting, when the memory gets uncharged the stats updates will occur
> > > at the parent. This will update both hierarchical and non-hierarchical
> > > stats of the parent, which would corrupt the parent's non-hierarchical
> > > stats (because those counters were never incremented when the memory was
> > > charged).
> > > ```
> > >
> > > Now we have the following two types of non-hierarchical stats, and they
> > > are only used in CONFIG_MEMCG_V1:
> > >
> > > a. memcg->vmstats->state_local[i]
> > > b. pn->lruvec_stats->state_local[i]
> > >
> > > To ensure that these non-hierarchical stats work properly, we need to
> > > reparent these non-hierarchical stats after reparenting LRU folios. To
> > > this end, this commit makes the following preparations:
> > >
> > > 1. implement reparent_state_local() to reparent non-hierarchical stats
> > > 2. make css_killed_work_fn() to be called in rcu work, and implement
> > > get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid race
> > > between mod_memcg_state()/mod_memcg_lruvec_state()
> > > and reparent_state_local()
> > >
> > > Co-developed-by: Yosry Ahmed <yosry@xxxxxxxxxx>
> > > Signed-off-by: Yosry Ahmed <yosry@xxxxxxxxxx>
> > > Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> > > Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> > > ---
> > > kernel/cgroup/cgroup.c | 9 ++--
> > > mm/memcontrol-v1.c | 16 +++++++
> > > mm/memcontrol-v1.h | 7 +++
> > > mm/memcontrol.c | 97 ++++++++++++++++++++++++++++++++++++++++++
> > > 4 files changed, 125 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 23b70bd80ddc9..b0519a16f5684 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -473,6 +501,30 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
> > > return x;
> > > }
> > > +#ifdef CONFIG_MEMCG_V1
> > > +static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
> > > + enum node_stat_item idx, int val);
> > > +
> > > +void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
> > > + struct mem_cgroup *parent, int idx)
> > > +{
> > > + int nid;
> > > +
> > > + for_each_node(nid) {
> > > + struct lruvec *child_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
> > > + struct lruvec *parent_lruvec = mem_cgroup_lruvec(parent, NODE_DATA(nid));
> > > + unsigned long value = lruvec_page_state_local(child_lruvec, idx);
> > > + struct mem_cgroup_per_node *child_pn, *parent_pn;
> > > +
> > > + child_pn = container_of(child_lruvec, struct mem_cgroup_per_node, lruvec);
> > > + parent_pn = container_of(parent_lruvec, struct mem_cgroup_per_node, lruvec);
> > > +
> > > + __mod_memcg_lruvec_state(child_pn, idx, -value);
> > > + __mod_memcg_lruvec_state(parent_pn, idx, value);
> >
> > We should probably change the type of `@val` from int to long to avoid
> > losing non-hierarchical stats during reparenting?
>
> The parameter and return value of memcg_state_val_in_pages() are both
> of type int, so perhaps we need a cleanup patch to do this?

Yes!

And @val in memcg_rstat_updated() too, I think.
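
To illustrate the concern, here is a standalone userspace sketch, not
kernel code. The helper names are made up for illustration, and it
assumes long is wider than int (LP64, as on 64-bit Linux). It only
shows how an unsigned long counter can be narrowed when it is passed
through an int-typed delta:

```c
#include <assert.h>
#include <limits.h>

/* Assumption: long is wider than int (LP64, e.g. 64-bit Linux). */
static_assert(sizeof(long) > sizeof(int), "sketch assumes LP64");

/* Hypothetical stand-in for a stat update helper whose delta is an int. */
static long add_delta_int(long counter, int val)
{
	return counter + val;
}

/* The same hypothetical helper with the delta widened to long. */
static long add_delta_long(long counter, long val)
{
	return counter + val;
}

/* Move a child's local counter into an empty parent via the int helper. */
static long reparent_via_int(unsigned long value)
{
	/*
	 * Converting an out-of-range value to int is implementation-defined;
	 * GCC documents reducing it modulo 2^N for an N-bit int.
	 */
	return add_delta_int(0, (int)value);
}

/* Same move via the long helper; the value fits, so nothing is lost. */
static long reparent_via_long(unsigned long value)
{
	return add_delta_long(0, (long)value);
}
```

With value = (unsigned long)INT_MAX + 1, reparent_via_long() hands the
parent the full count, while reparent_via_int() does not. Smaller
values go through both paths unchanged.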

> I will send a cleanup patchset to do this, which includes the following:
>
> https://lore.kernel.org/all/5e178b4e-a9e0-44dc-a18d-8c014365ee2f@xxxxxxxxx/

Thanks!

Should that ideally be applied before this patchset?

--
Cheers,
Harry / Hyeonggon