Re: [PATCH 0/8] per-memcg-per-node kmem accounting

From: Joshua Hahn

Date: Wed May 20 2026 - 23:47:20 EST

On Wed, 20 May 2026 10:39:59 +0200 Alexandre Ghiti <alex@xxxxxxxx> wrote:

> Hi Joshua,
>
> On 5/18/26 16:57, Joshua Hahn wrote:
> > On Mon, 11 May 2026 22:20:35 +0200 Alexandre Ghiti <alex@xxxxxxxx> wrote:
> >
> >> This series pursues the work initiated by Joshua [1]. We need kernel
> >> memory to be accounted on a per-node basis in order to be able to
> >> know the memcg and physical memory association.
> >>
> >> This series takes advantage of the recent introduction of per-node
> >> obj_cgroup [2] and makes those obj_cgroup tied to their numa node.
> >>
> >> The bulk of the series is percpu per-node accounting: percpu
> >> "precharges" the memcg before we know the actual location of the pages
> >> it uses, so charging and accounting had to be split. All other kmem
> >> users (slab, zswap, __memcg_kmem_charge_page) are straightforward
> >> conversions (zswap support is limited in this series because Joshua
> >> is working on it in parallel [3]).
> >>
> >> Thanks Joshua for your early feedback!
> > Hello Alex,
> >
> > Thank you for your work!
> >
> > Overall I think the direction makes sense to me. Pre-overcharging makes sense to
> > me as an approach, we would much rather overaccount than underaccount and
> > later have to breach limits.
> >
> > I do have some concerns on performance, though. Namely, I think there are
> > some expensive operations that I think would benefit from some performance
> > benchmarking with this patch added (maybe some simple microbenchmarks that
> > demonstrates kernel allocation overhead could be useful).
> >
> > From what I can tell, there is some additional performance overhead that has
> > to do with iterating over num_possible_cpus() x pages_per_alloc, which
> > doesn't seem trivial to me.
>
> Indeed, let me microbenchmark the overhead on a large system.

Hi Alex,

That sounds great to me :-) Looking forward to the numbers!

> > Another concern that I see is the stock credit system. Maybe we could be
> > bypassing the stock check leading to more time spent doing the atomic
> > operations.
>
> I'm not following on this one, which atomic operations do you see that
> could be bypassed?

So in my initial scan of patch 7, my concern was that with a nested stock
system (obj_cgroup stock and local credit "stock"), we could incur more work
if the two are out of sync: extra work in the stock refill path in
obj_cgroup_precharge, and then more work on top in the loop within
pcpu_memcg_post_alloc_hook (obj_cgroup_account_kmem does the charging
atomically, I think).

So what I mean is, I'm not sure what the "size" typically is for
pcpu_memcg_post_alloc_hook. But it might be a worthwhile optimization to
precharge all the pages, then for each cpu iterate over the pages to
figure out how many pages are used per nid (doing just math, not actually
doing the atomic adds), and then outside both of these loops iterate
over every nid_objcg just once to perform the atomic operation.

Maybe this is needed or not (depending on how big "size" typically is
and whether we go from doing O(1000) atomic adds --> O(10) or some other
big reduction), but I just wanted to toss it out there as something that
could potentially be expensive.
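
To make the batching idea concrete, here is a rough userspace sketch of what
I have in mind. It is not the patch's code: struct nid_objcg_sketch,
account_batched(), MAX_NODES and the flat page_nid[] layout are all made-up
names for illustration, and a C11 atomic_long stands in for whatever counter
obj_cgroup_account_kmem actually updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical node count, for illustration only */

/* Hypothetical stand-in for a per-node objcg's charge counter. */
struct nid_objcg_sketch {
	atomic_long nr_pages;
};

/*
 * page_nid[cpu * pages_per_cpu + i] is the node backing page i of that
 * cpu's part of the allocation (assumed layout, not the real pcpu one).
 * Returns the number of atomic adds that were issued.
 */
static unsigned int account_batched(struct nid_objcg_sketch *objcgs,
				    const int *page_nid,
				    size_t nr_cpus, size_t pages_per_cpu)
{
	long per_nid[MAX_NODES] = { 0 };
	unsigned int nr_atomics = 0;
	size_t cpu, i;
	int nid;

	/* Pass 1: plain arithmetic only, no atomics inside the nested loop. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		for (i = 0; i < pages_per_cpu; i++) {
			nid = page_nid[cpu * pages_per_cpu + i];
			assert(nid >= 0 && nid < MAX_NODES);
			per_nid[nid]++;
		}
	}

	/* Pass 2: at most one atomic add per node that was actually used. */
	for (nid = 0; nid < MAX_NODES; nid++) {
		if (!per_nid[nid])
			continue;
		atomic_fetch_add(&objcgs[nid].nr_pages, per_nid[nid]);
		nr_atomics++;
	}

	return nr_atomics;
}
```

With this shape the number of atomic adds is bounded by the number of nodes
rather than by nr_cpus x pages_per_cpu. Whether that matters in practice
depends on the typical "size", which is exactly what the microbenchmark
should tell us.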

> > obj_stock caches a single obj_cgroup, which means that if we split the objcg
> > to be per-node (in patch 6), then the obj_stock basically gets invalidated
> > every operation since we iterate over more objcgs (even though we are in
> > the same logical objcg). Maybe I'm missing something?
>
>
> The objcg split comes from commit 01b9da291c49 ("mm: memcontrol: convert
> objcg to be per-memcg per-node type") and the problem you describe is
> exactly what Shakeel is trying to fix [1].

Whoops O_o I completely missed that one. Sorry for flagging it again!

> But I remember trying a microbenchmark and noticed a +5% regression (on
> top of the 67% then...), I'll rebase this series on top of Shakeel's and
> re-run.

Sounds like a great idea! Thanks again Alex, have a great day! :-)
Joshua