Re: [PATCH 2/4] memcg: uint16_t for nr_bytes in obj_stock_pcp
From: Shakeel Butt
Date: Wed May 20 2026 - 21:04:49 EST
On Wed, May 20, 2026 at 02:20:23PM +0100, David Laight wrote:
> On Tue, 19 May 2026 22:31:20 -0700
> Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> > Currently struct obj_stock_pcp stores nr_bytes in an 'unsigned int'
> > which is 4 bytes on 64-bit machines. Switch the field to uint16_t to
> > shrink the per-CPU cache.
> >
> > The kernel supports PAGE_SIZE_4KB, _8KB, _16KB, _32KB, _64KB and
> > _256KB (see HAVE_PAGE_SIZE_* in arch/Kconfig). After the
> > PAGE_SIZE-aligned flush in __refill_obj_stock(), the sub-page
> > remainder fits in uint16_t up through 64KiB pages where PAGE_SIZE - 1
> > == U16_MAX, but on 256KiB pages PAGE_SIZE - 1 == 0x3FFFF exceeds
> > U16_MAX. The accumulator also needs to stay within uint16_t between
> > page-aligned flushes on 64KiB pages where PAGE_SIZE itself is
> > U16_MAX + 1.
> >
> > Accumulate the new total in an 'unsigned int' local, then:
> >
> > 1. Flush whenever the accumulator would hit U16_MAX. Together with
> > the existing allow_uncharge flush at PAGE_SIZE, this keeps the
> > uint16_t safe on PAGE_SIZE <= 64KiB.
> >
> > 2. On configs with PAGE_SHIFT > 16 (PAGE_SIZE_256KB on hexagon and
> > powerpc 44x), push any sub-page remainder above U16_MAX into
> > objcg->nr_charged_bytes via atomic_add before storing back, so
> > the store cannot silently truncate. The PAGE_SHIFT > 16 guard
> > folds the branch out at compile time on smaller page sizes.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> > Tested-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > ---
> > mm/memcontrol.c | 33 +++++++++++++++++++++++++++------
> > 1 file changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d7c162946719..b3d63d9f267c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2019,7 +2019,7 @@ static DEFINE_PER_CPU_ALIGNED(struct memcg_stock_pcp, memcg_stock) = {
> >
> > struct obj_stock_pcp {
> > local_trylock_t lock;
> > - unsigned int nr_bytes;
> > + uint16_t nr_bytes;
> > struct obj_cgroup *cached_objcg;
> > int16_t node_id;
>
> You might want to move it to this hole.
> The size of 'lock' depends on kernel build options.
Thanks. In the final patch, I am rearranging the fields for better packing.
Please take a look at the 4th patch and see if it still needs fixing.
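
For illustration only, here is a minimal userspace sketch of the hole being discussed. It is not the layout from the 4th patch: the struct names, the stand-in lock type (a pointer-sized member, roughly what a debug build might add) and the trailing field are all hypothetical. It only shows that a uint16_t placed right after an int16_t lands at a fixed relative offset regardless of how large the lock is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for local_trylock_t; the real size depends on config. */
struct fake_lock {
	void *owner;
};

/* nr_bytes directly after the lock: padding depends on the lock size. */
struct stock_after_lock {
	struct fake_lock lock;
	uint16_t nr_bytes;
	void *cached_objcg;
	int16_t node_id;
	unsigned long tail;	/* hypothetical later field */
};

/* nr_bytes moved next to node_id: fills the hole before 'tail'. */
struct stock_in_hole {
	struct fake_lock lock;
	void *cached_objcg;
	int16_t node_id;
	uint16_t nr_bytes;
	unsigned long tail;	/* hypothetical later field */
};

/* The two 16-bit fields sit back to back, independent of the lock size. */
static_assert(offsetof(struct stock_in_hole, nr_bytes) ==
	      offsetof(struct stock_in_hole, node_id) + sizeof(int16_t),
	      "nr_bytes should directly follow node_id");

/* Bytes saved by the rearranged layout in this sketch (may be 0). */
static long hole_bytes_saved(void)
{
	return (long)sizeof(struct stock_after_lock) -
	       (long)sizeof(struct stock_in_hole);
}
```

Whether the real struct gets smaller depends on the actual fields and on the config-dependent size of the lock, so pahole output on the final patch is the thing to check.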