Re: [PATCH] cgroup: rstat: relax NMI guard after switch to try_cmpxchg
From: Cunlong Li
Date: Wed May 20 2026 - 22:38:22 EST
On Wed, May 20, 2026 at 03:41:02PM -0700, Shakeel Butt wrote:
> On Wed, May 20, 2026 at 11:30:54AM +0800, Cunlong Li wrote:
> > Commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe") used
> > this_cpu_cmpxchg() for the lockless insertion, and therefore required
> > both ARCH_HAVE_NMI_SAFE_CMPXCHG and ARCH_HAS_NMI_SAFE_THIS_CPU_OPS in
> > the NMI guard: on archs without the latter, this_cpu_cmpxchg() falls
> > back to "local_irq_save() + plain cmpxchg", and local_irq_save()
> > cannot mask NMIs.
> >
> > Commit 3309b63a2281 ("cgroup: rstat: use LOCK CMPXCHG in
> > css_rstat_updated") later replaced this_cpu_cmpxchg() with plain
> > try_cmpxchg() to fix cross-CPU lockless-list corruption, but left the
> > NMI guard untouched. After that switch, css_rstat_updated() no longer
> > performs any this_cpu_*() RMW operations and only relies on the arch
> > having NMI-safe cmpxchg, so ARCH_HAS_NMI_SAFE_THIS_CPU_OPS is no
> > longer required in the guard.
> >
> > Relax the guard accordingly so that archs which have HAVE_NMI and
> > ARCH_HAVE_NMI_SAFE_CMPXCHG but not ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> > (e.g. sparc, powerpc on PPC64/BOOK3S) can benefit from the existing
> > CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC path. Without this, the css
> > is never queued in NMI on those archs, and the atomics staged by
> > account_{slab,kmem}_nmi_safe() are not drained by flush_nmi_stats().
> >
> > Fixes: 3309b63a2281 ("cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated")
> > Signed-off-by: Cunlong Li <shenxiaogll@xxxxxxxxx>
>
> Looks fine but how did you find this? AI?
>
> Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>
Yes, AI-assisted.

I'm new to kernel development and was studying the memcg code.
When I came across the guard in css_rstat_updated():

    if ((!IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) ||
         !IS_ENABLED(CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS)) && in_nmi())
            return;

I asked Opus what those two CONFIGs mean and why the function
returns when in_nmi(). It suggested ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
may no longer be required after the switch from this_cpu_cmpxchg()
to try_cmpxchg(). I then went through the related commit history
and confirmed the analysis.
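
To make the difference concrete, here is a small userspace model of the
guard before and after the change. It is only a sketch: struct arch_caps
and the two helper functions are hypothetical stand-ins for the
IS_ENABLED() checks and in_nmi(), and the "relaxed" form is my reading of
what the commit message describes, not a copy of the kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's IS_ENABLED(CONFIG_...) checks. */
struct arch_caps {
	bool nmi_safe_cmpxchg;      /* models CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG */
	bool nmi_safe_this_cpu_ops; /* models CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS */
};

/*
 * Guard as quoted above: in NMI context, return early unless the arch
 * has BOTH NMI-safe cmpxchg and NMI-safe this_cpu ops.
 */
static bool old_guard_skips(struct arch_caps caps, bool in_nmi)
{
	return (!caps.nmi_safe_cmpxchg || !caps.nmi_safe_this_cpu_ops) && in_nmi;
}

/*
 * Relaxed guard (assumed form): with try_cmpxchg() doing the insertion,
 * only NMI-safe cmpxchg matters in NMI context.
 */
static bool new_guard_skips(struct arch_caps caps, bool in_nmi)
{
	return !caps.nmi_safe_cmpxchg && in_nmi;
}
```

For an arch with NMI-safe cmpxchg but without NMI-safe this_cpu ops, the
old form returns early in NMI and the relaxed form does not. Every other
combination behaves the same under both forms.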

Thanks for the ack!