Re: [RFC PATCH] cgroup/rstat: convert rstat lock from spinlock to rwlock

From: Michal Koutný

Date: Fri May 22 2026 - 12:12:32 EST


Hi Thomas.

On Tue, May 19, 2026 at 12:31:34PM -0500, Thomas Falcon <thomas.falcon@xxxxxxxxx> wrote:
> @@ -414,7 +427,7 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
> struct cgroup_subsys_state *pos;
>
> /* Reacquire for each CPU to avoid disabling IRQs too long */
> - __css_rstat_lock(css, cpu);
> + __css_rstat_lock(css, cpu, true);
> pos = css_rstat_updated_list(css, cpu);
> for (; pos; pos = pos->rstat_flush_next) {
> if (is_self) {
> @@ -424,7 +437,7 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
> } else
> pos->ss->css_rstat_flush(pos, cpu);
> }
> - __css_rstat_unlock(css, cpu);
> + __css_rstat_unlock(css, cpu, true);
> if (!cond_resched())
> cpu_relax();
> }


> @@ -717,11 +730,11 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
>
> if (cgroup_parent(cgrp)) {
> css_rstat_flush(&cgrp->self);
> - __css_rstat_lock(&cgrp->self, -1);
> + __css_rstat_lock(&cgrp->self, -1, false);
> bstat = cgrp->bstat;
> cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
> &bstat.cputime.utime, &bstat.cputime.stime);
> - __css_rstat_unlock(&cgrp->self, -1);
> + __css_rstat_unlock(&cgrp->self, -1, false);

I was wondering where these distinctions of readers vs writers stem from
and here I see that it's mainly the per-subsys vs rstat_base_lock.
Given that cputime_adjust() is here only modifying the local bstat
value, the read-like lock makes sense.

However, there's still the cgroup's flush right above which would take
the per-subsys locks in write-mode anyway.
Can you add some more explanation why this works?

More generally, I'm wondering where are the opportunities for replacing
per-subsys lock with an RW lock (or seqcount).

Thanks for looking into cpu.stat scalability,
Michal

Attachment: signature.asc
Description: PGP signature