Re: [syzbot] [mm?] [cgroups?] WARNING: bad unlock balance in lruvec_stat_mod_folio

From: Shakeel Butt

Date: Tue Apr 14 2026 - 13:19:17 EST

On Tue, Apr 14, 2026 at 11:52:13AM +0800, Qi Zheng wrote:
> Hi Shakeel,
>
> On 4/14/26 6:28 AM, Shakeel Butt wrote:
> > +Qi & Yosry
> >
> > On Tue, Apr 07, 2026 at 10:53:24AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: cc13002a9f98 Add linux-next specific files for 20260402
> > > git tree: linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=10d8946a580000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=4e6c8be618ab359
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=1a3353a77896e73a8f53
> > > compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Let's wait for the reproducer. The only cause I can think of is the
> > cgroup_subsys_on_dfl() check returning different values in
> > get_non_dying_memcg_start() and get_non_dying_memcg_end(), which would cause
> > this uneven rcu unlock. However, I can't think of why or how that could happen.
> >
>
> My AI bot told me that the cgroup_subsys_on_dfl_key can be dynamically
> modified at runtime during a rebind:
>
> rebind_subsystems()
> --> if (dst_root == &cgrp_dfl_root) {
>             static_branch_enable(cgroup_subsys_on_dfl_key[ssid]);
>     } else {
>             dcgrp->subtree_control |= 1 << ssid;
>             static_branch_disable(cgroup_subsys_on_dfl_key[ssid]);
>     }
>
> However, when I actually tested it, I hit the following error:
>
> mount: /tmp/cg-rb-repro: mount point is busy.
>
> Indeed, there are already many child cgroups under the cgroup v2 root
> (the VM just booted):
>
> root@localhost:~# find /sys/fs/cgroup -mindepth 1 -maxdepth 2 -type d | head -50
> /sys/fs/cgroup/sys-kernel-debug.mount
> /sys/fs/cgroup/dev-mqueue.mount
> /sys/fs/cgroup/user.slice
> /sys/fs/cgroup/user.slice/user-0.slice
> /sys/fs/cgroup/sys-kernel-tracing.mount
> /sys/fs/cgroup/init.scope
> /sys/fs/cgroup/system.slice
> /sys/fs/cgroup/system.slice/systemd-networkd.service
> /sys/fs/cgroup/system.slice/systemd-udevd.service
> /sys/fs/cgroup/system.slice/system-serial\x2dgetty.slice
> /sys/fs/cgroup/system.slice/wpa_supplicant.service
> /sys/fs/cgroup/system.slice/system-modprobe.slice
> /sys/fs/cgroup/system.slice/systemd-journald.service
> /sys/fs/cgroup/system.slice/unattended-upgrades.service
> /sys/fs/cgroup/system.slice/system-systemd\x2dgrowfs.slice
> /sys/fs/cgroup/system.slice/ssh.service
> /sys/fs/cgroup/system.slice/dhcpcd.service
> /sys/fs/cgroup/system.slice/systemd-resolved.service
> /sys/fs/cgroup/system.slice/dbus.service
> /sys/fs/cgroup/system.slice/systemd-timesyncd.service
> /sys/fs/cgroup/system.slice/system-getty.slice
> /sys/fs/cgroup/system.slice/systemd-logind.service
> /sys/fs/cgroup/dev-hugepages.mount
>
> So it seems impossible to rebind memory in a production environment
> using systemd?
>
> Then I disabled systemd:
>
> set `init=/bin/bash`
>
> and found that I could successfully run the following commands:
>
> root@(none):/# mkdir -p /tmp/cg-rb-repro
> root@(none):/# mount -t cgroup -o none,name=rb none /tmp/cg-rb-repro
> root@(none):/# mount -t cgroup -o remount,memory none /tmp/cg-rb-repro
> [ 65.903125][ T241] option changes via remount are deprecated (pid=241 comm=mount)
> root@(none):/# mount -t cgroup -o remount,name=rb none /tmp/cg-rb-repro
> [ 73.405829][ T242] option changes via remount are deprecated (pid=242 comm=mount)
> root@(none):/# umount /tmp/cg-rb-repro
>
> So it seems this race condition does exist. Should we fix it?

This only succeeded because there weren't any active cgroups. Were you able to
trigger the warning as well? If not, I think we should just wait for a
reproducer from syzbot before doing anything.
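
For reference, the imbalance hypothesized in this thread can be modeled in
user space. The following is only a rough sketch under stated assumptions, not
the kernel code: on_dfl stands in for the cgroup_subsys_on_dfl() result,
lock_depth stands in for the RCU read-side nesting count, and section_start(),
section_end() and run_pair() are hypothetical names. Which way the real check
gates the lock is also an assumption here; the point is only that a start/end
pair which re-evaluates a flag independently becomes unbalanced if the flag
flips in between.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; this is not kernel code. */
static bool on_dfl = true;	/* models the cgroup_subsys_on_dfl() result */
static int lock_depth;		/* models the RCU read-side nesting depth */

static void section_start(void)
{
	if (on_dfl)
		lock_depth++;	/* models rcu_read_lock() */
}

static void section_end(void)
{
	/* The flag is re-read here rather than remembered from start. */
	if (on_dfl)
		lock_depth--;	/* models rcu_read_unlock() */
}

/*
 * Run one start/end pair beginning with the flag set to 'initial'.
 * If 'flip_between' is true, toggle the flag between the two calls,
 * as a concurrent rebind might. Returns the resulting nesting depth:
 * 0 means balanced, >0 a leaked lock, <0 an unlock without a lock.
 */
static int run_pair(bool initial, bool flip_between)
{
	lock_depth = 0;
	on_dfl = initial;

	section_start();
	if (flip_between)
		on_dfl = !on_dfl;
	section_end();

	return lock_depth;
}
```

In this model run_pair(true, false) and run_pair(false, false) stay balanced,
run_pair(true, true) leaves the lock held, and run_pair(false, true) performs
an unlock with no matching lock, which is the shape of a "bad unlock balance"
report.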