Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
From: Bert Karwatzki
Date: Thu May 21 2026 - 06:16:02 EST
Am Donnerstag, dem 21.05.2026 um 11:25 +0200 schrieb Mateusz Guzik:
> On Thu, May 21, 2026 at 11:21 AM Bert Karwatzki <spasswolf@xxxxxx> wrote:
> >
> > Am Donnerstag, dem 21.05.2026 um 11:09 +0200 schrieb Mateusz Guzik:
> > > On Thu, May 21, 2026 at 10:53:03AM +0200, Mateusz Guzik wrote:
> > > > Christian, can you fold this in please.
> > > >
> > > > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > > > index 771fc31a69b8..8f17c0abbc95 100644
> > > > --- a/fs/filesystems.c
> > > > +++ b/fs/filesystems.c
> > > > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > > * Did someone beat us to it?
> > > > */
> > > > if (old && old->gen == file_systems_gen) {
> > > > + spin_unlock(&file_systems_lock);
> > > > kfree(new);
> > > > return 0;
> > > > }
> > > > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > > * Did the list change in the meantime?
> > > > */
> > > > if (gen != file_systems_gen) {
> > > > + spin_unlock(&file_systems_lock);
> > > > kfree(new);
> > > > goto retry;
> > > > }
> > > >
> > > >
> > >
> > > Even better, I got the above fixup + some polish listed below:
> > > - removed an extra space in newlen calculation
> > > - the WARN_ON_ONCE case needs to free 'new', not 'old'
> > > - there is no READ_ONCE anymore in filesystems_proc_show()
> > >
> > > goes into the "fs: cache the string generated by reading /proc/filesystems"
> > > commit.
> > >
> > > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > > index 771fc31a69b8..712316a1e3e0 100644
> > > --- a/fs/filesystems.c
> > > +++ b/fs/filesystems.c
> > > @@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > hlist_for_each_entry_rcu(p, &file_systems, list) {
> > > if (!(p->fs_flags & FS_REQUIRES_DEV))
> > > newlen += strlen("nodev");
> > > - newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > > + newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > > }
> > > spin_unlock(&file_systems_lock);
> > >
> > > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did someone beat us to it?
> > > */
> > > if (old && old->gen == file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > return 0;
> > > }
> > > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did the list change in the meantime?
> > > */
> > > if (gen != file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > goto retry;
> > > }
> > > @@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
> > > * generation above and messes it up.
> > > */
> > > spin_unlock(&file_systems_lock);
> > > - if (old)
> > > - kfree_rcu(old, rcu);
> > > + kfree(new);
> > > return -EINVAL;
> > > }
> > >
> > > /*
> > > - * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
> > > + * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
> > > */
> > > smp_store_release(&file_systems_string, new);
> > > spin_unlock(&file_systems_lock);
> > >
> >
> > So it was commit 36b3306779ea
> > ("fs: cache the string generated by reading /proc/filesystems")
> > which caused the problem. If I had finished the bisection properly instead
> > of cutting I probably would have noticed this...
> >
> > So I tested
> >
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 771fc31a69b8..8f17c0abbc95 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did someone beat us to it?
> > */
> > if (old && old->gen == file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > return 0;
> > }
> > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did the list change in the meantime?
> > */
> > if (gen != file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > goto retry;
> > }
> >
> > with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> > crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
> >
> > [ 2.793416] [ T331] ------------[ cut here ]------------
> > [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> > [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
>
> Just so that we are on the same page: if you take the -next tag as is
> and revert the "fs: cache the string generated by reading
> /proc/filesystems" commit alone things work fine?
Yes this works fine:
16ff8d6e7c28 (HEAD) Revert "fs: cache the string generated by reading /proc/filesystems"
6a50ba100ace (tag: next-20260519, origin/master, origin/HEAD, master) Add linux-next specific files for 20260519
(tested only without RT as the bug seems to be easier to trigger ...)
Bert Karwatzki