Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT

From: Mateusz Guzik

Date: Thu May 21 2026 - 05:42:39 EST


On Thu, May 21, 2026 at 11:21 AM Bert Karwatzki <spasswolf@xxxxxx> wrote:
>
> Am Donnerstag, dem 21.05.2026 um 11:09 +0200 schrieb Mateusz Guzik:
> > On Thu, May 21, 2026 at 10:53:03AM +0200, Mateusz Guzik wrote:
> > > Christian, can you fold this in please.
> > >
> > > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > > index 771fc31a69b8..8f17c0abbc95 100644
> > > --- a/fs/filesystems.c
> > > +++ b/fs/filesystems.c
> > > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did someone beat us to it?
> > > */
> > > if (old && old->gen == file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > return 0;
> > > }
> > > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did the list change in the meantime?
> > > */
> > > if (gen != file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > goto retry;
> > > }
> > >
> > >
> >
> > Even better, I got the above fixup + some polish listed below:
> > - removed an extra space in newlen calculation
> > - the WARN_ON_ONCE case needs to free 'new', not 'old'
> > - there is no READ_ONCE anymore in filesystems_proc_show()
> >
> > goes into the "fs: cache the string generated by reading /proc/filesystems"
> > commit.
> >
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 771fc31a69b8..712316a1e3e0 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
> > hlist_for_each_entry_rcu(p, &file_systems, list) {
> > if (!(p->fs_flags & FS_REQUIRES_DEV))
> > newlen += strlen("nodev");
> > - newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > + newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > }
> > spin_unlock(&file_systems_lock);
> >
> > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did someone beat us to it?
> > */
> > if (old && old->gen == file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > return 0;
> > }
> > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did the list change in the meantime?
> > */
> > if (gen != file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > goto retry;
> > }
> > @@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
> > * generation above and messes it up.
> > */
> > spin_unlock(&file_systems_lock);
> > - if (old)
> > - kfree_rcu(old, rcu);
> > + kfree(new);
> > return -EINVAL;
> > }
> >
> > /*
> > - * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
> > + * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
> > */
> > smp_store_release(&file_systems_string, new);
> > spin_unlock(&file_systems_lock);
> >
>
> So it was commit 36b3306779ea
> ("fs: cache the string generated by reading /proc/filesystems")
> which caused the problem. If I had finished the bisection properly instead
> of cutting I probably would have noticed this...
>
> So I tested
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 771fc31a69b8..8f17c0abbc95 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did someone beat us to it?
> */
> if (old && old->gen == file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> return 0;
> }
> @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did the list change in the meantime?
> */
> if (gen != file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> goto retry;
> }
>
> with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
>
> [ 2.793416] [ T331] ------------[ cut here ]------------
> [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331

Just so that we are on the same page: if you take the -next tag as is
and revert the "fs: cache the string generated by reading
/proc/filesystems" commit alone things work fine?