Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
From: Jeff Layton
Date: Wed Jun 03 2026 - 13:53:23 EST
On Wed, 2026-06-03 at 10:33 -0700, Chuck Lever wrote:
>
> On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
> > Take a net-namespace reference in nfsd_file_dispose_list_delayed()
> > (get_net) and release it in nfsd_file_free() (put_net), so that
> > nf_net is always valid for files that the GC or shrinker has isolated
> > from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
> > see.
> >
> > Without this, nf_net can dangle for in-flight files whose net namespace
> > is torn down concurrently, causing a use-after-free when
> > nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
> >
> > Only files entering the delayed-dispose path need the net reference;
> > files freed synchronously (non-GC stateful opens, purge, direct put)
> > are always freed while the net namespace is still alive, so they skip
> > get_net()/put_net() entirely. A new NFSD_FILE_NET_HELD flag tracks
> > whether a given nfsd_file holds a net reference.
> >
> > Because nfsd_file_free() may now call put_net(nf->nf_net), the old
> > nfsd_file_put_local() pattern of returning nf->nf_net after
> > nfsd_file_put() is unsafe -- put_net() could theoretically drop the
> > last net namespace reference, leaving the returned pointer stale.
> > Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
> > itself, before the nfsd_file_put() that may trigger nfsd_file_free().
> > The function now returns void and the caller no longer needs to handle
> > the net reference.
> >
> > Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
> > Assisted-by: Claude:claude-opus-4-6
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> > fs/nfsd/filecache.c | 34 ++++++++++++++++++++++++++--------
> > fs/nfsd/filecache.h | 3 ++-
> > fs/nfsd/localio.c | 4 ++--
> > include/linux/nfslocalio.h | 9 ++-------
> > 4 files changed, 32 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 03f01a0beced..957fe57be063 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
> > if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
> > return;
> >
> > + if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
> > + put_net(nf->nf_net);
> > +
> > call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
> > }
> >
> > @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
> > }
> >
> > /**
> > - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
> > + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
> > * @pnf: nfsd_file of which to put the reference
> > *
> > - * First save the associated net to return to caller, then put
> > - * the reference of the nfsd_file.
> > + * Drops both the nfsd_file reference and the associated nfsd_net
> > + * reference. The nfsd_net ref is released before the file ref so
> > + * that put_net() inside nfsd_file_free() cannot drop the last net
> > + * namespace reference while the caller still needs it.
> > */
> > -struct net *
> > +void
> > nfsd_file_put_local(struct nfsd_file __rcu **pnf)
> > {
> > struct nfsd_file *nf;
> > - struct net *net = NULL;
> >
> > nf = unrcu_pointer(xchg(pnf, NULL));
> > if (nf) {
> > - net = nf->nf_net;
> > + struct net *net = nf->nf_net;
> > +
> > + rcu_read_lock();
> > + nfsd_net_put(net);
> > + rcu_read_unlock();
> > nfsd_file_put(nf);
> > }
> > - return net;
> > }
> >
> > /**
> > @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > while (!list_empty(dispose)) {
> > struct nfsd_file *nf = list_first_entry(dispose,
> > struct nfsd_file, nf_gc);
> > - struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> > + struct nfsd_net *nn;
> > struct svc_serv *serv;
> >
> > + /*
> > + * Pin the net namespace so nf_net stays valid while the
> > + * file sits on the per-net dispose list. Callers (the GC
> > + * worker, shrinker, and fsnotify callbacks) always run
> > + * before nfsd_net_exit(), so nf_net is still live here.
> > + * The matching put_net() is in nfsd_file_free().
> > + */
> > + get_net(nf->nf_net);
> > + set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
> > +
> > + nn = net_generic(nf->nf_net, nfsd_net_id);
> > spin_lock(&nn->fcache_dispose_lock);
> > list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
> > spin_unlock(&nn->fcache_dispose_lock);
> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > index 683b6437cacc..7ae3c0ea0a2a 100644
> > --- a/fs/nfsd/filecache.h
> > +++ b/fs/nfsd/filecache.h
> > @@ -45,6 +45,7 @@ struct nfsd_file {
> > #define NFSD_FILE_REFERENCED (2)
> > #define NFSD_FILE_GC (3)
> > #define NFSD_FILE_RECENT (4)
> > +#define NFSD_FILE_NET_HELD (5)
> > unsigned long nf_flags;
> > refcount_t nf_ref;
> > unsigned char nf_may;
> > @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
> > int nfsd_file_cache_start_net(struct net *net);
> > void nfsd_file_cache_shutdown_net(struct net *net);
> > void nfsd_file_put(struct nfsd_file *nf);
> > -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > struct file *nfsd_file_file(struct nfsd_file *nf);
> > void nfsd_file_close_inode_sync(struct inode *inode);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > index c3eb0557b3e1..e3295bae75a4 100644
> > --- a/fs/nfsd/localio.c
> > +++ b/fs/nfsd/localio.c
> > @@ -40,8 +40,8 @@
> > * avoid all the NFS overhead with reads, writes and commits.
> > *
> > * On successful return, returned nfsd_file will have its nf_net member
> > - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
> > - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
> > + * set. Caller (NFS client) is responsible for calling nfsd_file_put
> > + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net
> > ref.
> > */
> > static struct nfsd_file *
> > nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > index 3d91043254e6..7267a69092d1 100644
> > --- a/include/linux/nfslocalio.h
> > +++ b/include/linux/nfslocalio.h
> > @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
> > const struct nfs_fh *,
> > struct nfsd_file __rcu **pnf,
> > const fmode_t);
> > - struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > + void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > struct file *(*nfsd_file_file)(struct nfsd_file *);
> > void (*nfsd_file_dio_alignment)(struct nfsd_file *,
> > u32 *, u32 *, u32 *);
> > @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct
> > nfsd_file __rcu **localio)
> > * must prevent nfsd shutdown from completing as nfs_close_local_fh()
> > * does by blocking the nfs_uuid from being finally put.
> > */
> > - struct net *net;
> > -
> > - net = nfs_to->nfsd_file_put_local(localio);
> > -
> > - if (net)
> > - nfs_to_nfsd_net_put(net);
> > + nfs_to->nfsd_file_put_local(localio);
> > }
> >
> > #else /* CONFIG_NFS_LOCALIO */
> >
> > --
> > 2.54.0
>
> It seems that all of the LLM reviewers have difficulty with this patch.
> This is a consolidated review of the issue from Claude and Codex:
>
> > The reordering in nfsd_file_put_local() -- nfsd_net_put() before
> > nfsd_file_put() -- introduces a shutdown race.
> >
> > The nfsd_net_ref percpu refcount is taken only by LOCALIO
> > (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
> > nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
> > is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
> > and through the latter, nfsd_file_cache_shutdown(), which runs
> > rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
> > the fsnotify groups, and the rhltable.
> >
> > Per-I/O references are not covered by the nfs_uuid handshake. Each
> > pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
> > (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
> > and releases it at completion through nfsd_file_put_local(). An
> > iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
> > does not wait for it; only the drain wait does. Meanwhile
> > __nfsd_file_cache_purge() has already unhashed the nfsd_file but
> > cannot free it (the iocb ref keeps refcount elevated in
> > nfsd_file_cond_queue()).
> >
> > So with one I/O in flight when the server is stopped: the shutdown
> > thread parks at the drain wait; the I/O completion thread enters
> > nfsd_file_put_local() and drops the last nfsd_net ref, which runs
> > complete() before nfsd_file_put() has executed. The shutdown thread
> > then proceeds through nfsd_file_cache_shutdown() concurrently with
> > the final nfsd_file_free(): the call_rcu() is queued after the
> > rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
> > a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
> > fsnotify group. kmem_cache_destroy() also fires "objects remaining"
> > because the nfsd_file is still allocated.
> >
> > The old ordering was the mechanism that prevented this: the caller
> > held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
> > guarantees the release callback runs only after every ref is
> > dropped, so global teardown strictly followed the file free and the
> > rcu_barrier() flushed its call_rcu().
> >
> > The hazard the commit message cites for the reorder cannot occur on
> > this path: NFSD_FILE_NET_HELD is set only in
> > nfsd_file_dispose_list_delayed(), reachable only through
> > refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
> > A file with an outstanding LOCALIO reference has refcount >= 2, so
> > a file whose final put arrives via nfsd_file_put_local() never has
> > NET_HELD set and its nfsd_file_free() never calls put_net().
> >
> > Suggest keeping the void API but restoring the put order:
> >
> > nf = unrcu_pointer(xchg(pnf, NULL));
> > if (nf) {
> > struct net *net = nf->nf_net;
> >
> > nfsd_file_put(nf);
> > rcu_read_lock();
> > nfsd_net_put(net);
> > rcu_read_unlock();
> > }
> >
> > with the kdoc comment and the commit message paragraph about the
> > old ordering being unsafe adjusted to match.
>
I had claude review this and it says:
● This is the same concern I just addressed for the previous patch's Finding 3, restated as a
critical bug. The answer is the same: this is a false positive.
The reviewer's scenario requires:
1. The global shrinker unhashes and isolates an nfsd_file onto a local dispose list
2. The net namespace teardown completes and struct net is freed
3. The shrinker resumes and calls get_net() on freed memory
Step 2 cannot happen before step 3. Here's why:
The net namespace struct net is reference-counted. It isn't freed when nfsd_net_exit() runs --
nfsd_net_exit() is a pernet subsys callback that runs during net namespace teardown, but the
struct net itself stays alive until its own refcount drops to zero. The struct net outlives all
pernet exit callbacks.
More importantly, nfsd_file_dispose_list_delayed() is called from exactly three places:
- nfsd_file_gc() -- the GC delayed work
- nfsd_file_lru_scan() -- the shrinker
- nfsd_file_close_inode() -- fsnotify
All three isolate items and call nfsd_file_dispose_list_delayed() in a single, synchronous
function call. There's no point where a local dispose list sits around waiting while the net
namespace is torn down. The isolation and the get_net() happen in the same execution context.
The shrinker and GC worker are global and remain registered until nfsd_file_cache_shutdown()
(global teardown). Per-net teardown via nfsd_file_cache_shutdown_net() does not unregister them.
But the files they process were allocated while the net namespace was alive, and nf->nf_net was
set at allocation time in nfsd_file_alloc(). The struct net cannot have a zero refcount while nfsd
files referencing it still exist in the LRU -- the nfsd server itself holds net references
through its operation.
The get_net() here is not protecting against the current dereference being unsafe. It's ensuring
that nf_net remains valid for the arbitrarily long time the file sits on the per-net dispose list
waiting for an nfsd thread to close it. That's exactly the right pattern.
Verdict: False positive. The get_net() placement is correct and the struct net is guaranteed to be
live at this point.
--
Jeff Layton <jlayton@xxxxxxxxxx>