Re: [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
From: NeilBrown
Date: Wed Mar 18 2026 - 17:11:01 EST
[[ CC list trimmed ]]
On Mon, 16 Mar 2026, Amir Goldstein wrote:
> On Thu, Mar 12, 2026 at 10:49 PM NeilBrown <neilb@xxxxxxxxxxx> wrote:
> >
> > From: NeilBrown <neil@xxxxxxxxxx>
> >
> > When performing an "impure" readdir, ovl needs to perform a lookup on some
> > of the names that it found.
> > With proposed locking changes it will not be possible to perform this
> > lookup (in particular, not safe to wait for d_alloc_parallel()) while
> > holding a lock on the directory.
> >
> > ovl doesn't really need the lock at this point.
>
> Not exactly. see below.
>
> > It has already iterated
> > the directory and has cached a list of the contents. It now needs to
> > gather extra information about some contents. It can do this without
> > the lock.
> >
> > After gathering that info it needs to retake the lock for API
> > correctness. After doing this it must check IS_DEADDIR() again to
> > ensure readdir always returns -ENOENT on a removed directory.
> >
> > Note that while ->iterate_shared is called with a shared lock, ovl uses
> > WRAP_DIR_ITER() so an exclusive lock is held and so we drop and retake
> > that exclusive lock.
> >
> > As the directory is no longer locked in ovl_cache_update() we need
> > dget_parent() to get a reference to the parent.
> >
> > Signed-off-by: NeilBrown <neil@xxxxxxxxxx>
> > ---
> > fs/overlayfs/readdir.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 1dcc75b3a90f..d5123b37921c 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -568,13 +568,12 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
> >  			goto get;
> >  		}
> >  		if (p->len == 2) {
> > -			/* we shall not be moved */
> > -			this = dget(dir->d_parent);
> > +			this = dget_parent(dir);
> >  			goto get;
> >  		}
> >  	}
> >  	/* This checks also for xwhiteouts */
> > -	this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> > +	this = lookup_one_unlocked(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
>
> ovl_cache_update() is also called from ovl_iterate_merged() where inode
> is locked.
>
> >  	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
> >  		/* Mark a stale entry */
> >  		p->is_whiteout = true;
> > @@ -666,11 +665,12 @@ static int ovl_dir_read_impure(const struct path *path, struct list_head *list,
> >  	if (err)
> >  		return err;
> > 
> > +	inode_unlock(path->dentry->d_inode);
> >  	list_for_each_entry_safe(p, n, list, l_node) {
> >  		if (!name_is_dot_dotdot(p->name, p->len)) {
> >  			err = ovl_cache_update(path, p, true);
> >  			if (err)
> > -				return err;
> > +				break;
> >  		}
> >  		if (p->ino == p->real_ino) {
> >  			list_del(&p->l_node);
> > @@ -680,14 +680,19 @@ static int ovl_dir_read_impure(const struct path *path, struct list_head *list,
> >  			struct rb_node *parent = NULL;
> > 
> >  			if (WARN_ON(ovl_cache_entry_find_link(p->name, p->len,
> > -							      &newp, &parent)))
> > -				return -EIO;
> > +							      &newp, &parent))) {
> > +				err = -EIO;
> > +				break;
> > +			}
> > 
> >  			rb_link_node(&p->node, parent, newp);
> >  			rb_insert_color(&p->node, root);
> >  		}
> >  	}
> > -	return 0;
> > +	inode_lock(path->dentry->d_inode);
> > +	if (IS_DEADDIR(path->dentry->d_inode))
> > +		err = -ENOENT;
> > +	return err;
> >  }
> >
> > static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
> > --
>
> You missed the fact that overlayfs uses the dir inode lock
> to protect the readdir inode cache, so your patch introduces
> a risk of storing a stale readdir cache when dir modify operations
> invalidate the readdir cache version while the lock is dropped,
> and it also introduces a memory leak when the cache is stomped on
> without freeing a cache created by a competing thread.
> I think something like the untested patch below should fix this.
Yes, I did miss that - thanks. I think I missed a few other details
too.

I no longer think it can be safe to drop the lock without substantial
rewrites - and even then maybe not. So I'm considering a different
approach.

This patch demonstrates what I'm thinking, though I think it still
needs work.
Thanks,
NeilBrown
From: NeilBrown <neil@xxxxxxxxxx>
Subject: [PATCH] ovl: stop using lookup_one() in iterate_shared() handling.
lookup_one() is expected to be removed as it does not fit well with
proposed changes to directory locking. Specifically, d_alloc_parallel()
will be ordered outside of i_rwsem, and as iterate_shared() is called
with i_rwsem held it is not safe to call d_alloc_parallel().

We could instead call d_alloc_noblock() and then call ->lookup()
ourselves, but that can fail if there is a lookup attempt concurrent
with the readdir().

ovl cannot afford for the lookup to fail, as that could produce
incorrect results, and it cannot safely drop i_rwsem temporarily, as
that could introduce races with the handling of the directory cache.

Instead we rely on the fact that ovl_iterate() holds an exclusive lock
on the directory, so any concurrent lookup will wait for the
ovl_iterate() call to complete. We allocate a separate dentry and, if
the lookup is successful, hash it with the result.

When the concurrent lookup gets i_rwsem it mustn't do its own lookup -
it must use the existing dentry. This is done using
try_lookup_noperm(). To manage overheads we keep a counter of the
number of "stray dentries" there might be on each directory and only
check for one when this count is non-zero.

If a "stray dentry" were discarded for any reason before the concurrent
lookup completed, the count would never reach zero. That might be a
problem.
Signed-off-by: NeilBrown <neil@xxxxxxxxxx>
---
fs/overlayfs/namei.c | 12 ++++++++++++
fs/overlayfs/ovl_entry.h | 1 +
fs/overlayfs/readdir.c | 26 ++++++++++++++++++++++++--
fs/overlayfs/super.c | 1 +
4 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index d8dd4b052984..c3ff57047712 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1399,6 +1399,18 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > ofs->namelen)
 		return ERR_PTR(-ENAMETOOLONG);
 
+	if (atomic_read(&OVL_I(dir)->stray_dentries) && d_in_lookup(dentry)) {
+		/* This dentry might have forced readdir to do the lookup */
+		struct dentry *alias =
+			try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
+						    dentry->d_name.len),
+					  dentry->d_parent);
+		if (alias && !IS_ERR(alias)) {
+			atomic_dec(&OVL_I(dir)->stray_dentries);
+			return alias;
+		}
+	}
+
 	with_ovl_creds(dentry->d_sb)
 		err = ovl_lookup_layers(&ctx, &d);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1d4828dbcf7a..0e7751d5dfca 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -172,6 +172,7 @@ struct ovl_inode {
 	struct inode vfs_inode;
 	struct dentry *__upperdentry;
 	struct ovl_entry *oe;
+	atomic_t stray_dentries;	/* directory */
 
 	/* synchronize copy up and more */
 	struct mutex lock;
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1dcc75b3a90f..add556a0a2b6 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -557,6 +557,7 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 	enum ovl_path_type type;
 	u64 ino = p->real_ino;
 	int xinobits = ovl_xino_bits(ofs);
+	bool did_alloc = false;
 	int err = 0;
 
 	if (!ovl_same_dev(ofs) && !p->check_xwhiteout)
@@ -574,8 +575,29 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 		}
 	}
 	/* This checks also for xwhiteouts */
-	this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
-	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
+	this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
+	if (this == ERR_PTR(-EWOULDBLOCK)) {
+		/*
+		 * Some other thread is looking up this name and will block
+		 * on i_rwsem before they can complete the lookup.
+		 * We will do the lookup and when that lookup gets a turn it
+		 * will return this dentry.
+		 */
+		this = d_alloc_name(dir, p->name);
+		did_alloc = true;
+	}
+	if (!IS_ERR(this) && d_unhashed(this)) {
+		/* Either we got in-lookup or we made our own unhashed */
+		struct dentry *alias = ovl_lookup(dir->d_inode, this, 0);
+		if (alias) {
+			d_lookup_done(this);
+			dput(this);
+			this = alias;
+		} else if (did_alloc) {
+			atomic_inc(&OVL_I(dir->d_inode)->stray_dentries);
+		}
+	}
+	if (IS_ERR(this) || !this->d_inode) {
 		/* Mark a stale entry */
 		p->is_whiteout = true;
 		if (IS_ERR(this)) {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d4c12feec039..172d3ac7d3e2 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -195,6 +195,7 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
 	oi->__upperdentry = NULL;
 	oi->lowerdata_redirect = NULL;
 	oi->oe = NULL;
+	atomic_set(&oi->stray_dentries, 0);
 	mutex_init(&oi->lock);
 
 	return &oi->vfs_inode;
--
2.50.0.107.gf914562f5916.dirty