Re: [PATCH v4 0/7] VFS: Prepare to lift lookup out of exclusive lock for directory ops

From: NeilBrown

Date: Thu Apr 30 2026 - 05:10:26 EST


On Thu, 30 Apr 2026, Jeff Layton wrote:
> On Thu, 2026-04-30 at 12:03 +1000, NeilBrown wrote:
> > Following are 7 VFS patches which modify or introduce APIs that will
> > allow modifying filesystems so that they will work with a proposed
> > change to move d_alloc_parallel() out from the parent i_rwsem lock.
> >
> > If these can land in a non-rebasing tree, I can work with individual
> > filesystem maintainers to start using these APIs.
> >
> > I haven't included d_alloc_noblock_return() as it is only needed for one
> > fs (ovl) and it is not yet clear that it is the best approach.
> >
> > I also haven't included the change to d_alloc_name() as that is only
> > needed so that I can deprecate d_alloc() and there is no rush for that.
> >
> > Patch 2/7 is exactly the patch Al proposed in the conversation for v3.
> > I have taken the liberty of adding a Signed-off-by from Al to match the
> > Co-developed-by. I hope that was not inappropriate.
> >
> > I have been testing this series over NFS mounts from XFS so patches 2
> > and 3 don't seem to be causing any problems. The changes in 4/5/6/7
> > won't be tested by this, and some cannot be tested until filesystems
> > start using new interfaces.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > [PATCH v4 1/7] VFS: fix various typos in documentation for
> > [PATCH v4 2/7] VFS: use wait_var_event for waiting in
> > [PATCH v4 3/7] VFS: enhance d_splice_alias() to handle in-lookup
> > [PATCH v4 4/7] VFS: introduce d_alloc_noblock()
> > [PATCH v4 5/7] VFS: add d_duplicate()
> > [PATCH v4 6/7] VFS: Add LOOKUP_SHARED flag.
> > [PATCH v4 7/7] VFS/xfs/ntfs: drop parent lock across
>
> I pointed Claude at the version of this in your tree and it spotted a
> regression that I think looks legitimate:
>
> 2. Lock imbalance on early return: The parent lock is dropped unconditionally before
> d_alloc_parallel()/d_alloc(), but three early return paths exit without reacquiring it:
> - IS_ERR(found) from d_alloc_parallel()
> - !d_in_lookup(found) from d_alloc_parallel()
> - !found from d_alloc()
>
> The callers (lookup_slow(), lookup_slow_killable()) unconditionally call inode_unlock_shared()
> after ->lookup() returns. If d_add_ci() returns without the lock held, the caller unlocks an
> unheld rwsem — corrupting its state.

Thanks for that - yes that was careless.

The unlock/relock is only needed around d_alloc_parallel() so I've put
it there, which makes the problem go away.
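
As a rough illustration only (not kernel code), the invariant can be
modelled in userspace: if the drop and the re-take sit immediately
around the allocation call, every early return leaves the lock held
and the caller's unconditional unlock stays balanced. Everything named
model_* or MODEL_* below is made up for this sketch, including the flag
value; the real LOOKUP_SHARED is whatever the series defines, and the
toy rwsem only counts holders.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; the real LOOKUP_SHARED comes from the series. */
#define MODEL_LOOKUP_SHARED 0x1u

/* Toy stand-in for the parent's rwsem: it only counts holders. */
struct model_rwsem {
	int readers;	/* shared holders */
	int writers;	/* exclusive holders */
	bool corrupt;	/* set when an unheld lock is unlocked */
};

static void model_lock(struct model_rwsem *sem, unsigned int flags)
{
	if (flags & MODEL_LOOKUP_SHARED)
		sem->readers++;
	else
		sem->writers++;
}

static void model_unlock(struct model_rwsem *sem, unsigned int flags)
{
	int *count = (flags & MODEL_LOOKUP_SHARED) ? &sem->readers
						   : &sem->writers;

	if (*count == 0)
		sem->corrupt = true;	/* unlock of a lock we do not hold */
	else
		(*count)--;
}

/* Possible results of the simulated allocation step. */
enum model_alloc { MODEL_ALLOC_ERROR, MODEL_ALLOC_RACED, MODEL_ALLOC_OK };

/*
 * Models the flow with the unlock/relock placed directly around the
 * allocation. Returns 0 on success and -1 on an early return.
 */
static int model_add_ci(struct model_rwsem *parent, unsigned int flags,
			enum model_alloc outcome)
{
	model_unlock(parent, flags);
	/* the simulated allocation would run here, without the lock */
	model_lock(parent, flags);

	if (outcome != MODEL_ALLOC_OK)
		return -1;	/* early return, but the lock is held again */
	return 0;
}

/*
 * Models a caller that locks, calls the lookup step and then unlocks
 * unconditionally. Returns true if the lock ends up balanced.
 */
static bool model_lookup_slow(unsigned int flags, enum model_alloc outcome)
{
	struct model_rwsem parent = { 0, 0, false };

	model_lock(&parent, flags);
	(void)model_add_ci(&parent, flags, outcome);
	model_unlock(&parent, flags);

	return !parent.corrupt && parent.readers == 0 && parent.writers == 0;
}
```

In this model, moving the model_lock() call in model_add_ci() below the
early return reproduces the reported imbalance: the caller's unlock then
hits a lock that is not held and the corrupt flag is set.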

I've updated the github repo.

New patch below.

Thanks,
NeilBrown

From: NeilBrown <neil@xxxxxxxxxx>
Subject: [PATCH] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in
d_add_ci()

A proposed change will invert the lock ordering between
d_alloc_parallel() and inode_lock() on the parent.
When that happens it will not be safe to call d_alloc_parallel() while
holding the parent lock - even shared.

We don't need to keep the parent lock held when d_add_ci() is run - the
VFS doesn't need it as the dentry is exclusively held due to
DCACHE_PAR_LOOKUP and the filesystem has finished its work.

So drop and reclaim the lock (shared or exclusive as determined by
LOOKUP_SHARED) to avoid future deadlock.

Signed-off-by: NeilBrown <neil@xxxxxxxxxx>
---
Documentation/filesystems/porting.rst | 7 +++++++
fs/dcache.c | 21 +++++++++++++++++++--
fs/ntfs/namei.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
include/linux/dcache.h | 3 ++-
5 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 5cc6ae19845c..146720fc9f6f 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1391,3 +1391,10 @@ either form of manual loop.
**mandatory**

d_alloc_parallel() no longer requires a waitqueue_head.
+
+---
+
+**mandatory**
+
+d_add_ci() must now be passed the flags argument that was given to ->lookup.
+
diff --git a/fs/dcache.c b/fs/dcache.c
index 1943607f7547..665ce74eaadc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2275,6 +2275,7 @@ EXPORT_SYMBOL(d_obtain_root);
* @dentry: the negative dentry that was passed to the parent's lookup func
* @inode: the inode case-insensitive lookup has found
* @name: the case-exact name to be associated with the returned dentry
+ * @lookup_flags: flags passed to ->lookup
*
* This is to avoid filling the dcache with case-insensitive names to the
* same inode, only the actual correct case is stored in the dcache for
@@ -2287,7 +2288,7 @@ EXPORT_SYMBOL(d_obtain_root);
* the exact case, and return the spliced entry.
*/
struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
- struct qstr *name)
+ struct qstr *name, unsigned int lookup_flags)
{
struct dentry *found, *res;

@@ -2301,7 +2302,23 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
return found;
}
if (d_in_lookup(dentry)) {
+ /*
+ * We are holding parent lock and so don't want to wait
+ * for a d_in_lookup() dentry. We can safely drop the
+ * parent lock and reclaim it as we have exclusive
+ * access to dentry as it is d_in_lookup() (so
+ * ->d_parent is stable) and we are near the end of
+ * ->lookup() and will shortly drop the lock anyway.
+ */
+ if (lookup_flags & LOOKUP_SHARED)
+ inode_unlock_shared(d_inode(dentry->d_parent));
+ else
+ inode_unlock(d_inode(dentry->d_parent));
found = d_alloc_parallel(dentry->d_parent, name);
+ if (lookup_flags & LOOKUP_SHARED)
+ inode_lock_shared(d_inode(dentry->d_parent));
+ else
+ inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
if (IS_ERR(found) || !d_in_lookup(found)) {
iput(inode);
return found;
@@ -2311,7 +2328,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
if (!found) {
iput(inode);
return ERR_PTR(-ENOMEM);
- }
+ }
}
res = d_splice_alias(inode, found);
if (res) {
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index 10894de519c3..e2f3430c2e6d 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -310,7 +310,7 @@ static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
}
nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);

- dent = d_add_ci(dent, dent_inode, &nls_name);
+ dent = d_add_ci(dent, dent_inode, &nls_name, flags);
kfree(nls_name.name);
return dent;

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 325c2200c501..db0beb3831a9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -369,7 +369,7 @@ xfs_vn_ci_lookup(
/* else case-insensitive match... */
dname.name = ci_name.name;
dname.len = ci_name.len;
- dentry = d_add_ci(dentry, VFS_I(ip), &dname);
+ dentry = d_add_ci(dentry, VFS_I(ip), &dname, flags);
kfree(ci_name.name);
return dentry;
}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b4663a1a0636..9553bffbb098 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -263,7 +263,8 @@ struct dentry *d_duplicate(struct dentry *dentry);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
+ unsigned int);
extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
const struct qstr *name);
extern struct dentry *d_find_any_alias(struct inode *inode);
--
2.50.0.107.gf914562f5916.dirty