Re: [PATCH v2 2/2] nfs: update inode ctime after removexattr operation
From: Jeff Layton
Date: Tue Mar 31 2026 - 07:52:10 EST
On Fri, 2026-03-27 at 11:22 -0700, Thomas Haynes wrote:
> On Fri, Mar 27, 2026 at 12:59:54PM -0800, Jeff Layton wrote:
> > On Fri, 2026-03-27 at 12:20 -0400, Olga Kornievskaia wrote:
> > > On Fri, Mar 27, 2026 at 11:50 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, 2026-03-27 at 11:11 -0400, Olga Kornievskaia wrote:
> > > > > On Tue, Mar 24, 2026 at 1:32 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > xfstest generic/728 fails with delegated timestamps. The client does a
> > > > > > removexattr and then a stat to test the ctime, which doesn't change. The
> > > > > > stat() doesn't trigger a GETATTR because of the delegated timestamps, so
> > > > > > it relies on the cached ctime, which is wrong.
> > > > > >
> > > > > > The setxattr compound has a trailing GETATTR, which ensures that its
> > > > > > ctime gets updated. Follow the same strategy with removexattr.
> > > > >
> > > > > This approach relies on the fact that the server the serves delegated
> > > > > attributes would update change_attr on operations which might now
> > > > > necessarily happen (ie, linux server does not update change_attribute
> > > > > on writes or clone). I propose an alternative fix for the failing
> > > > > generic/728.
> > > > >
> > > > > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> > > > > index 7b3ca68fb4bb..ede1835a45b3 100644
> > > > > --- a/fs/nfs/nfs42proc.c
> > > > > +++ b/fs/nfs/nfs42proc.c
> > > > > @@ -1389,7 +1389,13 @@ static int _nfs42_proc_removexattr(struct inode
> > > > > *inode, const char *name)
> > > > > &res.seq_res, 1);
> > > > > trace_nfs4_removexattr(inode, name, ret);
> > > > > if (!ret)
> > > > > - nfs4_update_changeattr(inode, &res.cinfo, timestamp, 0);
> > > > > + if (nfs_have_delegated_attributes(inode)) {
> > > > > + nfs_update_delegated_mtime(inode);
> > > > > + spin_lock(&inode->i_lock);
> > > > > + nfs_set_cache_invalid(inode, NFS_INO_INVALID_BLOCKS);
> > > > > + spin_unlock(&inode->i_lock);
> > > > > + } else
> > > > > + nfs4_update_changeattr(inode, &res.cinfo, timestamp, 0);
> > > > >
> > > > > return ret;
> > > > > }
> > > > >
> > > >
> > > > What's the advantage of doing it this way?
> > > >
> > > > You just sent a REMOVEXATTR operation to the server that will change
> > > > the mtime there. The server has the most up-to-date version of the
> > > > mtime and ctime at that point.
> > >
> > > In presence of delegated attributes, Is the server required to update
> > > its mtime/ctime on an operation? As I mentioned, the linux server does
> > > not update its ctime/mtime for WRITE, CLONE, COPY.
> > >
> > > Is possible that
> > > some implementations might be different and also do not update the
> > > ctime/mtime on REMOVEXATTR?
> > >
> > > Therefore I was suggesting that the patch
> > > relies on the fact that it would receive an updated value. Of course
> > > perhaps all implementations are done the same as the linux server and
> > > my point is moot. I didn't see anything in the spec that clarifies
> > > what the server supposed to do (and client rely on).
> > >
> >
> > (cc'ing Tom)
> >
> > That is a very good point.
> >
> > My interpretation was that delegated timestamps generally covered
> > writes, but SETATTR style operations that do anything beyond only
> > changing the mtime can't be cached.
> >
> > We probably need some delstid spec clarification: for what operations
> > is the server required to disable timestamp updates when a write
> > delegation is outstanding?
> >
> > In the case of nfsd, we disable timestamp updates for WRITE/COPY/CLONE
> > but not SETATTR/SETXATTR/REMOVEXATTR.
> >
> > How does the Hammerspace anvil behave? Does it disable c/mtime updates
> > for writes when there is an outstanding timestamp delegation like we're
> > doing in nfsd? If so, does it do the same for
> > SETATTR/SETXATTR/REMOVEXATTR operations as well?
>
> Jeff,
>
> I think the right way to look at this is closer to how size is
> handled under delegation in RFC8881, rather than as a per-op rule.
>
> In our implementation, because we are acting as an MDS and data I/O
> goes to DSes, we already treat size as effectively delegated when
> a write layout is outstanding. The MDS does not maintain authoritative
> size locally in that case. We may refresh size/timestamps internally
> (e.g., on GETATTR by querying DSes), but we don’t treat that as
> overriding the delegated authority.
>
> For timestamps, our behavior is effectively the same model. When
> the client holds the relevant delegation, the server does not
> consider itself authoritative for ctime/mtime. If current values
> are needed, we can obtain them from the client (e.g., via CB_GETATTR),
> and the client must present the delegation stateid to demonstrate
> that authority. So the authority follows the delegation, not the
> specific operation.
>
> That said, I don’t think we’ve fully resolved the semantics for all
> metadata-style ops either. WRITE and SETATTR are clear in our model,
> but for things like CLONE/COPY/SETXATTR/REMOVEXATTR, we’ve likely
> been relying on assumptions rather than a fully consistent rule.
> I.e., CLONE and COPY we just pass through to the DS and we don't
> implement SETXATTR/REMOVEXATTR.
>
> So the spec question, as I see it, is not whether REMOVEXATTR (or
> any particular op) should update ctime/mtime, but whether delegated
> timestamps are meant to follow the same attribute-authority model
> as delegated size in RFC8881. If so, then we expect that the server
> should query the client via CB_GETATTR to return updated ctime/mtime
> after such operations while the delegation is outstanding.
>
The dilemma we have is: because we _do_ allow local processes to stat()
files that have an outstanding write delegation, we can never allow the
ctime in particular to roll backward (modulo clock jumps).
If we're dealing with changes that have been cached in the client and
are being lazily flushed out, then we can't update the timestamp when
that operation occurs. The time of the RPC to flush the changes will
almost certainly be later than the cached timestamps on the client that
will eventually be set, so when the client comes back we'd end up
violating the rollback rule.
Our only option is to freeze timestamp updates on anything that might
represent such an operation. So far, we only do that on WRITE and COPY
operations -- in general, operations that require an open file, since
FMODE_NOCMTIME is attached to the file.
Some SETATTRs that only update the mtime and atime can be cached on the
client by virtue of the fact that it's authoritative for timestamps.
There are some exceptions though:
- atime-only updates can't be cached since the ctime won't change with
a timestamp update if the mtime didn't change
- if you set the mtime to a time that is later than the time you got
the delegation from the server, but earlier than the current time, you
can't cache that. The ctime would be later than the mtime in that case,
and we don't have a mechanism to handle that in a delegated timestamp
SETATTR.
I don't see how you could reasonably buffer a SETXATTR or REMOVEXATTR
operation to be sent later. These need to be done synchronously since
they could always fail for some reason and we don't have a mechanism at
the syscall layer to handle a deferred error. They also only update the
ctime and not the mtime, and we have no mechanism to do that with
delegated timestamps.
Based on that, I think the client and server both need to ignore the
timestamp delegation on a SETXATTR or REMOVEXATTR. The server should
update the ctime and the client needs to send a trailing GETATTR on the
REMOVEXATTR compound in order to get it and the change attr.
Exactly what this patch does, fwiw...
--
Jeff Layton <jlayton@xxxxxxxxxx>