[BUG] nfs_writepages may loop forever with -EBADF after state recovery failure

From: Li Lingfeng

Date: Tue Mar 17 2026 - 02:32:06 EST

We have encountered an issue where the NFS client gets stuck in an
infinite loop in nfs_writepages after a server restart and state recovery
failure. This causes mount operations to hang because the superblock lock
is held by the looping writeback process.

Problem Description:
When the NFS server is restarted, the client's state manager attempts to
reclaim open files. If the server returns errors such as EROFS, EIO, or
ESTALE during reclamation, the affected file's state is marked as bad (via
nfs4_state_mark_open_context_bad). Subsequently, when the writeback work
(wb_workfn) tries to flush dirty pages for that inode, nfs_page_create
returns -EBADF. nfs_writepages does not treat -EBADF as a fatal error, so
it retries indefinitely.

The call chain is:
nfs4_do_reclaim
 nfs4_reclaim_open_state
  __nfs4_reclaim_open_state // get -ESTALE
   nfs4_open_reclaim // ops->recover_open
    nfs4_do_open_reclaim
     _nfs4_do_open_reclaim
      nfs4_open_recover
       nfs4_open_recover_helper // return -ESTALE
        nfs4_opendata_to_nfs4_state
         _nfs4_opendata_reclaim_to_nfs4_state
          nfs_refresh_inode
  nfs4_state_mark_recovery_failed
   nfs4_state_mark_open_context_bad
    set_bit // NFS_CONTEXT_BAD

wb_workfn
 wb_do_writeback
  wb_writeback
   writeback_sb_inodes
    __writeback_single_inode
     do_writepages
      nfs_writepages // loop here
       write_cache_pages
        nfs_writepages_callback
         nfs_do_writepage
          nfs_page_async_flush
           nfs_pageio_add_request
            nfs_pageio_add_request_mirror
             __nfs_pageio_add_request
              nfs_create_subreq
               nfs_page_create // return -EBADF
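
The two chains above meet at the NFS_CONTEXT_BAD flag: recovery sets it, and
page creation later refuses the context. A minimal userspace model of that
interaction, assuming (as the report implies) that nfs_page_create rejects
any open context carrying the flag. The class, the bit value, and the
function names here are hypothetical stand-ins, not kernel code:

```python
import errno

# Hypothetical bit value, used only inside this model.
NFS_CONTEXT_BAD = 1 << 0


class OpenContext:
    """Stand-in for an NFS open context with a flags word."""

    def __init__(self):
        self.flags = 0


def mark_open_context_bad(ctx):
    # Models the set_bit() at the end of the recovery chain.
    ctx.flags |= NFS_CONTEXT_BAD


def page_create(ctx):
    # Models the assumed check in nfs_page_create: a bad context
    # yields -EBADF instead of a new request.
    if ctx.flags & NFS_CONTEXT_BAD:
        return -errno.EBADF
    return 0


ctx = OpenContext()
before = page_create(ctx)      # context still healthy
mark_open_context_bad(ctx)     # state recovery failed
after = page_create(ctx)       # every later attempt is rejected
print(before, after)
```

Once the flag is set nothing in this model clears it, so every later
page_create call on the same context fails the same way.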

nfs_writepages retries the loop as long as the error is not fatal
according to nfs_error_is_fatal(). Since -EBADF is not considered fatal,
it keeps retrying forever. This prevents the superblock lock from being
released, causing any concurrent mount operation to hang.
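
The retry condition described above can be sketched in userspace. This is a
model under stated assumptions, not the kernel code: ASSUMED_FATAL is an
illustrative subset rather than the real nfs_error_is_fatal() list, and
max_passes exists only so the model terminates (the loop in the report has
no such cap):

```python
import errno

# Illustrative subset only; check nfs_error_is_fatal() in your tree
# for the real list.
ASSUMED_FATAL = frozenset({errno.EIO, errno.ESTALE, errno.EROFS})


def error_is_fatal(err):
    """Model of the fatal-error predicate; err is a negative errno."""
    return -err in ASSUMED_FATAL


def model_writepages(flush_once, max_passes=5):
    """Run the do/while retry shape; return (last_err, passes)."""
    passes = 0
    while True:
        passes += 1
        err = flush_once()
        retry = err < 0 and not error_is_fatal(err)
        # max_passes is a model-only guard against spinning forever.
        if not retry or passes >= max_passes:
            return err, passes


def bad_context_flush():
    # Every pass fails the same way once the context is marked bad.
    return -errno.EBADF


def stale_flush():
    return -errno.ESTALE


print(model_writepages(bad_context_flush))  # stops only at the cap
print(model_writepages(stale_flush))        # stops after one pass
```

With a persistent -EBADF the model exits only because of the artificial cap,
while an error in the fatal set ends the loop on the first pass.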

Steps to Reproduce:
We have a reliable reproducer on a recent kernel (Linux 7.0-rc4, commit
2d1373e4246da3b58e1df058374ed6b101804e07).

1) Prepare a server with an export:
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server
dd if=/dev/random of=/mnt/sdb/testfile bs=1k count=4 oflag=direct

2) On the client, mount the export and start a writer that holds a file
open and creates dirty pages:
mount -t nfs -o rw,vers=4.1,rsize=1024,wsize=1024 127.0.0.1:/sdb /mnt/sdbb

Run the following Python script in one terminal:
import os, time
fd = os.open("/mnt/sdbb/testfile", os.O_CREAT|os.O_WRONLY|os.O_TRUNC, 0o644)
buf = b'A' * 4096
for i in range(1024):  # 1024 x 4 KiB = 4 MiB of dirty data
    os.write(fd, buf)
print("dirty pages created, fd kept open, sleeping...")
time.sleep(10**9)

3) In another terminal, restart the server and wipe the underlying
filesystem to force ESTALE:
systemctl stop nfs-server
umount /dev/sdb
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server
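
To check whether the hang has been reached after step 3, one option is to
look for a writeback worker that stays runnable and a mount process that
stays in uninterruptible sleep. The helper below is a sketch that reads
/proc/<pid>/stat. The "kworker" and "mount" name filters are assumptions
about how the tasks appear on a given system, and a task showing up once is
only a hint, so run it several times:

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm may itself contain spaces or parentheses, so locate the
    first '(' and the last ')' instead of splitting on whitespace.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_states(states=("D", "R"), proc="/proc"):
    """Return (pid, comm, state) for tasks currently in `states`."""
    found = []
    if not os.path.isdir(proc):
        return found
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                task = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or entry unreadable
        if task[2] in states:
            found.append(task)
    return found


if __name__ == "__main__":
    for pid, comm, state in tasks_in_states():
        # Assumed name prefixes; adjust for your system.
        if comm.startswith(("kworker", "mount")):
            print(pid, comm, state)
```

For a task that keeps appearing, /proc/<pid>/stack (readable as root on
kernels that provide it) can show whether it is inside nfs_writepages.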

Temporary Workaround:
We have applied the following patch to break the loop by treating -EBADF
as fatal in nfs_writepages:
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index dc57e67cefcd..0147f7a7a1a3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -781,7 +781,7 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 					&pgio);
 		pgio.pg_error = 0;
 		nfs_pageio_complete(&pgio);
-	} while (err < 0 && !nfs_error_is_fatal(err));
+	} while (err < 0 && !nfs_error_is_fatal(err) && (err != -EBADF));
 	nfs_io_completion_put(ioc);

 	if (err < 0)

While the patch above avoids the hang, we wonder if a more comprehensive
fix is needed. For instance, perhaps nfs_error_is_fatal() should include
-EBADF in its fatal list, or the state manager should actively abort
pending I/O for contexts marked bad. We are not sure whether -EBADF should
always be considered fatal in writeback paths.
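
To make the two placements concrete, here is a small model comparing a
loop-local check (as in the patch) with adding -EBADF to the shared fatal
set. ASSUMED_FATAL is an illustrative subset, not the real
nfs_error_is_fatal() list, and all three predicates are hypothetical
userspace stand-ins:

```python
import errno

# Illustrative subset only, not the kernel's actual list.
ASSUMED_FATAL = frozenset({errno.EIO, errno.ESTALE, errno.EROFS})


def retry_current(err):
    """Current loop condition: retry any non-fatal error."""
    return err < 0 and -err not in ASSUMED_FATAL


def retry_loop_local(err):
    """Patch above: extra -EBADF test in the loop condition only."""
    return retry_current(err) and err != -errno.EBADF


def retry_global(err, fatal=ASSUMED_FATAL | {errno.EBADF}):
    """Alternative: -EBADF added to the shared fatal set."""
    return err < 0 and -err not in fatal


cases = (
    ("EBADF", -errno.EBADF),
    ("EAGAIN", -errno.EAGAIN),
    ("EIO", -errno.EIO),
    ("ok", 0),
)
for name, err in cases:
    print(name, retry_current(err), retry_loop_local(err), retry_global(err))
```

For this loop the two variants decide identically. They differ only in
scope: the shared-set variant would also change the answer for every other
user of the predicate, which is the part we are unsure about.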

We would appreciate your insights and any suggestions for a proper fix.

Thanks,
Lingfeng