[BUG] nfs_writepages may loop forever with -EBADF after state recovery failure

From: Li Lingfeng

Date: Tue Mar 17 2026 - 02:32:06 EST


We have encountered an issue where the NFS client gets stuck in an
infinite loop in nfs_writepages after a server restart and state recovery
failure. This causes mount operations to hang because the superblock lock
is held by the looping writeback process.

Problem Description:
When the NFS server is restarted, the client's state manager attempts to
reclaim open files. If the server returns errors such as EROFS, EIO, or
ESTALE during reclamation, the affected file's state is marked as bad (via
nfs4_state_mark_open_context_bad). Later, when the writeback work
(wb_workfn) tries to flush dirty pages for that inode, nfs_page_create
returns -EBADF for the bad context. nfs_writepages does not treat -EBADF
as a fatal error, so it retries indefinitely.

The call chain is:
nfs4_do_reclaim
 nfs4_reclaim_open_state
  __nfs4_reclaim_open_state // get -ESTALE
   nfs4_open_reclaim // ops->recover_open
    nfs4_do_open_reclaim
     _nfs4_do_open_reclaim
      nfs4_open_recover
       nfs4_open_recover_helper // return -ESTALE
        nfs4_opendata_to_nfs4_state
         _nfs4_opendata_reclaim_to_nfs4_state
          nfs_refresh_inode
  nfs4_state_mark_recovery_failed
   nfs4_state_mark_open_context_bad
    set_bit // NFS_CONTEXT_BAD

wb_workfn
 wb_do_writeback
  wb_writeback
   writeback_sb_inodes
    __writeback_single_inode
     do_writepages
      nfs_writepages // loop here
       write_cache_pages
        nfs_writepages_callback
         nfs_do_writepage
          nfs_page_async_flush
           nfs_pageio_add_request
            nfs_pageio_add_request_mirror
             __nfs_pageio_add_request
              nfs_create_subreq
               nfs_page_create // return -EBADF

nfs_writepages retries the loop as long as the error is not fatal
according to nfs_error_is_fatal(). Since -EBADF is not considered fatal,
it keeps retrying forever. This prevents the superblock lock from being
released, causing any concurrent mount operation to hang.
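
The retry behaviour described above can be modelled in user space. The
following is a minimal Python sketch, not kernel code. FATAL_ERRORS is an
assumed subset of what nfs_error_is_fatal() checks (only the errors named in
this report); OpenContext, page_create and writepages are hypothetical
stand-ins; max_passes exists only so that the model terminates.

```python
import errno

# Assumed stand-in for the kernel's fatal-error list. Only errors named in
# this report are included; the real nfs_error_is_fatal() covers more.
FATAL_ERRORS = {errno.EIO, errno.EROFS, errno.ESTALE}


def error_is_fatal(err):
    """Model of nfs_error_is_fatal(); err is a negative errno value."""
    return -err in FATAL_ERRORS


class OpenContext:
    """Hypothetical stand-in for an NFS open context."""

    def __init__(self):
        self.bad = False  # models the NFS_CONTEXT_BAD bit


def page_create(ctx):
    """Model of nfs_page_create(): fails with -EBADF on a bad context."""
    return -errno.EBADF if ctx.bad else 0


def writepages(ctx, max_passes=1000):
    """Model of the do/while retry loop in nfs_writepages().

    Returns (err, passes). max_passes is not part of the kernel logic;
    it is an artificial cap that keeps this model from spinning forever.
    """
    passes = 0
    while True:
        passes += 1
        err = page_create(ctx)
        # Same shape as: while (err < 0 && !nfs_error_is_fatal(err))
        if not (err < 0 and not error_is_fatal(err)):
            break
        if passes >= max_passes:
            break
    return err, passes
```

With a healthy context the model finishes in one pass. Once ctx.bad is set,
every pass returns -EBADF, which the predicate does not classify as fatal, so
the loop ends only because of the artificial cap.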

Steps to Reproduce:
We have a reliable reproducer on a recent kernel (Linux 7.0-rc4, commit
2d1373e4246da3b58e1df058374ed6b101804e07).

1) Prepare a server with an export:
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server
dd if=/dev/random of=/mnt/sdb/testfile bs=1k count=4 oflag=direct

2) On the client, mount the export and start a writer that holds a file
open and creates dirty pages:
mount -t nfs -o rw,vers=4.1,rsize=1024,wsize=1024 127.0.0.1:/sdb /mnt/sdbb

Run the following Python script in one terminal:
import os, time
fd = os.open("/mnt/sdbb/testfile", os.O_CREAT|os.O_WRONLY|os.O_TRUNC, 0o644)
buf = b'A' * 4096
for i in range(1024):  # 1024 * 4 KiB = 4 MiB of dirty data
    os.write(fd, buf)
print("dirty pages created, fd kept open, sleeping...")
time.sleep(10**9)

3) In another terminal, restart the server and wipe the underlying
filesystem to force ESTALE:
systemctl stop nfs-server
umount /dev/sdb
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server

Temporary Workaround:
We have applied the following patch to break the loop by treating -EBADF
as fatal in nfs_writepages:
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index dc57e67cefcd..0147f7a7a1a3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -781,7 +781,7 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
                                        &pgio);
                pgio.pg_error = 0;
                nfs_pageio_complete(&pgio);
-       } while (err < 0 && !nfs_error_is_fatal(err));
+       } while (err < 0 && !nfs_error_is_fatal(err) && (err != -EBADF));
        nfs_io_completion_put(ioc);

        if (err < 0)

While the patch above avoids the hang, we wonder if a more comprehensive
fix is needed. For instance, perhaps nfs_error_is_fatal() should include
-EBADF in its fatal list, or the state manager should actively abort
pending I/O for contexts marked bad. We are not sure whether -EBADF should
always be considered fatal in writeback paths.
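
To make the difference in scope between the two options concrete, here is a
small user-space Python sketch, not kernel code. BASE_FATAL is an assumed
subset of the real fatal list; make_is_fatal, retry_passes and cap are
hypothetical helpers for illustration. It compares a check that is local to
the loop with a widened shared predicate.

```python
import errno

# Assumed subset of the errors the shared predicate treats as fatal.
BASE_FATAL = frozenset({errno.EIO, errno.EROFS, errno.ESTALE})


def make_is_fatal(fatal_set):
    """Build a predicate modelling nfs_error_is_fatal() for a given set."""
    def is_fatal(err):
        return -err in fatal_set
    return is_fatal


def retry_passes(err, is_fatal, stop_on_ebadf=False, cap=100):
    """Count passes of a loop shaped like
        do { ... } while (err < 0 && !fatal(err) [&& err != -EBADF]);
    for an operation that always returns err. cap keeps the model finite.
    """
    passes = 0
    while True:
        passes += 1
        again = err < 0 and not is_fatal(err)
        if stop_on_ebadf:
            again = again and err != -errno.EBADF
        if not again or passes >= cap:
            return passes


current = make_is_fatal(BASE_FATAL)
widened = make_is_fatal(BASE_FATAL | {errno.EBADF})

EBADF = -errno.EBADF
results = {
    "unpatched": retry_passes(EBADF, current),
    "loop_check": retry_passes(EBADF, current, stop_on_ebadf=True),
    "widened_list": retry_passes(EBADF, widened),
}

# Widening the shared predicate changes its answer for every caller,
# not only for the writeback loop.
changed_for_other_callers = current(EBADF) != widened(EBADF)
```

In this model both options stop after one pass, but only the widened list
changes what the shared predicate reports to its other users, which is why we
are unsure whether that is the right place for the fix.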

We would appreciate your insights and any suggestions for a proper fix.

Thanks,
Lingfeng