Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
From: 郭玲兴
Date: Sun May 17 2026 - 21:08:50 EST
Hi Rick, hi Lionel,
Below are the environment details.
Server:
Windows Server 2022
Version 10.0.20348.587
User/account setup:
No user mapping is configured.
No AD, LDAP, or passwd-based mapping is used.
Unmapped users are handled by the default "Everyone" account.
Authentication:
sec=sys (AUTH_SYS), as reported by nfsstat -m
Architecture:
Linux clients: x86_64
Windows server: x86_64
Memory:
Each Linux client VM has 16 GB RAM
We also observed the following on two independent clients:
Client A:
age: 498061
lease_time: 120
lease_expired: 497941
Client B:
age: 69598
lease_time: 120
lease_expired: 69478
In both cases, lease_expired is exactly equal to
age - lease_time (498061 - 120 = 497941 and 69598 - 120 = 69478),
which suggests that the lease expired one lease period after
mount and was not renewed afterward.
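For anyone who wants to recheck the arithmetic, here is a minimal Python sketch. The values are copied from the two clients above; the dictionary layout and the helper name expiry_gap are illustrative only, and it assumes all three fields are in seconds.

```python
# Values copied from the report above; the dict layout and helper name
# are illustrative and not the output of any tool.
clients = {
    "A": {"age": 498061, "lease_time": 120, "lease_expired": 497941},
    "B": {"age": 69598, "lease_time": 120, "lease_expired": 69478},
}


def expiry_gap(c):
    """Return (age - lease_time) - lease_expired, assumed to be seconds.

    A result of 0 means lease_expired is exactly age minus one lease period.
    """
    return (c["age"] - c["lease_time"]) - c["lease_expired"]


gaps = {name: expiry_gap(c) for name, c in clients.items()}
for name, gap in gaps.items():
    print(f"Client {name}: gap = {gap}")
```

A gap of zero on both clients is what the reasoning above relies on. A non-zero gap would instead point to at least one successful renewal after the initial lease period.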
At hang time:
- both clients hang under concurrent workload
- both clients are blocked in nfs4_drain_slot_tbl
- no NFS RPC traffic is observed, only TCP ACKs
- nfsstat reports retrans=0
- on the Windows server side, the session state is reported
as "Initialized"
We are tracing the RPC lifecycle to identify which RPC does
not complete.
Regarding the "soft" mount option: understood. We will retest
with a hard mount as well.
One question is whether the observed behavior is expected.
Even if a soft mount contributes to the problem, is it expected
that a single RPC timeout can leave the client in a state with
no forward progress, blocked in nfs4_drain_slot_tbl, and with
lease renewal no longer occurring? Or would that more likely
indicate a client-side recovery bug?
Thanks,
Guo Lingxing
> -----Original Message-----
> From: "Rick Macklem" <rick.macklem@xxxxxxxxx>
> Sent: 2026-05-16 22:23:51 (Saturday)
> To: "Lionel Cons" <lionelcons1972@xxxxxxxxx>
> Cc: 郭玲兴 <guolingxing@xxxxxxxxxx>, linux-nfs@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
>
> On Wed, May 6, 2026 at 6:32 AM Lionel Cons <lionelcons1972@xxxxxxxxx> wrote:
> >
> > On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > >
> > > We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
> > >
> > >
> > > Environment:
> > > - Two independent Linux clients (VMs)
> > > - Both mount the same Windows NFS server (NFSv4.1)
> > > - Kernel version: 6.1.78
> > > - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
> Just fyi, "soft" mounts are often going to be troublesome for NFSv4.1.
> (Whenever an RPC times out and doesn't wait for a reply from the server,
> it will leave a session slot messed up.)
>
> rick
>
> >
> > Which version of Windows Server do you use, e.g. what does the "ver"
> > command in cmd.exe output? How did you set up the user accounts, and
> > which authentication (AUTH_SYS, GSS, ...) do you use?
> > Which CPU architecture do you use? How much memory do you have on the
> > Linux NFS client?
> >
> > Lionel
> >