Fwd: kernel 6.4/6.5 nfs 4.1 unresponsive

From: Bagas Sanjaya
Date: Wed Aug 23 2023 - 07:14:15 EST


Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> I have two Synology Disk station NAS devices with NFS mounts present on Gentoo servers with the following fstab mount configuration:
>
> 10.200.1.247:/volume1/filer02-sata /mnt/filer02-sata nfs vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,retry=6,retrans=6,nconnect=4 0 0
> 10.200.1.247:/volume1/filer03-sata /mnt/filer03-sata nfs vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,retry=6,retrans=6,nconnect=4 0 0
> 10.200.1.246:/volume1/filer04-sata /mnt/filer04-sata nfs vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,retry=6,retrans=6,nconnect=4 0 0
>
>
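
(Inline note: to make the timing-related mount options above easier to read, here is a minimal sketch that splits the option string and converts timeo to seconds. It assumes, as I read nfs(5), that timeo is given in tenths of a second. The helper name parse_nfs_options is hypothetical and is not part of any NFS tooling.)

```python
def parse_nfs_options(opts: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "hard") map to True.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Option string copied from the fstab entries quoted above.
OPTS = ("vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,"
        "hard,timeo=60,retry=6,retrans=6,nconnect=4")

options = parse_nfs_options(OPTS)

# Assumption (from nfs(5)): timeo is expressed in deciseconds.
timeo_seconds = int(options["timeo"]) / 10

print(f"timeo    = {timeo_seconds:.0f} s before a request is retried")
print(f"retrans  = {options['retrans']} retries before further recovery action")
print(f"nconnect = {options['nconnect']} TCP connections per server")
```

If that reading is right, timeo=60 is a much shorter timeout than the TCP default of 600, which may be worth keeping in mind when comparing against the intervals in the log below.
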
> On Linux Kernel 6.3.6 these work perfectly fine.
>
> As soon as I upgrade to 6.4 (tested 6.4.7 through 6.4.11) or 6.5-rc7, NFS mounts randomly hang and block system operation, with high load eventually resulting in a system freeze.
>
> dmesg/syslog:
>
> Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
> Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
>
>
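
(Inline note: the log above alternates between "not responding" and "OK" at fairly regular intervals. A rough sketch for measuring each stall from such lines follows. The function name stall_durations is hypothetical; the sample lines are one representative line per burst copied from the log above, and the year 2023 is an assumption taken from the date of this mail, since syslog timestamps carry no year.)

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: nfs: server "
    r"(?P<server>\S+) (?P<state>OK|not responding, still trying)$"
)


def stall_durations(lines, year=2023):
    """Return (server, seconds) for each 'not responding' -> 'OK' transition."""
    stalled_since = {}  # server -> time of the first "not responding" line
    stalls = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        server = m["server"]
        if m["state"] == "OK":
            start = stalled_since.pop(server, None)
            if start is not None:
                stalls.append((server, int((ts - start).total_seconds())))
        else:
            # Keep only the first timestamp of a burst of repeated messages.
            stalled_since.setdefault(server, ts)
    return stalls


# One line per burst, copied from the quoted dmesg/syslog output.
SAMPLE = [
    "Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK",
    "Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying",
    "Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK",
    "Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying",
    "Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK",
]

print(stall_durations(SAMPLE))
# [('10.200.1.247', 48), ('10.200.1.247', 49)]
```

On the excerpt above this gives stalls of 48 and 49 seconds; running it over the full log would show whether that period holds.
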
> The box I have been testing the kernel upgrades on has one 10G NIC set to MTU 9000 for NFS volumes, and I can successfully ping the NFS host with 9000-byte packets:
>
> sjc-www2 ~ # ping -4 -s 9000 10.200.1.247
> PING 10.200.1.247 (10.200.1.247) 9000(9028) bytes of data.
> 9008 bytes from 10.200.1.247: icmp_seq=1 ttl=64 time=0.205 ms
> 9008 bytes from 10.200.1.247: icmp_seq=2 ttl=64 time=0.279 ms
> 9008 bytes from 10.200.1.247: icmp_seq=3 ttl=64 time=0.402 ms
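
(Inline note: the "9000(9028)" in the ping output above means a 9000-byte payload becomes a 9028-byte IPv4 packet, which is larger than a 9000-byte MTU. Unless fragmentation is prohibited, such a ping can succeed through fragmentation, so it may not by itself confirm that jumbo frames pass end to end. The arithmetic, as a small sketch assuming an IPv4 header without options:)

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packet_size(payload: int) -> int:
    """Total IPv4 packet size for a given ICMP echo payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


print(packet_size(9000))       # 9028, matching "9000(9028)" in the output above
print(max_icmp_payload(9000))  # 8972
```

With iputils ping, something like "ping -4 -M do -s 8972 10.200.1.247" should then test the 9000-byte path without allowing fragmentation.
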

See Bugzilla for the full thread.

Anyway, I'm adding this regression to be tracked by regzbot:

#regzbot introduced: v6.3..v6.4 https://bugzilla.kernel.org/show_bug.cgi?id=217815
#regzbot title: nfs server not responding loop on Synology NAS devices

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217815

--
An old man doll... just what I always wanted! - Clara