knfsd flakiness

Wai-Sun Chia (waisun.chia@msa.dec.com)
Mon, 19 Apr 1999 11:38:16 +0800


Folks,
This morning I came into the office and found our NFS master totally dead.
Dead in terms of NIS, NFS, sendmail, inetd, etc., i.e. almost all
services were down even though the host was still pingable.

When we tried to restart the services manually, we got something like
"named cannot start due to insufficient memory...". Reluctantly I rebooted
the machine, and everything went back to normal.

But even then, "normal" still means I get a lot of scary NFS errors (see
below). I'm running NIS, and user home directories are mounted from this
host onto the various user systems.

I'm running Alpha/Debian 2.1/knfsd 1.2/kernel 2.2.5-ac6.

******************************* knfsd server **********************************
Apr 18 15:20:46 host kernel: EXT2-fs warning (device sd(8,80)): ext2_free_inode: bit already cleared for inode 12352
Apr 18 15:20:46 slinky kernel: lookup_by_inode: ino 12352 not found in mqueue
Apr 18 15:20:46 slinky kernel: find_fh_dentry: 08:50/12352 dir/12289 not found!
Apr 18 15:20:46 slinky kernel: nfs_revalidate_inode: mqueue/dfPAA14186 getattr failed, ino=12352, error=-2
Apr 19 11:17:27 slinky kernel: fh_verify: mqueue/xfLAA30833 permission failure, acc=2, error=13
Apr 19 11:19:49 slinky kernel: __nfs_fhget: inode 12367 busy, i_count=2, i_nlink=1
Apr 19 11:19:49 slinky kernel: nfs_free_dentries: found mqueue/qfLAA30829, d_count=0, hashed=1
Apr 19 11:19:49 slinky kernel: nfs_dentry_delete: mqueue/qfLAA30829: ino=12367, count=2, nlink=1
Apr 19 11:20:09 slinky kernel: EXT2-fs warning (device sd(8,80)): ext2_free_inode: bit already cleared for inode 12352
Apr 19 11:17:27 slinky kernel: fh_verify: mqueue/xfLAA30833 permission failure, acc=2, error=13
Apr 19 11:19:49 slinky kernel: __nfs_fhget: inode 12367 busy, i_count=2, i_nlink=1
Apr 19 11:19:49 slinky kernel: nfs_free_dentries: found mqueue/qfLAA30829, d_count=0, hashed=1
Apr 19 11:19:49 slinky kernel: nfs_dentry_delete: mqueue/qfLAA30829: ino=12367, count=2, nlink=1
Apr 19 11:20:09 slinky kernel: EXT2-fs warning (device sd(8,80)): ext2_free_inode: bit already cleared for inode 12352
Apr 19 11:20:09 slinky kernel: lookup_by_inode: ino 12352 not found in mqueue
Apr 19 11:20:09 slinky kernel: find_fh_dentry: 08:50/12352 dir/12289 not found!
Apr 19 11:25:08 slinky kernel: fh_verify: mqueue/xfLAA30841 permission failure, acc=2, error=13
Apr 19 11:26:23 slinky kernel: fh_verify: mqueue/xfLAA08300 permission failure, acc=2, error=13
Apr 19 11:27:17 slinky kernel: fh_verify: mqueue/xfLAA30845 permission failure, acc=2, error=13

***************************** knfsd client ************************************

Apr 18 07:17:20 client kernel: nfs_revalidate_inode: waisun/.kderc getattr failed, ino=51223, error=-70
Apr 18 07:17:20 client kernel: NFS: bad fh 0000000000000000000000000000000000000000000000000000000000000000
Apr 18 07:17:20 client kernel: feebbaca000000000000c8170000c80100000830000008300000000200000001
Apr 18 07:17:20 client kernel: nfs_revalidate_inode: waisun/.kderc getattr failed, ino=51223, error=-70
Apr 18 07:17:20 client kernel: NFS: bad fh 0000000000000000000000000000000000000000000000000000000000000000
Apr 18 07:17:20 client kernel:
Apr 19 07:45:50 nexus kernel: nfs_revalidate_inode: waisun/.kderc getattr failed, ino=51223, error=-13
Apr 19 07:45:58 nexus last message repeated 2451 times
Apr 19 07:45:58 nexus kernel: nfs_revalidate_inode: /// getattr failed, ino=2, error=-13
Apr 19 07:45:58 nexus kernel: nfs_revalidate_inode: waisun/.kderc getattr failed, ino=51223, error=-13
Apr 19 07:46:28 nexus last message repeated 8700 times
Apr 19 07:47:30 nexus last message repeated 19366 times
Apr 19 07:48:31 nexus last message repeated 21296 times
Apr 19 07:49:32 nexus last message repeated 21424 times
Apr 19 07:50:33 nexus last message repeated 21449 times
Apr 19 07:51:34 nexus last message repeated 21452 times
Apr 19 07:52:35 nexus last message repeated 21396 times
Apr 19 07:53:36 nexus last message repeated 21354 times
Apr 19 07:54:37 nexus last message repeated 21418 times
Apr 19 07:55:38 nexus last message repeated 21193 times
Apr 19 07:56:39 nexus last message repeated 21287 times
Apr 19 07:57:40 nexus last message repeated 21164 times
Apr 19 07:58:40 nexus last message repeated 21008 times
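
For anyone decoding the error= values in the logs above, here is a small
Python sketch (not part of the original logs). The table is hand-written
and hypothetical: 2 and 13 are the usual Linux ENOENT and EACCES values,
while reading 70 as ESTALE assumes the client uses the Alpha errno
numbering, which the logs themselves do not confirm. The kernel prints some
of these codes negated, so the sign is ignored.

```python
# Hypothetical lookup table for the error= values seen in the logs.
# Assumption: Linux errno numbering, with 70 taken from the Alpha table
# (on other architectures 70 can mean something else).
ERRNO_NAMES = {
    2: "ENOENT (no such file or directory)",
    13: "EACCES (permission denied)",
    70: "ESTALE (stale NFS file handle, Alpha numbering assumed)",
}


def describe(code):
    """Return a readable name for an errno value, ignoring its sign."""
    return ERRNO_NAMES.get(abs(code), "unknown errno %d" % abs(code))


for value in (-2, 13, -13, -70):
    print(value, "->", describe(value))
```

If that reading is right, the -70 entries would fit the "NFS: bad fh"
messages logged next to them, but that is an interpretation, not something
the logs state.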

This particular client filled up its /var partition with these logs and
froze.
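
To put a number on the flood, a rough Python sketch like the following can
total the syslog "last message repeated N times" lines. The function name
and the two sample lines fed to it are just for illustration; the regular
expression assumes the exact wording shown in the client log above.

```python
import re

# Matches syslog's repeat-compression lines, e.g.
# "Apr 19 07:45:58 nexus last message repeated 2451 times"
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def total_repeats(lines):
    """Sum the repeat counts over all matching log lines."""
    total = 0
    for line in lines:
        match = REPEAT_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


sample = [
    "Apr 19 07:45:58 nexus last message repeated 2451 times",
    "Apr 19 07:46:28 nexus last message repeated 8700 times",
]
print(total_repeats(sample))  # prints 11151
```

Run over all fourteen "repeated" lines above, the counts add up to 264958
failed getattr messages between 07:45:58 and 07:58:40, i.e. roughly 21000
per minute once the rate levels off.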

Initially I was running the userland nfsd and had problems with locking.
After I consulted the list, some of the gurus recommended moving to knfsd,
as only it provides NFS locking. Now the lock-related problems have
disappeared, only to be replaced by this problem.

Any ideas? This is a fairly busy environment with a population of about
300 users.

-- 
Wai-Sun "squidster" Chia
