NFS corruption revisited

Jimmie Mayfield (mayfield@sackheads.org)
Wed, 15 Sep 1999 08:13:46 -0700


Hi all. Not to beat a dead horse, but I'm also running into problems with
NFS corruption while compiling programs in an NFS-mounted directory.
The corruption does not occur when building locally.

Setup:
NFS server: AIX 4.3.2 box (PPC 604e) with the most recent
NFS server patches.

NFS client: Debian Potato, 2.2.12 kernel, running on a P-II 400.

Mount params: sync, rsize=8192, wsize=8192, rw, nosuid

The following is an example of an executable built in an NFS directory
and the same executable built locally (this is just a portion of the
mismatches):

OFFSET NFS LOCAL
-------------------------
0x1ffd: 0xff 0xff
0x1ffe: 0xff 0xff
0x1fff: 0x89 0x89
0x2000: 0x00 0xd0 <---- corruption begins here
0x2001: 0x00 0x50
0x2002: 0xd0 0xe8
0x2003: 0x50 0x5d
0x2004: 0xe8 0xe6
0x2005: 0x5d 0xff
0x2006: 0xe6 0xff
0x2007: 0xff 0x83

The following is a ethereal dump of the NFS traffic for the particular
bytes above:

packet #303:
04b0 89 c0 50 8d 93 f0 fe ff ff 89 d0 50 8b 93 4c 00 ..P........P..L.
04c0 00 00 89 d0 50 e8 ae e6 ff ff 83 c4 0c 8d 93 07 ....P...........
04d0 ff ff ff 89 00 00 ......
^^^^^
packet #305:
00a0 d0 50 e8 5d e6 ff ff 83 c4 04 89 c0 50 8d ...P.]........P.
00b0 93 07 ff ff ff 89 d0 50 8b 93 4c 00 00 00 8d 42 .......P..L....B

Notice the NULL pair in packet #303 at offset 0x04d4

Observations:
1) the kernel seems to be inserting NULL bytes. The number of bytes
inserted is usually 1 or 2.

2) it occurs at a page boundary. in other cases, I think I've seen the
corruption occur a few bytes before the boundary.

3) everything seems to work okay for a short time after initially mounting
the filesystem. I don't have exact number but it seemed
on the order of 30 minutes. So perhaps the problem is somehow triggered
when the amount of data that has been transferred via NFS collectively
reaches a threshhold?

4) Changing NFS read/write block sizes didn't solve the problem

5) Though this example shows a corrupted executable, I've seen corruption
at the individual object file level too.

If additional data is required, please let me know.

Jimmie

-- 
Jimmie Mayfield  
http://www.sackheads.org/mayfield       email: mayfield+usenet@sackheads.org
My mail provider does not welcome UCE -- http://www.sackheads.org/uce

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/