nfs data corruption and other problems in 2.2.12 (as client)

Mark Buda (hermit@clark.net)
10 Sep 1999 13:39:45 -0400


Under 2.2.12, this perl script causes strange NFS problems when its
output is redirected to a file on an NFS-mounted filesystem.

#!/usr/bin/perl
$| = 1;
print "x" x 58;
if (fork) {
wait;
} else {
print "x" x 4096;
print "x" x 4039;
exit 0;
}

With a Solaris 2.6 server, there is silent data corruption, generally
taking the form of one or two NULs and one or two missing characters -
as though the offset in the NFS write request was wrong sometimes. It
seems that when the file system is first mounted, it will work okay a
couple times, and then start corrupting the files. Unmounting and
remounting the file system returns it to normal momentarily.

If you take the fork out, it appears to stop malfunctioning.

Through the magic of

echo 8 > /proc/sys/sunrpc/nfs_debug

I have obtained the following for a correct run:
Sep 10 12:48:48 pazuzu kernel: NFS: nfs_updatepage(faked/6 58@0, sync=0)
Sep 10 12:48:48 pazuzu kernel: NFS: find_write_request(5/76082, c02e4f18)
Sep 10 12:48:48 pazuzu kernel: NFS: create_write_request(faked/6, 0+58)
Sep 10 12:48:48 pazuzu kernel: NFS: append_write_request(c103d71c, c4fd9830)
Sep 10 12:48:48 pazuzu kernel: NFS: 1349 schedule_write_request (async)
Sep 10 12:48:48 pazuzu kernel: NFS: nfs_updatepage(faked/6 4038@58, sync=0)
Sep 10 12:48:48 pazuzu kernel: NFS: find_write_request(5/76082, c02e4f18)
Sep 10 12:48:48 pazuzu kernel: NFS: create_write_request(faked/6, 58+4038)
Sep 10 12:48:48 pazuzu kernel: NFS: append_write_request(c103d71c, c4fd9720)
Sep 10 12:48:48 pazuzu kernel: NFS: 1350 schedule_write_request (async)
Sep 10 12:48:48 pazuzu kernel: NFS: nfs_updatepage(faked/6 58@4096, sync=0)
Sep 10 12:48:48 pazuzu kernel: NFS: find_write_request(5/76082, c030fe48)
Sep 10 12:48:48 pazuzu kernel: NFS: create_write_request(faked/6, 4096+58)
Sep 10 12:48:48 pazuzu kernel: NFS: append_write_request(c103d71c, c4fd9d80)
Sep 10 12:48:48 pazuzu kernel: NFS: 1351 schedule_write_request (async)
Sep 10 12:48:48 pazuzu kernel: NFS: 1349 nfs_wback_begin (faked/6, status=0 flags=0)
Sep 10 12:48:48 pazuzu kernel: NFS: 1350 nfs_wback_begin (faked/6, status=0 flags=0)
Sep 10 12:48:48 pazuzu kernel: NFS: nfs_updatepage(faked/6 4038@4154, sync=0)
Sep 10 12:48:48 pazuzu kernel: NFS: find_write_request(5/76082, c030fe48)
Sep 10 12:48:48 pazuzu kernel: nfs: trying to update write request c4fd9d80
Sep 10 12:48:48 pazuzu kernel: NFS: nfs_updatepage(faked/6 1@8192, sync=0)
Sep 10 12:48:48 pazuzu kernel: NFS: find_write_request(5/76082, c03445f8)
Sep 10 12:48:48 pazuzu kernel: NFS: create_write_request(faked/6, 8192+1)
Sep 10 12:48:48 pazuzu kernel: NFS: append_write_request(c103d71c, c4fd9c70)
Sep 10 12:48:48 pazuzu kernel: NFS: 1352 schedule_write_request (sync)
Sep 10 12:48:48 pazuzu kernel: NFS: 1352 nfs_wback_begin (faked/6, status=0 flags=0)
Sep 10 12:48:48 pazuzu kernel: NFS: 1351 nfs_wback_begin (faked/6, status=0 flags=0)
Sep 10 12:48:48 pazuzu kernel: NFS: 1349 nfs_wback_result (faked/6, status=0, flags=100)
Sep 10 12:48:48 pazuzu kernel: NFS: remove_write_request(c103d71c, c4fd9830)
Sep 10 12:48:48 pazuzu kernel: NFS: 1350 nfs_wback_result (faked/6, status=0, flags=100)
Sep 10 12:48:48 pazuzu kernel: NFS: remove_write_request(c103d71c, c4fd9720)
Sep 10 12:48:48 pazuzu kernel: NFS: 1352 nfs_wback_result (faked/6, status=0, flags=100)
Sep 10 12:48:48 pazuzu kernel: NFS: remove_write_request(c103d71c, c4fd9c70)
Sep 10 12:48:48 pazuzu kernel: NFS: 1351 nfs_wback_result (faked/6, status=0, flags=100)
Sep 10 12:48:48 pazuzu kernel: NFS: remove_write_request(c103d71c, c4fd9d80)

And this for an incorrect run:
Sep 10 12:48:50 pazuzu kernel: NFS: nfs_updatepage(faked/7 58@0, sync=0)
Sep 10 12:48:50 pazuzu kernel: NFS: find_write_request(5/76083, c0345a20)
Sep 10 12:48:50 pazuzu kernel: NFS: create_write_request(faked/7, 0+58)
Sep 10 12:48:50 pazuzu kernel: NFS: append_write_request(c103d82c, c4fd9d80)
Sep 10 12:48:50 pazuzu kernel: NFS: 1355 schedule_write_request (async)
Sep 10 12:48:50 pazuzu kernel: NFS: nfs_updatepage(faked/7 4038@58, sync=0)
Sep 10 12:48:50 pazuzu kernel: NFS: find_write_request(5/76083, c0345a20)
Sep 10 12:48:50 pazuzu kernel: NFS: create_write_request(faked/7, 58+4038)
Sep 10 12:48:50 pazuzu kernel: NFS: append_write_request(c103d82c, c4fd9c70)
Sep 10 12:48:50 pazuzu kernel: NFS: 1356 schedule_write_request (async)
Sep 10 12:48:50 pazuzu kernel: NFS: nfs_updatepage(faked/7 58@4096, sync=0)
Sep 10 12:48:50 pazuzu kernel: NFS: find_write_request(5/76083, c02c87a0)
Sep 10 12:48:50 pazuzu kernel: NFS: create_write_request(faked/7, 4096+58)
Sep 10 12:48:50 pazuzu kernel: NFS: append_write_request(c103d82c, c4fd9720)
Sep 10 12:48:50 pazuzu kernel: NFS: 1357 schedule_write_request (async)
Sep 10 12:48:50 pazuzu kernel: NFS: 1355 nfs_wback_begin (faked/7, status=0 flags=0)
Sep 10 12:48:50 pazuzu kernel: NFS: 1356 nfs_wback_begin (faked/7, status=0 flags=0)
Sep 10 12:48:50 pazuzu kernel: NFS: nfs_updatepage(faked/7 4038@4154, sync=0)
Sep 10 12:48:50 pazuzu kernel: NFS: find_write_request(5/76083, c02c87a0)
Sep 10 12:48:50 pazuzu kernel: nfs: trying to update write request c4fd9720
Sep 10 12:48:50 pazuzu kernel: NFS: nfs_updatepage(faked/7 1@8192, sync=0)
Sep 10 12:48:50 pazuzu kernel: NFS: find_write_request(5/76083, c03458b8)
Sep 10 12:48:50 pazuzu kernel: NFS: create_write_request(faked/7, 8192+1)
Sep 10 12:48:50 pazuzu kernel: NFS: append_write_request(c103d82c, c4fd9830)
Sep 10 12:48:50 pazuzu kernel: NFS: 1358 schedule_write_request (async)
Sep 10 12:48:50 pazuzu kernel: NFS: 1357 nfs_wback_begin (faked/7, status=0 flags=0)
Sep 10 12:48:50 pazuzu kernel: NFS: 1355 nfs_wback_result (faked/7, status=0, flags=100)
Sep 10 12:48:50 pazuzu kernel: NFS: remove_write_request(c103d82c, c4fd9d80)
Sep 10 12:48:50 pazuzu kernel: NFS: 1356 nfs_wback_result (faked/7, status=0, flags=100)
Sep 10 12:48:50 pazuzu kernel: NFS: remove_write_request(c103d82c, c4fd9c70)
Sep 10 12:48:50 pazuzu kernel: NFS: 1357 nfs_wback_result (faked/7, status=0, flags=100)
Sep 10 12:48:50 pazuzu kernel: NFS: remove_write_request(c103d82c, c4fd9720)
Sep 10 12:48:50 pazuzu kernel: NFS: 1358 nfs_wback_begin (faked/7, status=0 flags=0)
Sep 10 12:48:50 pazuzu kernel: NFS: 1358 nfs_wback_result (faked/7, status=0, flags=100)
Sep 10 12:48:50 pazuzu kernel: NFS: remove_write_request(c103d82c, c4fd9830)

Note that the correct run has schedule_write_request() deciding that
it should do a synchronous RPC call the fourth time, and in the
incorrect run it doesn't.

Talking to a Linux 2.0.36 box, I get the following (but it takes a
very long time). (I also haven't gotten it to fail talking to the
2.0.36 NFS server, but that's mainly because I haven't the
patience. It could very well be I'm having some other unrelated
problem here.)

Sep 10 13:25:59 pazuzu kernel: NFS: nfs_updatepage(//linux 58@0, sync=0)
Sep 10 13:25:59 pazuzu kernel: NFS: find_write_request(4/151992398, c0344b98)
Sep 10 13:25:59 pazuzu kernel: NFS: create_write_request(//linux, 0+58)
Sep 10 13:25:59 pazuzu kernel: NFS: append_write_request(c1033a4c, c4fd9d80)
Sep 10 13:25:59 pazuzu kernel: NFS: 1447 schedule_write_request (async)
Sep 10 13:25:59 pazuzu kernel: NFS: nfs_updatepage(//linux 4038@58, sync=0)
Sep 10 13:25:59 pazuzu kernel: NFS: find_write_request(4/151992398, c0344b98)
Sep 10 13:25:59 pazuzu kernel: NFS: create_write_request(//linux, 58+4038)
Sep 10 13:25:59 pazuzu kernel: NFS: append_write_request(c1033a4c, c4fd9830)
Sep 10 13:25:59 pazuzu kernel: NFS: 1448 schedule_write_request (async)
Sep 10 13:25:59 pazuzu kernel: NFS: nfs_updatepage(//linux 58@4096, sync=0)
Sep 10 13:25:59 pazuzu kernel: NFS: find_write_request(4/151992398, c02be7a0)
Sep 10 13:25:59 pazuzu kernel: NFS: create_write_request(//linux, 4096+58)
Sep 10 13:25:59 pazuzu kernel: NFS: append_write_request(c1033a4c, c4fd9c70)
Sep 10 13:25:59 pazuzu kernel: NFS: 1449 schedule_write_request (async)
Sep 10 13:25:59 pazuzu kernel: NFS: 1447 nfs_wback_begin (//linux, status=0 flags=0)
Sep 10 13:25:59 pazuzu kernel: NFS: 1448 nfs_wback_begin (//linux, status=0 flags=0)
Sep 10 13:25:59 pazuzu kernel: NFS: 1447 nfs_wback_result (//linux, status=0, flags=100)
Sep 10 13:25:59 pazuzu kernel: NFS: remove_write_request(c1033a4c, c4fd9d80)
Sep 10 13:25:59 pazuzu kernel: NFS: 1448 nfs_wback_result (//linux, status=0, flags=100)
Sep 10 13:25:59 pazuzu kernel: NFS: remove_write_request(c1033a4c, c4fd9830)
Sep 10 13:25:59 pazuzu kernel: NFS: nfs_updatepage(//linux 4038@4154, sync=0)
Sep 10 13:25:59 pazuzu kernel: NFS: find_write_request(4/151992398, c02be7a0)
Sep 10 13:25:59 pazuzu kernel: nfs: trying to update write request c4fd9c70
Sep 10 13:25:59 pazuzu kernel: NFS: nfs_updatepage(//linux 1@8192, sync=0)
Sep 10 13:25:59 pazuzu kernel: NFS: find_write_request(4/151992398, c0310e60)
Sep 10 13:25:59 pazuzu kernel: NFS: create_write_request(//linux, 8192+1)
Sep 10 13:25:59 pazuzu kernel: NFS: append_write_request(c1033a4c, c4fd9830)
Sep 10 13:25:59 pazuzu kernel: NFS: 1450 schedule_write_request (async)
Sep 10 13:25:59 pazuzu kernel: NFS: 1449 nfs_wback_begin (//linux, status=0 flags=0)
Sep 10 13:26:09 pazuzu kernel: NFS: 1450 nfs_wback_begin (//linux, status=-110 flags=0)
Sep 10 13:26:10 pazuzu kernel: nfs: server arioch not responding, still trying
Sep 10 13:26:21 pazuzu kernel: nfs: task 1450 can't get a request slot
Sep 10 13:27:33 pazuzu kernel: nfs: server arioch OK
Sep 10 13:27:33 pazuzu kernel: NFS: 1449 nfs_wback_result (//linux, status=0, flags=100)
Sep 10 13:27:33 pazuzu kernel: NFS: remove_write_request(c1033a4c, c4fd9c70)
Sep 10 13:27:33 pazuzu kernel: nfs: server arioch OK
Sep 10 13:27:33 pazuzu kernel: NFS: 1450 nfs_wback_result (//linux, status=0, flags=100)
Sep 10 13:27:33 pazuzu kernel: NFS: remove_write_request(c1033a4c, c4fd9830)

So. Is this a known bug? If so, is there a fix? If not, help!

-- 
I get my monkeys for nothing and my chimps for free.
http://www.clark.net/pub/hermit/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/