Re: NFS corruption on 2.6.11.7

From: Kenneth Johansson
Date: Tue May 24 2005 - 05:29:34 EST


On Mon, 2005-05-23 at 20:35 -0400, Trond Myklebust wrote:
> ty den 24.05.2005 Klokka 00:47 (+0200) skreiv Kenneth Johansson:
> > I have both the server and client running 2.6.11.7 and have some severe
> > data corruption when reading from the server (maybe on write also I have
> > not tested).
> >
> > If I copy the data over with scp or ftp I get correct data. Also nfs
> > works OK with a mac os x 10.4 client.
> >
> > Running gen.sh on the server and then cmp.sh on the client results in a
> > md5 checksum difference on 5-12 files I have never done one run where
> > there was no errors.
> >
> > This is what cat /proc/mounts reports on the nfs mount
> >
> > :/export/home/ken /home/ken nfs rw,v3,rsize=32768,wsize=32768,hard,udp,lock,addr=amd 0 0
> >
>
> I'm seeing no problems at all with this on a loopback mount with
> 2.6.12-rc4. Mind giving us some more details on your setup?
>
> Cheers,
> Trond

I did some more investigation what type of data error I get and it looks
a bit strange. I always get 28 bytes wrong in a sequence some times this
is data repeated from previous in the file but not always. Anybody know
what cache line size this cpu has?

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(TM) XP 2200+
stepping : 0
cpu MHz : 1802.998
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips : 3547.13

Here is a sample if three files with errors in them.

file 13 "od -Ax -tx1z"


-924dc0 df b3 0c 89 2d a2 83 da 1c 08 f2 66 da f6 6b f4 >....-......f..k.<
+924dc0 43 11 2a f4 98 09 d5 76 aa 26 83 00 24 3d 11 fd >C.*....v.&..$=..<

-924dd0 af c2 44 57 9a 13 01 43 84 bf 99 c3 1b 16 8a 00 >..DW...C........<
+924dd0 3e 64 d7 bd 4f 8d 26 cf 4f 4f 2c 62 1b 16 8a 00 >>d..O.&.OO,b....<


28 bytes wrong in a sequence
The data is a repeat from previous data in the file.

>grep "43 11 2a f4 98 09 d5 76 aa 26 83 00 24 3d 11 fd" 13_org
924d40 43 11 2a f4 98 09 d5 76 aa 26 83 00 24 3d 11 fd >C.*....v.&..$=..<

>grep "43 11 2a f4 98 09 d5 76 aa 26 83 00 24 3d 11 fd" 13_err
924d40 43 11 2a f4 98 09 d5 76 aa 26 83 00 24 3d 11 fd >C.*....v.&..$=..<
924dc0 43 11 2a f4 98 09 d5 76 aa 26 83 00 24 3d 11 fd >C.*....v.&..$=..<

924dc0 is a copy of 924d40
128 bytes offset


file 14 "od -Ax -tx1z"

-0912f0 91 45 bb cd eb 4f 01 d3 69 27 88 b5 7d 7d 17 8d >.E...O..i'..}}..<
+0912f0 b8 3f 4e 5d 2e 86 ed c0 51 79 fe ec 3e 53 c9 29 >.?N]....Qy..>S.)<

-091300 7d 94 8e f9 81 d0 c2 4a b5 8e c6 af b0 03 4c 16 >}......J......L.<
+091300 d9 05 ac 0d fc eb 00 71 17 bd fb 3e b0 03 4c 16 >.......q...>..L.<

>grep "b8 3f 4e 5d 2e 86 ed c0 51 79 fe ec 3e 53 c9 29" 14_err
0912b0 b8 3f 4e 5d 2e 86 ed c0 51 79 fe ec 3e 53 c9 29 >.?N]....Qy..>S.)<
0912f0 b8 3f 4e 5d 2e 86 ed c0 51 79 fe ec 3e 53 c9 29 >.?N]....Qy..>S.)<

28 bytes wrong
64 bytes offset


file 16 "od -Ax -tx1z"

-635200 c3 1d f2 b8 c4 d5 12 c1 3f 48 e6 9d dc 98 1f e5 >........?H......<
+635200 c3 1d f2 b8 c4 d5 12 c1 00 10 00 00 00 d0 ec 08 >................<

-635210 9e 54 e7 f1 49 5b 1e d0 9f e2 7c 26 24 cb 98 24 >.T..I[....|&$..$<
+635210 00 10 00 00 00 90 14 08 00 10 00 00 00 50 25 06 >.............P%.<

-635220 25 fc 63 2a bf 07 b4 c0 cf a1 67 9b ef 01 5d 6d >%.c*......g...]m<
+635220 00 10 00 00 bf 07 b4 c0 cf a1 67 9b ef 01 5d 6d >..........g...]m<

28 bytes wrong
This time the data is not from this file.




Attachment: signature.asc
Description: This is a digitally signed message part