2.0.x bug/patch - NFS - RPC: rpc_send sending evil packet:

Nigel Metheringham (Nigel.Metheringham@ThePLAnet.net)
Thu, 22 Jan 1998 11:34:56 +0000


This is a multipart MIME message.

--==_Exmh_4898912900
Content-Type: text/plain; charset=us-ascii

[Could the LMP people look at taking this up for 2.0.34]

We have seen a number of NFS-related crashes in 2.0.x (seen in 2.0.30 with
the RH patches, and rather worse for some reason in 2.0.33). In each case a
number of these messages are logged:-

Jan 12 19:28:21 svr-a-02 kernel: RPC: rpc_doio sending evil packet:
Jan 12 19:28:21 svr-a-02 kernel: 19cca34a 00000001 00000000 00000000 00000000 00000000 00000000 00200002
Jan 12 19:28:21 svr-a-02 kernel: RPC: rpc_send sending evil packet:
Jan 12 19:28:21 svr-a-02 kernel: 19cca34a 00000001 00000000 00000000 00000000 00000000 00000000 00200002

These are followed by further "kernel: NFS server server_mail not
responding" messages, then lots of fork failures, and the system ends up
pretty solidly locked up.

We finally found the following message on linux-kernel from last July
which appears to be the only mention of this problem, from Olaf Kirch:-
[ see http://linuxwww.db.erau.edu/mail_archives/linux-kernel/Jul_97/0470.html ]

Olaf Kirch <okir@monad.swb.de> said:
} This is definitely a bug in the NFS client. Let me explain what's
} happening and why the packet is evil:

} The client sees a timeout (or some other problem, e.g. garbage reply).
} While it correctly figures that it should retransmit the packet, it
} does not rebuild the packet but resends the current network buffer
} which, for some reason, already contains an RPC *reply* (you can see
} that from the second long word - a call has a 0 there, while replies
} have a 1). The first long word is the transmission id (XID).

} It appears there's a race condition somewhere in the RPC socket
} handling where the client doesn't notice that some packet has already
} been received when it starts waiting for the reply.

} Can you please apply the enclosed patch and check what happens?

} May I also request that people put `NFS' somewhere in the subject when
} they report a problem to linux-kernel that they suspect is
} nfs-related? Quite frequently, I scan the lists only by subject, so
} threads like this one usually escape me.

} Cheers Olaf

We used the patch given in that message (attached below) and it appears to
have fixed the problem for us.

Could this patch be evaluated for inclusion in the next 2.0.x release?

Nigel.

--==_Exmh_4898912900
Content-Type: application/x-patch ; name="linux-2.0.30-olaf-nfs.patch"
Content-Description: linux-2.0.30-olaf-nfs.patch
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="linux-2.0.30-olaf-nfs.patch"

SW5kZXg6IGZzL25mcy9ycGNzb2NrLmMKLS0tIGxpbnV4L2ZzL25mcy9ycGNzb2NrLmMub3Jp
ZyAgICAgIFdlZCBKdWwgIDkgMTE6MzI6MzAgMTk5NworKysgbGludXgvZnMvbmZzL3JwY3Nv
Y2suYyAgIFdlZCBKdWwgIDkgMTI6MDQ6MjEgMTk5NwpAQCAtNjAsMTkgKzYwLDI3IEBACgoK
IC8qCi0gKiBJbnNlcnQgbmV3IHJlcXVlc3QgaW50byB3YWl0IGxpc3QuIFdlIG1ha2Ugc3Vy
ZSBsaXN0IGlzIHNvcnRlZCBieQotICogaW5jcmVhc2luZyB0aW1lb3V0IHZhbHVlLgorICog
SW5zZXJ0IG5ldyByZXF1ZXN0IGludG8gd2FpdCBsaXN0LgorICogSWYgdGhlcmUncyBhbHJl
YWR5IGEgcmVxdWVzdCBpbiB0aGUgZmlyc3QgcG9zaXRpb24sIHRoaXMgaXMgdGhlIHJlY2Vp
dmVyCisgKiBzaXR0aW5nIGluIHJwY19zZWxlY3QuIEluc2VydCBuZXcgcmVxdWVzdCBhZnRl
ciB0aGUgcmVjZWl2ZXIuCiAgKi8KIHN0YXRpYyBpbmxpbmUgdm9pZAogcnBjX2luc3F1ZShz
dHJ1Y3QgcnBjX3NvY2sgKnJzb2NrLCBzdHJ1Y3QgcnBjX3dhaXQgKnNsb3QpCiB7Ci0gICAg
ICAgc3RydWN0IHJwY193YWl0ICpuZXh0ID0gcnNvY2stPnBlbmRpbmc7CisgICAgICAgc3Ry
dWN0IHJwY193YWl0ICpuZXh0LCAqcHJldjsKCi0gICAgICAgc2xvdC0+d19uZXh0ID0gbmV4
dDsKLSAgICAgICBzbG90LT53X3ByZXYgPSBOVUxMOwotICAgICAgIGlmIChuZXh0KQotICAg
ICAgICAgICAgICAgbmV4dC0+d19wcmV2ID0gc2xvdDsKLSAgICAgICByc29jay0+cGVuZGlu
ZyA9IHNsb3Q7CisgICAgICAgaWYgKChwcmV2ID0gcnNvY2stPnBlbmRpbmcpICE9IE5VTEwp
IHsKKyAgICAgICAgICAgICAgIG5leHQgPSBwcmV2LT53X25leHQ7CisgICAgICAgICAgICAg
ICBzbG90LT53X25leHQgPSBuZXh0OworICAgICAgICAgICAgICAgc2xvdC0+d19wcmV2ID0g
cHJldjsKKyAgICAgICAgICAgICAgIHByZXYtPndfbmV4dCA9IHNsb3Q7CisgICAgICAgICAg
ICAgICBpZiAobmV4dCkKKyAgICAgICAgICAgICAgICAgICAgICAgbmV4dC0+d19wcmV2ID0g
c2xvdDsKKyAgICAgICB9IGVsc2UgeworICAgICAgICAgICAgICAgc2xvdC0+d19uZXh0ID0g
TlVMTDsKKyAgICAgICAgICAgICAgIHNsb3QtPndfcHJldiA9IE5VTEw7CisgICAgICAgICAg
ICAgICByc29jay0+cGVuZGluZyA9IHNsb3Q7CisgICAgICAgfQogICAgICAgIHNsb3QtPndf
cXVldWVkID0gMTsKCiAgICAgICAgZHByaW50aygiUlBDOiBpbnNlcnRlZCAlcCBpbnRvIHF1
ZXVlXG4iLCBzbG90KTsKQEAgLTQxMiw4ICs0MjAsNiBAQAogICAgICAgICAgICAgICAgd2hp
bGUgKHJzb2NrLT5wZW5kaW5nICE9IHNsb3QpIHsKICAgICAgICAgICAgICAgICAgICAgICAg
aWYgKCFzbG90LT53X2dvdGl0KQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGlu
dGVycnVwdGlibGVfc2xlZXBfb24oJnNsb3QtPndfd2FpdCk7Ci0gICAgICAgICAgICAgICAg
ICAgICAgIGlmIChzbG90LT53X2dvdGl0KQotICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgIHJldHVybiBzbG90LT53X3Jlc3VsdDsgLyogcXVpdGUgaW1wb3J0YW50ICovCiAgICAg
ICAgICAgICAgICAgICAgICAgIGlmIChjdXJyZW50LT5zaWduYWwgJiB+Y3VycmVudC0+Ymxv
Y2tlZCkKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICByZXR1cm4gLUVSRVNUQVJU
U1lTOwogICAgICAgICAgICAgICAgICAgICAgICBpZiAocnNvY2stPnNodXRkb3duKQpAQCAt
NDIxLDYgKzQyNyw5IEBACiAgICAgICAgICAgICAgICAgICAgICAgIGlmIChjdXJyZW50LT50
aW1lb3V0ID09IDApCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgcmV0dXJuIC1F
VElNRURPVVQ7CiAgICAgICAgICAgICAgICB9CisKKyAgICAgICAgICAgICAgIGlmIChzbG90
LT53X2dvdGl0KQorICAgICAgICAgICAgICAgICAgICAgICByZXR1cm4gc2xvdC0+d19yZXN1
bHQ7IC8qIHF1aXRlIGltcG9ydGFudCAqLwoKICAgICAgICAgICAgICAgIC8qIFdhaXQgZm9y
IGRhdGEgdG8gYXJyaXZlICovCiAgICAgICAgICAgICAgICBpZiAoKHJlc3VsdCA9IHJwY19z
ZWxlY3QocnNvY2spKSA8IDApIHsK

--==_Exmh_4898912900
Content-Type: text/plain; charset=us-ascii

[ Nigel.Metheringham@theplanet.net - Systems Software Engineer ]
[ Tel : +44 113 251 6012 Fax : +44 113 234 6065 ]
[ Real life is but a pale imitation of a Dilbert strip ]

--==_Exmh_4898912900--