We saw very similar problems with a similar setup. Things improved a bit
if we disabled the RPC congestion algorithm, or if we switched to
Tulip-based cards instead of eepro cards, but the ultimate solution was
to cut the amount of traffic on our NFS network. The eepro driver does
not appear to handle load very gracefully. (We tried a number of
driver versions, from the stock kernel ones to Donald Becker's
later versions, and even some older versions of the driver.)
We were able to get the eepro driver to fail fairly reliably, and sometimes
to trigger the RPC congestion problem as well, with a test program
(running on a second machine) that simply forked off a bunch of processes,
each of which would make a bunch of web requests and read the responses
slowly. Taking all of the interfaces down, removing the eepro modules, and
bringing the interfaces back up would resolve the problem for a short time.
We were never able to pin down a good way of reproducing the RPC congestion
problem on demand before we just solved the root problem (warez kiddiez)
and moved on to real work.
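
For anyone who wants to try reproducing this, the kind of load test
described above can be sketched roughly as follows. This is not the
original program, just a minimal illustration in Python. The function
names, the defaults (50 clients, 64-byte reads, half-second pauses) and
the host name in the usage note are all made up for the example; tune
them to whatever actually loads your network.

```python
import os
import socket
import time


def build_request(host, path="/"):
    """Return a minimal HTTP/1.0 GET request as bytes."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def slow_read(sock, chunk_size=64, delay=0.5):
    """Drain a socket in small chunks, sleeping between reads.

    Reading slowly keeps the connection (and the server's send queue)
    alive for a long time. Returns the total number of bytes read.
    """
    total = 0
    while True:
        data = sock.recv(chunk_size)
        if not data:  # peer closed the connection
            break
        total += len(data)
        time.sleep(delay)
    return total


def fetch_slowly(host, port=80, path="/", chunk_size=64, delay=0.5):
    """Make one HTTP request and read the response slowly."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(host, path))
        return slow_read(sock, chunk_size, delay)


def run_clients(host, n_clients=50, requests_per_client=10, **kwargs):
    """Fork n_clients children; each makes several slow requests.

    Unix only, since it relies on os.fork(). Returns the number of
    children that were started and reaped.
    """
    pids = []
    for _ in range(n_clients):
        pid = os.fork()
        if pid == 0:
            # Child: issue the requests, then exit immediately so the
            # child never falls back into the parent's loop.
            try:
                for _ in range(requests_per_client):
                    fetch_slowly(host, **kwargs)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return len(pids)
```

Run from the second machine, something like
run_clients("nfs-web.example", n_clients=50) would point it at the web
server in question (the host name here is a placeholder).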
Jim
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/