Re: [NFS] NFS Client in 2.2.13 (eepro100 problem)

Jim Winstead (jim@homepagecorp.com)
Tue, 23 Nov 1999 11:46:54 -0800


On Nov 23, Trond Myklebust wrote:
> >>>>> " " == slack <jhansen@xmission.com> writes:
>
> > I've got a Linux mail server mounting /home from a NetApp F720.
>
> > Client: Dual PIII 500 with 512MB ram. Linux mail 2.2.13 #11
> > SMP Fri Nov 19 16:56:29 MST 1999 i686 unknown Using
> > nfs-utils-0.1.2. Intel EtherExpress Pro 100+ w/ driver version
> > 1.09l. Full duplex 100mbit. Mount options:
> ^^^^^^
>
> > NetApp:
> > Release 5.2.3D1: Tue Aug 17 13:35:10 PDT 1999 slot 0:
> > System Board (NetApp System Board V L0)
> > Model Name: F720 Firmware release: 2.2_a2
> > Memory Size: 256 MB
>
> > The problem is, that under high access to the /home mount RPC
> > gets congested, with error "nfs: task 29025 can't get a request
> > slot" and NFS processes will start to block. In some cases
>
> I have the exact same problem with the EEpro100 1.09j driver. It makes
> testing on the linux-2.3.x series a major pain.
>
> Please try with the stock 1.06 from linux-2.2.13. It seems to be a lot
> more stable under high load.

We saw very similar problems with a similar setup. Things improved a bit
if we disabled the RPC congestion algorithm, or if we switched to using
Tulip-based cards instead of eepro cards, but the ultimate solution was
to cut the amount of traffic on our NFS network. The eepro driver does
not appear to handle load very gracefully. (We tried a number of driver
versions, from the stock kernel ones to Donald Becker's later versions,
and even older versions of the driver.)

We were able to get the driver to fail fairly reliably with a test program
(running on a second machine) that simply forked off a bunch of processes,
each of which would make a bunch of web requests and read the responses
slowly. Taking all of the interfaces down, removing the eepro modules, and
bringing the interfaces back up would resolve the problem for a short time.
(The test program would cause the eepro driver to fail quite reliably, but
would only sometimes cause the RPC congestion problem. We were never able
to pin down a good way of triggering the RPC congestion problem before we
just solved the root problem (warez kiddiez) and moved on to real work.)
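
For anyone who wants to generate a similar kind of load, here is a rough
sketch of what such a test program could look like. It is not the program
we actually ran; the function names, worker counts, chunk size, and delay
are all made up for illustration, and it assumes a plain HTTP server is
reachable on the target host.

```python
import os
import socket
import time


def build_request(host, path="/"):
    """Return a minimal HTTP/1.0 GET request as bytes."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def slow_read(sock, chunk_size=64, delay=0.5):
    """Drain a socket in small chunks, sleeping between reads.

    Reading slowly keeps the connection (and the server's send queue)
    occupied for a long time. Returns the total number of bytes read.
    """
    total = 0
    while True:
        data = sock.recv(chunk_size)
        if not data:  # peer closed the connection
            break
        total += len(data)
        time.sleep(delay)
    return total


def worker(host, port, path, requests_per_worker):
    """Make several requests in sequence, reading each response slowly."""
    for _ in range(requests_per_worker):
        with socket.create_connection((host, port), timeout=30) as sock:
            sock.sendall(build_request(host, path))
            slow_read(sock)


def main(host, port=80, path="/", workers=50, requests_per_worker=20):
    """Fork `workers` child processes (hypothetical defaults) and wait."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: run the requests, then exit without returning
            # into the parent's loop.
            try:
                worker(host, port, path, requests_per_worker)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
```

Calling main() with the name of a test web server (for example
main("webserver.example"), a placeholder host) from a second machine
starts the load. Only point it at a machine you are responsible for.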

Jim
