Sporadic ethernet problems with 2.2.x

Paul Doom (pauldoom@webcreate.net)
Fri, 5 Feb 1999 02:11:37 -0600 (CST)


In my recent work on building up a Linux firewall, I've been running
NetPipe to compare throughput to/through the firewall with varying kernels.
I have three machines: one with a Tulip-based card, one with a 3c595, and
the firewall with a pair of EEpro100s. All three machines are connected
via crossover twisted-pair cables.

Network performance is measurably better in Linux 2.2 than in 2.0.36;
however, I have run into two major problems. When benchmarking to/from
the box with the 3c595, the performance goes south (from about 50Mb/s
to 2MB/s or so) above certain, repeatable packet sizes. (Depending on
how the driver is compiled, that break-off point is different.)
The performance hit is not permanent, just related to packet size.
Much worse than that, when I compile with egcs-1.1.1, the connection just
locks up at said breakpoints. Bouncing the interface with ifconfig
restores the connection, until the next big packet hits. Running dmesg
shows various errors like:

eth0: Transmit error, Tx status register d0.
eth0: Transmit error, Tx status register 90.
eth0: transmit timed out, tx_status 00 status e000.
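
For anyone who wants to tally messages like these while collecting data,
here is a minimal Python sketch. It only extracts and counts the hex
status values; it makes no attempt to interpret the bits. The function
name tally_tx_messages and the regular expressions are my own assumptions,
modelled on the three lines quoted above, and other driver versions may
word the messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the three message shapes quoted above (assumption:
# other driver versions may format these messages differently).
TX_ERROR = re.compile(
    r"^(\w+): Transmit error, Tx status register ([0-9a-fA-F]+)\."
)
TX_TIMEOUT = re.compile(
    r"^(\w+): transmit timed out, tx_status ([0-9a-fA-F]+) status ([0-9a-fA-F]+)\."
)


def tally_tx_messages(lines):
    """Count Tx error status values and Tx timeouts per interface.

    Hypothetical helper: returns (errors, timeouts), where errors maps
    (interface, status_value) to a count and timeouts maps interface to a count.
    """
    errors = Counter()
    timeouts = Counter()
    for line in lines:
        line = line.strip()
        match = TX_ERROR.match(line)
        if match:
            errors[(match.group(1), int(match.group(2), 16))] += 1
            continue
        match = TX_TIMEOUT.match(line)
        if match:
            timeouts[match.group(1)] += 1
    return errors, timeouts


# The three lines from the report, used as sample input.
sample = [
    "eth0: Transmit error, Tx status register d0.",
    "eth0: Transmit error, Tx status register 90.",
    "eth0: transmit timed out, tx_status 00 status e000.",
]

errors, timeouts = tally_tx_messages(sample)
for (iface, status), count in sorted(errors.items()):
    print(f"{iface}: Tx status 0x{status:02x} seen {count} time(s)")
for iface, count in sorted(timeouts.items()):
    print(f"{iface}: {count} transmit timeout(s)")
```

Feeding it the saved output of dmesg, one line per list entry, should show
whether the same status values recur at each lockup.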

I first suspected hardware and swapped in a new 3c595, then I swapped
network cables. No change. I tried various things while compiling,
suspecting egcs-1.1.1. gcc didn't help much, except when the kernel is
compiled with egcs and the NIC driver with gcc, in which case the poor
big-packet performance seems to be the only problem.

Another (related?) problem springs up on the machine with the Tulip card.
Benchmarking runs fine until the machine has been up for a while. At that
point, and at random under load, the connection will take on a weird
"bursty" quality, where straight 1-second pings to the machine come back
in a lump 10-39 seconds later and out of order. Bouncing the interface
returns it to normal. Nothing shows up in the logs to indicate any problem.
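
Since nothing gets logged, one way to put numbers on the "bursty" behaviour
is to record ping replies in arrival order and flag the late and
out-of-order ones. The Python sketch below is only an illustration: the
function name summarize_replies, the 1-second threshold, and the sample
data are all made up for the example and are not measurements from these
machines.

```python
def summarize_replies(replies, slow_after=1.0):
    """Flag out-of-order and slow ping replies.

    replies: list of (icmp_seq, rtt_seconds) tuples in the order the
    replies arrived. slow_after is an assumed threshold in seconds.
    Returns (out_of_order, slow) as lists of sequence numbers.
    """
    highest_seq = None
    out_of_order = []
    slow = []
    for seq, rtt in replies:
        # A reply is out of order if a higher sequence number already arrived.
        if highest_seq is not None and seq < highest_seq:
            out_of_order.append(seq)
        else:
            highest_seq = seq
        if rtt > slow_after:
            slow.append(seq)
    return out_of_order, slow


# Made-up example data: two normal replies, then a delayed, shuffled lump.
replies = [(1, 0.0004), (2, 0.0004), (5, 12.1), (3, 14.0), (4, 13.1), (6, 11.0)]

out_of_order, slow = summarize_replies(replies)
print("out of order:", out_of_order)
print("slower than threshold:", slow)
```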

The box with the EEpro100s has been rebooted more often than the others.
I think that the same problems I see with the Tulip card showed up
on the EEpro100s, but I destroyed the evidence by bouncing the interfaces
on both boxes that experienced the communication loss during benchmarking.

I wanted to get this out quickly so others could chime in with similar
reports or ideas. I will work on isolating the problem and collecting
useful data. Sorry this was so long-winded.

-Paul

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
< Paul Doom pauldoom@webcreate.net >
> http://www.webcreate.net/~pauldoom <
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
