Re: Possible TCP Problem with RH6.2 talking to Solaris2.6/2.7

From: shane (shane@bratnet.net)
Date: Sun May 07 2000 - 12:57:10 EST


Well, I have done more testing and I am still sorting through the results,
but here are the basics.

I am starting to believe that the CPU needed for running the Ethernet card
at full speed is more than I thought.

I replaced my P200 server with a DELL PowerEdge 6300/550 w/500MHZ XEON
processor; it's fast. I transferred my Diablo application to this server
and it now has better performance to the Solaris box. Now I have a bunch
more netperf stats for this new setup. I have also installed a 3com
100BaseTX card and am testing it also.

Summary of netperf stats:
  loaded dell -> solaris: performance is ok.
  loaded compaq P200 -> solaris: performance is bad.
  loaded compaq P200 -> dell: performance is bad.

This points to BUS restrictions, I would guess.

Now, with the eepro100 driving the interface connected to the switched
network, the interface stats look like this:
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 65251461     34      0     12 36945441      0      0      0 BRU
eth1   1500   0  6591536      0      0      0 31409901      0      0      0 BRU
lo     3924   0  2538041      0      0      0  2538041      0      0      0 LRU
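
To put the eth0 error counters in proportion to traffic, here is a rough
sketch in Python. The numbers are copied from the eth0 row of the table;
the helper name per_million is only for illustration.

```python
# Counters copied from the eth0 row of the interface table.
RX_OK = 65251461
RX_ERR = 34
RX_OVR = 12


def per_million(count, total):
    """Return events per million packets (illustrative helper)."""
    return count / total * 1_000_000


rx_err_ppm = per_million(RX_ERR, RX_OK)
rx_ovr_ppm = per_million(RX_OVR, RX_OK)

print(f"RX errors:   {rx_err_ppm:.2f} per million received packets")
print(f"RX overruns: {rx_ovr_ppm:.2f} per million received packets")
```

Both rates come out below one per million received packets, so on these
counters alone the card is losing only a small fraction of what it receives.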

And every once in a while I am getting
May 7 06:46:42 newspeer2 kernel: eth0: card reports no resources.
May 7 07:15:32 newspeer2 kernel: eth0: card reports no resources.
May 7 08:23:02 newspeer2 kernel: eth0: card reports no resources.
May 7 11:03:16 newspeer2 kernel: eth0: card reports no resources.

from the eepro card.
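
To make "every once in a while" concrete, the gaps between the four
"card reports no resources" messages can be worked out from their
timestamps. A small sketch, assuming all four messages are from the same
day as the log shows:

```python
from datetime import datetime

# Timestamps copied from the four log lines (all on May 7).
STAMPS = ["06:46:42", "07:15:32", "08:23:02", "11:03:16"]

times = [datetime.strptime(s, "%H:%M:%S") for s in STAMPS]

# Minutes between each message and the one before it.
gaps_min = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

for stamp, gap in zip(STAMPS[1:], gaps_min):
    print(f"{stamp}: {gap:.1f} minutes after the previous message")
```

The gaps range from roughly half an hour to a bit under three hours, so the
message is infrequent and not obviously periodic.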

My guess is that having both the Solaris box and the Linux box under load
is what causes Case 1 from the original posting to occur:

Case 1)
5 packets were dropped by Linux and had to be re-transmitted. This
contributed to about 30% of the elapsed time. Looking at the traces in
detail, the receiver isn't using multiple ACKS to indicate a dropped
packet. Instead, it stops ACKing new packets. The Sun has to timeout
(100+ milliseconds), and then send the missing packets. When I've
watched a Sun detect a missing packet, it does multiple ACKS of the
last good packet to indicate a packet is missing. The Linux box
doesn't do that. Delays were from 100 milliseconds to 400 milliseconds in
this condition. As I said, it happened 5 times (out of 14258 packets),
but contributed to 30% of the total time.
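
The figures quoted in Case 1 can be turned into a quick back-of-the-envelope
estimate. This is only a sketch: it assumes every one of the 5 drops cost
between 100 and 400 milliseconds, and that the 30% share refers to the same
transfer. The variable names are mine.

```python
# Figures quoted in Case 1.
DROPPED = 5
TOTAL_PACKETS = 14258
STALL_MIN_S = 0.100   # shortest delay per dropped packet
STALL_MAX_S = 0.400   # longest delay per dropped packet
STALL_SHARE = 0.30    # stalls were about 30% of elapsed time

drop_rate = DROPPED / TOTAL_PACKETS

# Total time spent waiting on retransmission timeouts.
stall_low = DROPPED * STALL_MIN_S
stall_high = DROPPED * STALL_MAX_S

# Assumption: the 30% share applies to this same transfer, so the
# implied elapsed time is stall time divided by the share.
elapsed_low = stall_low / STALL_SHARE
elapsed_high = stall_high / STALL_SHARE

print(f"drop rate: {drop_rate:.4%}")
print(f"total stall: {stall_low:.1f} to {stall_high:.1f} s")
print(f"implied elapsed time: {elapsed_low:.1f} to {elapsed_high:.1f} s")
```

The point is that a drop rate of a few hundredths of a percent can still
dominate the transfer time when each drop is only recovered by a timeout.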

Would you believe that the lack of multiple ACKs is a Solaris problem
that should be reported as a Solaris bug, or is it, as was suggested,
an error on the Linux side?
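
For reference, the two receiver behaviours described in Case 1 can be
compared with a toy model. This is not code from either TCP stack; the
segment numbering, the function names, and the threshold of three duplicate
ACKs (the usual fast-retransmit trigger, as in RFC 2581) are illustrative
assumptions.

```python
def receiver_acks(segments, dup_acks=True):
    """Toy receiver: return the cumulative ACKs sent for arriving segments.

    segments: segment indices in arrival order.
    dup_acks: if True, re-ACK the last in-order point for every
              out-of-order arrival; if False, stay silent instead.
    """
    acks = []
    expected = 0          # next in-order segment we are waiting for
    buffered = set()      # out-of-order segments held back
    for seg in segments:
        if seg == expected:
            expected += 1
            # Deliver any buffered segments that are now in order.
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
            acks.append(expected)
        elif seg > expected:
            buffered.add(seg)
            if dup_acks:
                acks.append(expected)   # duplicate ACK
        # seg < expected: stale duplicate, ignored in this toy model
    return acks


def sender_fast_retransmits(acks, threshold=3):
    """True if any ACK value is repeated `threshold` times in a row."""
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups >= threshold:
                return True
        else:
            last, dups = ack, 0
    return False


# Segment 2 is lost; later segments still arrive.
arrivals = [0, 1, 3, 4, 5, 6]
with_dups = receiver_acks(arrivals, dup_acks=True)
silent = receiver_acks(arrivals, dup_acks=False)
print(with_dups, sender_fast_retransmits(with_dups))
print(silent, sender_fast_retransmits(silent))
```

In this model the sender only learns of the loss early when the receiver
repeats its last ACK; a silent receiver leaves the sender nothing to act on
until its retransmission timer expires.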

shane brath
                                              



