Re: Tx TCP rates down > 20% - A report.

Linus Torvalds (torvalds@cs.helsinki.fi)
Thu, 2 May 1996 07:06:42 +0300 (EET DST)


On Wed, 1 May 1996, Alan Cox wrote:
>
> > Tx speeds are way down in 1.3.97 as compared to v1.2.13 -- in fact there
> > appears to be a "glass-ceiling" effect, where v1.3.97 can't Tx any
> > better than sending 820 -> 830kB/s even with decent hardware. However
>
> It's probably generating some stupid retransmits, like the current
> hacked-around code does on PPP links. The other thing that may do it
> is windows filling because of ack problems.

I don't think it's unnecessary packets - I've been tcpdumping Linux on
ethernet, and it looks generally clean. (There seems to be some silly
problem with zero-window probing, but that only shows up if we end up
doing a probe at all, and if that happens it means that the receiver is
so slow that performance doesn't really enter the picture anyway ;-)

> > Rx with v1.3.97 is good, as I can consistently jam >1100kB/s into a
> > wd8013 and about 1040kB/s into a soft-config Winbond based ne2000 when
> > Tx'ing from a 1.2.13 kernel.
>
> That's nice to know.

I get 900+kB/s with a 3c509, and that's over a bridge to a Sun machine
(only early in the morning - it goes down to 500-600kB/s during normal
working hours when the network fills up).

There _do_ seem to be some bad effects with the drivers under some
circumstances, though. Notably, the "tbusy" handling in the ethernet driver
interface looks pretty broken. It's used for two things: (a)
serializing the ethernet driver (which was the original reason for it, but is
unnecessary these days now that the network layer makes sure it's all
serialized anyway) and (b) as a send throttle to tell the network layer that
the card is busy.

The (b) case is the only thing it does any more, and I suspect it is also
the thing that makes you see bad performance. The TCP side is much faster
in the later 1.3.x kernels, and the network cards can no longer keep up,
so the throttle is essentially in effect _all_ the time. What you see is
probably due to:

- TCP layer has a few packets queued up, sends one to the network driver
- network driver puts out the packet, sets tbusy
- TCP layer sees tbusy, and doesn't send any more
- network driver gets a "tx complete" interrupt and does a callback to the
net layer with mark_bh(NET_BH), and the cycle starts up again.

Essentially, the tbusy thing may result in a _single_ packet being sent
per round trip through the interrupt and the bottom half: we go away and
only come back the next time around. Broken, broken, broken. I haven't
touched it because I don't know the network drivers well enough.

In short, the problem is _not_ in the network layer.

The reason 1.2.13 does better is probably two-fold:
- the TCP layer wasn't very fast, so it was entirely possible that the
driver got the packet sent out quickly enough that there wasn't much
of a throttling effect.
- the "net_bh()" handler used to do multiple calls to "dev_transmit()".
You can still see that in net/core/dev.c - look at the code that is
#ifdef'ed on "XMIT_EVERY" and "XMIT_AFTER".

You can probably get better performance by enabling XMIT_EVERY and
XMIT_AFTER, but that is not the fix for the problem - they are probably
there exactly because somebody noticed that they improved performance and
that was the "easy fix" instead of the _real_ fix which is to make the
network drivers a bit more streamlined and maybe have a one-packet
send-queue inside the driver or whatever.

> > It is also worth noting that similar Tx rates were obtained by ftp'ing
> > a large file (cached) to /dev/null which indicates that the TTCP
> measurements are not subject to some huge systematic error.
>
> Does changing the large windows option affect it? Can a third box log the
> link and see what kind of mess you see in duplicates? How does 1.3.59 compare?
>
> We need to pin this down - I think it's got to be extra frames or window/ack
> problems. The base performance goes well over 40Mbit/second on 100baseT.

As I said, I'm more-or-less certain that the problem is _not_ the packets
on the wire, but the drivers. They need to be updated a bit - the 3c509
driver gets reasonable performance because it has an internal packet queue
on the card, so the tbusy thing works correctly (instead of throttling
every packet it throttles maybe every ten packets, which is roughly what
we'd want).

Linus