Re: strange TCP stack behiviour with write()es in pieces

From: David Schwartz (davids@webmaster.com)
Date: Wed Jan 02 2002 - 16:49:56 EST

Next message: Christian Koenig: "Re: How can one get System.map w/o vmlinux?"
Previous message: Dave Jones: "Re: ISA slot detection on PCI systems?"
In reply to: Michal Moskal: "strange TCP stack behiviour with write()es in pieces"
Next in thread: Michal Moskal: "Re: strange TCP stack behiviour with write()es in pieces"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 2 Jan 2002 17:28:06 +0100, Michal Moskal wrote:

>So, it occurs in programs doing packet communication over TCP, when peer
>waits for a packet to send an answer. If they send data with two write()
>calls (for example to write packet header and packet data), the performance
>dramatically decreases (down to exactly 100 (2.2.19)
>or 25 (2.4.[57]) packet exchanges per second on x86, from several thousands.
>100 seems to be related to HZ variable, see also AXP results, where HZ is 10
>times bigger).

That's why you should never, ever do anything that stupid. What should the
kernel do? When it sees the first write, it has no idea there's going to be a
second write, so it sends a packet. It gives you the benefit of the doubt and
assumes that you know how to use TCP. When it sees the second write
immediately thereafter and they're both small, it no longer trusts you and it
has no idea there isn't going to be a third write a microsecond later, so it
doesn't send a packet.
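
One way to avoid the two-write pattern without copying is to hand the header and the data to the kernel in a single call. The following is only a sketch under stated assumptions: the function name send_packet and the 4-byte big-endian length header are hypothetical and are not taken from the original program.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Hypothetical wire format (an assumption for illustration):
 * a 4-byte big-endian length followed by the payload.
 */
ssize_t send_packet(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);
    struct iovec iov[2];

    iov[0].iov_base = &hdr;
    iov[0].iov_len  = sizeof hdr;
    iov[1].iov_base = (void *)data;   /* writev() does not modify the buffer */
    iov[1].iov_len  = len;

    /*
     * One system call, so the stack sees header and payload together
     * instead of two small writes. The return value may be less than
     * sizeof hdr + len on a short write; the caller has to handle that.
     */
    return writev(fd, iov, 2);
}
```

A call such as send_packet(sock, "hello", 5) would then queue 9 bytes in one go.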

>I, personally, would expect the second version to be at most two times
>slower (as there might be need to send two IP packets instead of one).
>Also note that while it is obvious that the version copying into a buffer on
>the stack should be faster, it is less obvious when the buffer has to be
>malloc()ed before sending (for example if there is no upper limit on len).
>However, even if we need to malloc() the buffer, the second version is still
>orders of magnitude faster.

If you can design an algorithm that makes that only two times slower, then
the world will be excited and interested and perhaps that algorithm will
replace TCP. But until that time, we're stuck with what we have.

>I found it during work with client/server program that worked horribly slow
>just because of this. (of course I fixed it, but that's not the point).

THAT IS THE POINT. The problem wasn't in the kernel, it was in the program,
and you fixed it. If you do smart buffering, TCP can behave efficiently. If
you don't, it has to guess when to send packets, and it can't possibly
predict the future and behave in the way you think is optimal.
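
As a rough illustration of buffering in the application, the sketch below collects the pieces of a message in user space and passes them to write() once per message. All names here (struct outbuf, outbuf_append, outbuf_flush, OUTBUF_SIZE) are hypothetical, and the 4096-byte size is an arbitrary assumption.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define OUTBUF_SIZE 4096            /* arbitrary size chosen for the sketch */

struct outbuf {
    int    fd;                      /* destination descriptor */
    size_t used;                    /* bytes currently buffered */
    char   data[OUTBUF_SIZE];
};

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send everything buffered so far; buffered data is dropped on error. */
int outbuf_flush(struct outbuf *ob)
{
    int rc = write_all(ob->fd, ob->data, ob->used);
    ob->used = 0;
    return rc;
}

/* Copy len bytes into the buffer, flushing only when it fills up. */
int outbuf_append(struct outbuf *ob, const void *src, size_t len)
{
    const char *p = src;

    while (len > 0) {
        size_t room = OUTBUF_SIZE - ob->used;
        size_t n = len < room ? len : room;

        memcpy(ob->data + ob->used, p, n);
        ob->used += n;
        p += n;
        len -= n;
        if (ob->used == OUTBUF_SIZE && outbuf_flush(ob) < 0)
            return -1;
    }
    return 0;
}
```

The intended use is to call outbuf_append() for the header, again for the data, and then outbuf_flush() once the message is complete, so the kernel sees one write per message.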

How does it know you care about latency rather than throughput? And what
should it do if it sees a steady stream of one-byte writes, one every tenth
of a second?
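
The kernel cannot infer that preference, but an application can state it per socket. A minimal sketch, assuming a connected or unconnected TCP socket: the standard TCP_NODELAY option asks the stack to send small segments immediately rather than coalescing them. The helper name set_low_latency is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: a non-zero 'enable' turns TCP_NODELAY on, asking
 * the stack to send small segments at once; zero restores the default
 * coalescing behaviour. Returns 0 on success, -1 on error (errno set).
 */
int set_low_latency(int fd, int enable)
{
    int flag = enable ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag);
}
```

This trades throughput for latency: every small write can become its own packet, so it does not replace assembling a complete message before sending it.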


