Re: Max tcp connections

Jim Gettys (jg@pa.dec.com)
Wed, 17 Nov 1999 07:23:56 -0800 (PST)


> Sender: owner-linux-kernel@vger.rutgers.edu
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Date: Wed, 17 Nov 1999 12:07:53 +0000 (GMT)
> To: davids@webmaster.com (David Schwartz)
> Cc: sct@redhat.com, linux-kernel@vger.rutgers.edu
> Subject: Re: Max tcp connections
> -----
> > still means that the amount of time we spend in '1' and '2' is O(n). While
> > each call to poll takes O(n) time, it returns O(n) ready file descriptors.
> > Overall, poll is O(1) for high-load applications.
>
> Oh dear. Tell you what. You go write an http accelerator thats faster than
> phttpd then come back
>
>

Alan, you are being too flip. I'll give you an example of an existing
server you use every day heavily that exploits this behavior, deliberately (and
has for nearly 15 years).

This principle is why X works well under load (actually, will work alot
better interactively in XFree86 4.0 due to recent X dispatcher changes
Keith Packard has made): in fact, in X, you get WAY under one system call
per operation (or even client with requests to process), do to a combination
of batching requests, and this behavior of select.

It means that under load, the X server gets more and more efficient, a
property that is highly desirable for network servers in general.
You have cycles to burn when not highly loaded, but they become increasingly
precious. X is at its most efficient exactly when you need it most:
when one or more clients are asking for everything it can possibly
give you.

A quick description of the X dispatcher is something like the following:

while (1)
select (on all the descriptors) {
If descriptor has come or gone - startup or get rid of connection
foreach (descriptor with work to do) {
non-blocking read of as many requests into buffer as you can
if (full request available) {
call out to the X routine that does the work
update pointer to next request
}
}
}

It is more complicated than this, but this catches the essense. You might
look at it sometime: the code is only a couple hundred lines of code total
for the dispatcher, as I remember, including all the corner cases; the
inner loop itself is a small number of lines of code. That part of X
is really simple (has to be, or it wouldn't be able to do .75 million
no-op requests/second on the lowly 333mhz celeron I happen to be composing
this on.

Note: the more loaded you are, the less often you call select, and the
more descriptors you find work you can productively do per system call.

Keith's recent changes do two things which will improve interactive feel:
a) since the cursor logic is not part of the device driver in XFree86,
(where it is on most good workstation X implementations), he's switched
over to a signal driven scheme for moving the cursor,
b) he's added a maximum number of requests to be processed each time through
the inner loop, and an extra loop around the inner loop to pick up whether
any requests are left over from a first pass through the inner loop.
This will make it less of a problem for a single graphics bound client
to interfere with other clients (e.g. your window manager).

And since we fondly fondle the device memory/registers of the graphics
hardware directly, we get the average well under one system call/request.

A koan of mine I used to use in early X talks is "X is an exercise in
avoiding system calls".

And, btw, HTTP/1.1 with suitable clients does batching similar to what
X does, and may show more similar behavior than it has in the past (the
clients still aren't all that good at exploiting what is possible in
the HTTP/1.1 protocol).

So select telling you all of the descriptors on which work can be done
turns out to be a major win in my first hand experience, for some
applications.
- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/