RE: Max tcp connections

Richard Gooch (rgooch@ras.ucalgary.ca)
Tue, 16 Nov 1999 19:39:07 -0700


David Schwartz writes:
> > Hi,
> >
> > On Fri, 12 Nov 1999 12:28:26 -0800, "David Schwartz"
> > <davids@webmaster.com> said:
> >
> > > But the thing is, if events fall back to poll under high load,
> > > how can they be more scalable than poll? That strikes me as an absurd
> > > argument.
> >
> > Think about it --- the amount of work a server can do is bounded by the
> > cost of doing each unit of work. That cost is O(1) for the event model
> > and O(n) for poll.
>
> It is O(1) for poll.

Gads, you keep saying this but it's completely false! poll(2) is O(n),
because it *must* scan the entire fd list at least once per
call. Thus, the more fds you have (irrespective of whether they are
busy or not), the longer it will take to run. This is what O(n) means.

Furthermore, because the current poll(2) implementation uses calls to
function pointers to determine the fd state, this scan is quite
expensive per fd. On a Pentium 100, the cost is about 4 us per fd.
http://www.atnf.csiro.au/~rgooch/benchmarks/polling.html

That 4 us is not far from the cost a single syscall! So you're already
lost most of the gains of avoiding a sigwaitinfo(2) per active fd,
even if *all* fds were *always* active.

> > The event model only ever uses poll if the event queue overflows, which
> > is expected to be a rare occurrence (it *never* happens during heavy
> > local-load benchmarking with phhttpd).
>
> Then you are using an O(n) mechanism when an O(1) mechanism is
> available.

There is no other O(1) mechanism available. Certainly poll(2) isn't
O(1).

> To repeat, consider a typical use of poll:
>
> 1) Call poll.

Kernel does O(n) work.

> 2) Do the work that poll found.
> 3) Repeat.
>
> Now, assume that we are fully loaded, so we never block. Each
> call to poll takes O(n) time, where 'n' is the number of connections
> we are handling. Each call to poll returns O(n) ready connections
> to operate on. Step 2 takes O(n) time. So the number of calls to
> poll has order O(1/n). The overall overhead of poll is O(1) --
> constant.

This argument is fallacious. The kernel is doing O(n) work in
poll(2). You are in effect ignoring the other costs in your loop.
Furthermore, you are being inconsistent, because you count other
overheads (particularly syscall overheads) against the signal-based
scheme.

> Of course, the time spent in step 2 is going to be the
> predominant factor here. So the overhead of poll will drop into the
> noise as load increases. In fact, the percentage of file descriptors
> that are ready per call to poll is not constant. It goes _up_ with
> load, since it takes us longer to get back to 'poll'. So poll is
> actually even better than this analysis suggests.

Poll(2) is not as cheap per fd as you might think. Also, as poll(2)
takes longer to complete (due to the kernel scanning more fds), the
latency of your application will increase. Not to mention more cache
pollution.

> Of course, most real world applications won't push the limits
> in either case. So events lower CPU cost at typical loads that
> should make it much better. However, it is _not_ more scalable than
> poll. In fact, when you push it to the limits, it falls back on poll
> precisely because poll is more ultimately scalable.

Of course events are more scalable than poll(2), because there is no
need to scan a huge list of fds for activity. The only reasonable
argument you can make is that doing a syscall to return one event is
an overhead that might matter to some applications. And the total cost
is O(m) (where m is the number of active fds, which is usually much
less than n, the total number of fds).

Only where m is a significant fraction of n will this be noticeable.
If you really have an application where this is true, then the correct
solution is to write a syscall which returns multiple events rather
than just one. But I strongly doubt that even your 16k connections on
a LAN application will have m anywhere near to n.

Remember that the timescales things are happening at are quite
small. What matters is the number of active connections during a short
period of time (of the order of milliseconds, not seconds as you might
think). For short enough timescales (such as the time taken to service
an active fd), there will probably be only a small fraction of fds
which are active.

Instead of making the (obviously false) assertion that poll(2) is
better than events, do some benchmarking instead if you're convinced
we've all got rocks in our heads.

Regards,

Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/