Re: Overscheduling DOES happen with high web server load.

Phillip Ezolt (ezolt@perf.zko.dec.com)
Thu, 6 May 1999 14:01:16 -0400 (EDT)


On Thu, 6 May 1999, Dean Gaudet wrote:

>
>
> On Thu, 6 May 1999, Phillip Ezolt wrote:
>
> > Dean,
> > > As shipped, apache-1.3.6 on linux uses fcntl() file locking to prevent
> > > more than one process from being inside accept(). I'm not sure if the dec
> > > folks have rebuilt the server with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT...
> > > if they have, then there's no protection around accept() in servers
> > > listening on a single port.
> >
> > Apache was built with the following flags:
> > (-DSINGLE_LIST_UNSERIALIZED_ACCEPT is not amoung them. )
> >
> > ./configure --prefix=/usr --libexecdir=/usr/lib/apache --sysconfdir=/etc/httpd/conf --datadir=/home/httpd --includedir=/usr/include/apache --logfiledir=/var/log/httpd --localstatedir=/var --runtimedir=/var/run --proxycachedir=/var/cache/httpd --enab
le-m
> > odule=all --enable-shared=max --disable-rule=WANTHSREGEX
> >
> > Would it be better to build with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT? Is
> > this something that would help or hurt the "thundering herd" problem?
>
> The mindcraft folks did build with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT
> (you stick it into the CFLAGS environment variable before invoking
> ./configure). My guess is that you'll see the same results.

Is that "same results" WORSE or BETTER at thundering herd than without it?

>
> I suspect that with accept() or fcntl() we'll need an option to enable the
> wake_one() behaviour -- otherwise it's a pain to deal with process
> priorities and such. Essentially a flag which says "all processes waiting
> on this are of equal priority, no matter what their cpu time, when their
> last time slice was, blah blah blah". (And then the kernel should wake
> the one which went to sleep most recently ;)
>
> Dean
>
>
>
>

While fixing apache to play nice with linux may be a good solution to the
SPECWeb problem. I think that this test, in general, reveals a critical flaw
in the linux scheduler.

I have seen the same kind of problem here with other programs that have a
large number of running processes. Linux slows to a crawl, and it is NOT
a result of swapping. This is really a scalability issue. If linux is
going to move beyond its current segment of the market, (low-cost servers),
these things have to be fixed.

This problem will not go away with small tweaks, such as Richard Gooch's
seperate real-time queue.

They will buy some time, but it won't get rid of the real issue. This linear
search is bad.

Right now we are recalculating goodnesses again and again and again.
Really, only a few values will change. This is extra work that is a waste
of CPU time. In an ideal world, we can find the next runnable process in
O(1), not O(runqueue len) time.

I've worked a little here with Greg Gaertner, who designed a scheduler for
Cray version of Unix, and we have come up with an alternate solution.

However, it would involves a major rethinking of how linux scheduling works.
(Priority is NOT stored as tick values, processes age as time goes on, realtime
processes don't age and have a high priority). The current version limps, but
it doesn't fully work, or deal with SMP correctly. It is basically an array of
linked lists that keep track of the current highest priority. It can tell
immediately what the next process to run is. I'll present the proposed
solution at LE, and hopefully, we'll be able to talk about it more then.

--Phil

Digital/Compaq: HPSD/Benchmark Performance Engineering
Phillip.Ezolt@compaq.com ezolt@perf.zko.dec.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/