Re: Some very thought-provoking ideas about OS architecture.

Jim Gettys (jg@pa.dec.com)
Mon, 21 Jun 1999 12:38:59 -0700


> Sender: owner-linux-kernel@vger.rutgers.edu
> From: Bernd Paysan <bernd.paysan@gmx.de>
> Date: Mon, 21 Jun 1999 11:36:54 +0000
> To: linux-kernel@vger.rutgers.edu
> Subject: Re: Some very thought-provoking ideas about OS architecture.
> -----
> Linus Torvalds wrote:
> > In short: message passing as the fundamental operation of the OS is just
> an exercise in computer science masturbation. It may feel good, but
> > you don't actually get anything DONE. Nobody has ever shown that it
> > made sense in the real world. It's basically just much simpler and
> > saner to have a function call interface, and for operations that are
> > non-local it gets transparently _promoted_ to a message. There's no
> > reason why it should be considered to be a message when it starts out.
>
> This is partly right, and partly wrong. Actually, some light-weight
> message passing protocols can be cheaper than function calls. A function
> call to a kernel function requires two state transitions (user->kernel
> and back), and two context switches; since the kernel wants to massage
> different data than the user process, it also has an effect on the
> cache.
>
> Message passing can make this cheaper, if (and only if) your messages
> are handled asynchronously. Put a bunch of messages into your shared
> memory message buffer (simple *ptr++ = id; *ptr++ = arg; ...), and when
> you are done, do the one state transition and context switch, and let
> the kernel handle all the requests at once (also simple: next message is
> goto *handlers[*ptr++]; Message code does arg = *ptr++;...). The
> communication overhead in a well-designed active message system can be
> below the state transition overhead in a classical OS.
>
> The downside: it works only if your OS is designed to deliver asynchronous
> results, and if your app is programmed to use that. In other words:
> forget about blocking read/write calls. Works much better for services
> like X Window than for Unix-like OS services. The point is that you must
> restructure your app to be message-handling, too. I.e. a web server's
> frame would look like
>

Buffering and batching (or pipelining) in systems design is a well-known
technique (to some) for improving system performance, particularly when
latencies are high. X has done this for a long time (approaching 15 years),
it is part of HTTP/1.1, and it appears elsewhere. We didn't consider it
rocket science even when we were doing the early X work over 14 years ago.
Look at VMS QIO and AST delivery for another (ugly) approach, though in my
view one that throws away almost all of the benefit: it still performs the
system call transitions, without buffering to amortize their expense.

An X request has an instruction budget of roughly 100 instructions total;
the only way this is feasible is to avoid system calls like the
plague, and to amortize such expensive operations as read/write and select
over many X requests. I used to regularly characterize X as an exercise
in avoiding system calls.

I will note, however, that interface (protocol) design has a major
impact on how well this technique can work, and it is hard to retrofit.
We worked pretty hard in X Version 11 design to avoid these problems,
but history has shown we didn't work hard enough.

An example is the X request "InternAtom", which is heavily used (much
more so than we originally thought it would be) and is the basis of a lot
of X's extensibility for client/client and client/window-manager
communication. InternAtom gives you a short "atom" name for a string
(and is used as an extensible type system for communication). This is a
synchronous call, and has turned into a bottleneck (we built in a lot of
basic atoms). With 20-20 hindsight, we should have chosen a suitably
sized hash function and just always sent a hash, which would have allowed
atoms to always be client generated.

Here's the moral: buffering/batching can work REALLY well, but is BEST done
at design time, and hard/painful/impossible to retrofit later. It can often
yield VERY large performance improvements (for HTTP/1.1, for example, where
it turned out to be possible to retrofit to some extent, it allows a factor
of 2-10 performance improvement in our measurements). Whether it would make
any sense to try to retrofit anything approximating UNIX system call
semantics onto such a base is far from clear to me...

So if you want to do this when designing a system, think about it first,
not later, and think about it hard!

- Jim

--
Jim Gettys
Compaq Computer Corporation
Visiting Scientist, World Wide Web Consortium, M.I.T.
http://www.w3.org/People/Gettys/
jg@w3.org, jg@pa.dec.com
