Re: Now, why didn't we think of this before?

Andrew Vanderstock (ajv@greebo.svhm.org.au)
Tue, 25 Mar 1997 16:11:50 +1000


> From: David S. Miller <davem@jenolan.rutgers.edu>
> Let's implement system calls via RPC while we are at it...
>
> A few things:
>
> 1) threaded kernel is being worked on, we should be scaling nicely
> on SMP as a result some time in the near future

Good! :-) How can I help? I have my two cpu HP XU 6/200 coming to me this
week. I've done pre-emptive thread development before, and know how to
program thread based stuff.

> 2) re-entrant kernel, in the cases where it even makes sense to do
> so we will get it for free via #1, if you want a pre-emptive kernel
> that is another story all together

Understood.

> Memory may be cheap, however on chip cache is not. This is one of the
> biggest arguments against persistant resizable storage, and why real
> systems continue to be monolithic and written in C/Assembler.

Bzzzt. You've missed my point. Would you prefer a machine that is unstable
due to a programmer limiting the algorithm under moderate to high load
situations, or a machine that is rock solid, never complains about low
level stuff, and occasionally thinks for a few microseconds (or less) when
it resizes some collection of objects that it has exhausted? A major ISP,
relying on Linux for a 500,000 hit a day web server would take resizable
any day of the week. They don't want to re-compile the kernel to do this,
they just want to use it. Done right, a expandable resizable collection
storage algorightm will be faster than a badly implemented (ie read
linearly accessed lists) non-expandable algorithms.

Using a decent algorithm, means you start low enough that in the most
cases, the current situation is satisfied without any automatic growth. If
a administrator decides to connect a workstation to a T1 and load up
Apache, Squid and 10,000 mail accounts using sendmail, automatically, more
objects are created - and the system stays up as long as VM is not
exhausted. Much better, no recompilation. Balls to wall speed is in most
cases preserved, at a small penalty when growth needs to be accomplished.
If a machine experiences no growth during a time period T, then no
automatic growth is done, and thus the time to access is only marginally
less or the same as before.

NB: The following is just an example, not to be confused with the greater
issue.

Start with enough room for 256 objects. Need more? Allocate another 256.
Need more? Allocate 512. Need more? Allocate 1024. You get the picture; the
main thing is to do background garbage collection and keep some room to
grow. Even this may not be the best strategy - that would require profiling
on low, medium and high load machines to see what works best. Speed maybe
gained from the GC routines - ie if no (or few) nodes were freed this time,
don't run for another T period x 2 (ie backoff). Speed degradation - the
occasional time when growth is needed (would have caused a crash in a
non-expandable machine, so a definite positive to this system, and I'm sure
the users and/or administrator wouldn't mind that we are going to take a
few microseconds of his/her time), and the occasional time when gc takes
significant time. Intelligent GC design and scheduling can reduce this to
very low levels, particularly if the data structures are designed very
well. End example.

I will admit right now I don't know what stays in L1 cache. I personally
doubt that a kernel structure would stay there for very long, even on roomy
32KB separate I/D caches. Anyway, a decent algorithm, with decent locality
will be fast enough, for example keeping all the node ptrs near each other
and in order will speed structure traversal. Then you don't waste D cache
with data that will never be accessed, leaving room for something that
will. Win some, lose some.

> People who want a Linux kernel in Java VM, go ahead be my guest and
> implement such a thing. However I will never encourage anyone to use
> it who happens to care about performance. [snip]

I wasn't advocating a Java version of Linux; too much is missing from Java
for a successful OS to be written in Java right now.

--
"Hate is not a family value" - bumper sticker
Andrew Vanderstock (ajv@greebo.svhm.org.au)