Re: real kernel bloat

Malcolm Beattie (malcolm.beattie@computing-services.oxford.ac.uk)
27 Jun 1996 10:27:49 GMT


In article <199606261805.SAA00791@magneton-ra.swmed.edu>,
Alex Krimkevich <alex@magneton-ra.swmed.edu> wrote:
>Alan Cox writes:
> > > almost anything. It is true that DEC's kernel is several Megs in size,
> > > but don't forget it is capable of much more than Linux is, and,
> > > arguably, will ever be. The lacking capabilities are of no concern
> > > to most people, however the truth remains: Digital UNIX is a better
> > > multitasking, multiuser OS than Linux. I am no DEC's or Sun's fan,
> >
> > On what measurements. which facilities, what hardware size.
>
>Well, my personal experience has been, that Linux' performance
>deteriorates pretty rapidly as the load increases (be it one user
>running more jobs or more users being logged in). You, probably,
>know, that Digital Unix 4.0 is capable of supporting 4000 users
>(so they claim). It has got fail over features (clustering), it
>scales to more processors than Linux does, it's got logical
>volume manager, journaling file system, and the list goes on
>and on. And it knew how to do those things 5 years ago.

Not 5 years ago, it couldn't. Less than two years ago, SMP was
introduced in OSF/1 3.0 (we field-tested it) and we had kernel
panics on average three times a week for over 6 months. Things
have finally got a lot better with Digital UNIX 3.2C + extra
patches but we had it very bad for a long time. We're up to
vmcore.66 and that's after dropping the count by about 20 after
one re-installation. The journaling file system you mention,
AdvFS, is useless before 4.0 in many SMP environments since it's
not SMP-aware: everything is funnelled to the boot CPU. LVM/LSM
also had problems and were not as useful with UFS instead of AdvFS:
although you could dynamically expand the underlying "disk", UFS
couldn't itself be expanded.

The clustering stuff is neat: the underlying technology is a fast PCI
memory channel (~800Mbps or possibly bytes per sec: damn fast anyway)
between two machines (or with a hub if you want to scale up). The
necessary kernel support mirrors a few pages between the hosts to
keep state and runs IP across the memory channel. For clustering
disks there's a limited amount of sharing of real disks (in practice,
SCSI is pretty fragile about that sort of thing) but you mostly NFS
mount your disks across the memory channel (and loopback mount your
own disks). There's enough kernel support to get fail over: when one
host goes down the other picks up the physical disk and the upper
layers see nothing because it's NFS mounted anyway.
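
To make the fail over idea concrete, here is a toy sketch in Python. It
is only a model under my own assumptions: the Host and Cluster classes
and their methods are invented for illustration and have nothing to do
with DEC's actual implementation. The point it shows is that callers
name only a disk and a path, never a host, so they get the same answer
before and after the surviving host picks up the disk.

```python
class Host:
    """One cluster member; owns zero or more physical disks."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.disks = set()


class Cluster:
    """Two hosts joined by a fast interconnect (hypothetical model)."""
    def __init__(self, first, second):
        self.hosts = [first, second]

    def owner(self, disk):
        # The live host that currently has the physical disk attached.
        for host in self.hosts:
            if host.up and disk in host.disks:
                return host
        return None

    def fail(self, host):
        # Mark the host down and let the survivor pick up its disks.
        host.up = False
        survivors = [h for h in self.hosts if h.up]
        if survivors:
            survivors[0].disks |= host.disks
        host.disks = set()

    def read(self, disk, path):
        # Upper layers name only the disk and path, never the host,
        # which is what an NFS-style mount gives them.
        if self.owner(disk) is None:
            raise OSError("no live host owns " + disk)
        return "contents of " + disk + ":" + path


a, b = Host("a"), Host("b")
a.disks.add("disk0")
cluster = Cluster(a, b)

before = cluster.read("disk0", "/home/user/file")
cluster.fail(a)                      # host "a" goes down
after = cluster.read("disk0", "/home/user/file")
print(before == after, cluster.owner("disk0").name)
```

In the real product the hard parts (detecting the failure, fencing the
SCSI bus, moving the IP alias) are exactly what this sketch leaves out.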

There's sugar coating on top of it all and all the usual IP aliasing
tricks and glue to make it a product, but it's mostly just nice
hardware.

>As of this moment Linux can't do any of these. Sure,
>it will, eventually. But this is a catching up mode. When Linux
>catches up with DEC on these or other features, DEC will have
>something else up their sleeves. I hope I am wrong, but so
>far, my impression has been, that very little innovation comes from
>the Linux camp: all we do is trying to outperform others in
>things they had been able to do years before Linux' ( or even
>Linus' :-) ) conception.

What, you mean like fast IP aliases that Linux has had for a while
and that DEC have only just put into 4.0 (and are planning to supply
a patch for 3.2x--it may be in 3.2G though)? Linux has more
flexible routing tables than Digital UNIX. Linux has a greater number
of dynamically loadable kernel modules than Digital UNIX and better
support for run-time configuration of them. Digital UNIX has
"sysconfig" hooks for run-time kernel subsystem configuration but you
still have to use sysconfigtab and reboot for most configuration
parameters. The quirks of Mach memory management and SMP support are
causing problems these days. DEC had to completely rewrite the SMP
stuff in OSF/1 and overhaul the memory subsystem which is a weird
hybrid of BSD and Mach. In doing so, they've done an extremely good
job and they've been very innovative. But Linux has also had its
memory management overhauled and it too has been equally good and
innovative. Score draw. Based on what I hear about Linux SMP, Digital
UNIX is still currently ahead there, especially for more than a few
CPUs. The shake-out for serious Linux SMP is on its way and it'll be
interesting to see how it goes.

> And they honestly don't care. Who
>cares that OSF's kernel is 8 MB if AlphaStation comes with
>64 MB minimum, and memory prices keep going down? If you
>were Digital, wouldn't you rather concentrate on the features your
>customers demand, than optimizing every line of code in the
>kernel?

They *have* been optimising the kernel for 4.0 though. They've put a
lot of work into it, with lots of memory subsystem changes: page table
granularity hints, making more of the SMP locks more fine-grained, and
improving shared memory (because it gets hit a lot by databases).

> > > but let's be honest - there is no way that a group of people, most
> > > of who hold daytime jobs, can compete with the multibillion
> > > corporations, which employ some of the best minds on this planet.
> >
> > I beg to differ. And every benchmark we have floating around says just who
> > is winning.
>
>See above. In addition to that, let me add, that user's perspective
>of what OS is, is quite different from yours, Linux major contributor.
>As a user, I don't run kernel tcp/ip benchmarks, I run ftp instead.
>And what I see is that ftping from Solaris 2.4/Sparc2 to AlphaStation
>on a different subnet delivers 660KB/s on average. The Linux box,
>which hardware wise should beat the crap out of an ancient Sparc
>delivers 250 KB/s.

What kernel version and what ethernet card? You should be seeing at
least 750-850KB/sec with an equivalent ethernet card/CPU. I run our
web server with Linux 2.0 and NFS export the filestore to the
general purpose 2100 Alphaservers from which our users get at their
web pages. The web server is a Pentium 133MHz running Linux 2.0 with
a 3com 3c590 PCI ethernet card and I get 500KB/sec with NFS. That's
not even with the latest 3c590 driver which should be even faster,
I believe. If you're getting 250KB/sec with ftp then you're either
not running Linux 2.0 or you've got a much worse ethernet card which
you shouldn't be comparing with the (presumably) reasonable one that
Sun installed in their box.
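
To put those figures on a common scale, here is a rough Python sketch.
It rests on assumptions that are mine and not stated anywhere in this
thread: that all the rates are kilobytes per second of payload, that
1 KB is 1024 bytes, and that the link is 10 Mbit/s ethernet. The
function name link_utilisation is made up for the example.

```python
def link_utilisation(kbytes_per_sec, link_mbit_per_sec=10.0):
    """Fraction of the raw link rate used by a payload rate in KB/s.

    Assumes 1 KB = 1024 bytes and 1 Mbit = 1,000,000 bits, and ignores
    protocol overhead, so real utilisation is somewhat higher.
    """
    bits_per_sec = kbytes_per_sec * 1024 * 8
    return bits_per_sec / (link_mbit_per_sec * 1_000_000)


# Rates quoted in this thread, in KB/s.
rates = [
    ("Linux ftp, as reported", 250),
    ("Linux NFS, 3c590", 500),
    ("Solaris ftp, as reported", 660),
    ("expected, low end", 750),
    ("expected, high end", 850),
]

for label, rate in rates:
    print("%-26s %4d KB/s  %5.1f%% of link" %
          (label, rate, 100 * link_utilisation(rate)))
```

If the link assumption holds, 250 KB/s is only about a fifth of what
the wire can carry, which is why it points at the driver, the card or
the kernel version and not at any basic limit.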

[...]
>to writing it as a means of self expression. And this suggests the way
>how to really improve Linux' chances on BIG success - make it EASY
>for the commercial entities to write for Linux. It means supporting
>Spec 1170, instead of POSIX.

Linux *is* being put through the relevant branding process. Pasting
from the press release
http://caldera.com:80/whatsnew/open_linux.html
it's going to be POSIX.1 (FIPS 151-2) in Q3 1996; XPG4 Base 95 (POSIX.2,
FIPS 186) by Q4 1996; and X/Open brand for UNIX 95 based on the Single
UNIX Specification (formerly known as SPEC 1170) during 1997.
The resulting branded Linux is going to be freely available.

[...]
>I read your answer to Alexey regarding STREAMS. I am sure, you
>are right that it's slower than sockets, but it does not matter.
>If someone recompiles or starts developing an application for
>Linux because of this feature, it will benefit Linux much more
>than a better benchmarks with no applications to use.

For *application* level STREAMS, most people mean TLI/XTI and it's
been said numerous times that library support for those (and even
binary compatibility support for things like iBCS) is either
available now or is going to be available. The fact that the
underlying kernel implementation is a socket API doesn't matter to
the application or the user (except it's faster ;-). Even Digital
UNIX implements some of the standard STREAMS modules with empty stubs.

>Let me sum things up: Linux has not conquered the world yet,
>but keep up a great job, guys.
>
>Alex Krimkevich.
>
>P.S. I just wanted to re-instill the sense of reality on this
>mailing list :-)

I wanted to add a few facts of my own. I really like both Digital UNIX
and Linux so, with any luck, some of this extra information may help
some people make their own informed choices.

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Oxford University Computing Services
"Widget. It's got a widget. A lovely widget. A widget it has got." --Jack Dee