Re: Why Linux is doomed (was: Re: FENRIS (nwfs) 1.4.2 Source Code

Jeremy Katz (katzj@linuxpower.org)
Wed, 23 Jun 1999 12:36:40 -0400 (EDT)


Why, oh why am I letting myself be pulled into this?

On 23 Jun 1999 leitner@convergence.de wrote:
[snip stuff on reiserfs]
> Wait, did I say well-publicized kernel API? The kernel API is not just
> not well-publicized, it changes all the time! Hell, how can it be that
> a supposedly stable kernel release does not even _compile_ without
> syntax error on non-x86 platforms?! This is not only embarassing, it is
> much more unprofessional than Windows NT needing BIOS support to install
> on Alpha.

As to the Alpha, I look in /usr/src/linux/MAINTAINERS and I see no one
listed for Alpha. Meaning, there is no one point person to feed changes
to Linus. This is not true for a lot of other archs (I checked Sparc and
Mips and m68k). Now, this doesn't mean that these archs are necessarily
synced. Linus runs an x86. Therefore, this is what he tests on. He
takes arch specific changes and if they are trivial enough, he will port
them to the other architectures. But, to be honest, most of the time
these changes require a better knowledge of what goes on with that
platform. One person can't know everything (even if they are Linus who is
getting close to that God status :)

> Let me reiterate some things that happened to me recently:
>
> 1. I have an Alpha here. The vendor ships it with Redhat 5.2, which
> IMHO stinks. It has an old library, an old egcs, ...
> But my vendor (Quant-X, btw., whom I can not praise enough in
> public) installed a working 2.2.5 for me.
> 2.2.7 came out, I untarred the distribution, it did not compile. I
> had to fix some trivial things that _anyone_ must have seen who
> tried to compile the kernel, like an include file that was not
> included somewhere. I cannot understand how a kernel like this can
> even be put on an FTP site without even trying to compile it on all
> platforms.
> Anyway, 2.2.7 compiled with minor massaging, but randomly hung at
> boot time with a "spinlock stuck" message. I tried to get rid of
> that and tried to install 2.2.8. It would not compile. 2.2.9. It
> would not compile. I then switched to 2.3.6-ac1 that actually
> compiled.

2.2.8 was a bad release. Granted. But, with these compile errors, how
can they be fixed unless they are given on linux-kernel. I don't think
that people can read your mind.

> 2. I have a laptop. pcmcia is a vital kernel part for laptop users.
> And frankly, I cannot understand why this is not part of the
> standard kernel. I marvel at the quick turnaround time of the
> pcmcia guy at stanford who manages to adapt to kernel problems
> within hours after the new kernel comes out. Anyway, once it
> didn't work. Again, the semantics of something changed.

I'm not going to go into this one. Check the recent (and reasonably long)
thread on the subject. But with the wait_queue changes, they weren't just
a problem for pcmcia, a TON of drivers had a problem. That's why it was
done in 2.3.x. The minor version is odd, meaning it is developmental.
Let me quote from /usr/src/linux/README
Linux version 2.3 is a DEVELOPMENT kernel, and not intended for
general public use. Different releases may have various and
sometimes severe bugs.
So, you have been warned.

> 3. ALSA. ALSA is a must for me at home. The kernel sound driver for
> my 1370 not only clicks and pops all the time, it can also suddenly
> die off completely. So, I need ALSA. The ALSA guys recently did a
> new release that also works with the new semaphore stuff, but
> before that release came out, I had to fix things manually. So I
> looked at the kernel diffs but the resulting ALSA still crashed.
> For some enlightenment about how painful their compatibility layers
> have become so that ALSA compiles on many kernel versions, please
> look into the header files of the alsa-driver package. I can
> almost feel their pain when I see those macros.

The plan is for ALSA to become the sound system over the course of 2.3.x.
According to the alsa mailing list, Jaraslov is starting to work on that
now for the merge with Alan who will then upstream it to Linus.

> 4. FAT. 2.3.7 does not compile with FAT file system. Excuse me?
> This is the single most frequently used file system besides ext2,
> and 2.3.7 does not compile with it? Because of a syntax error?!

Would you rather the syntax error be fixed and then have fat cause
filesystem corruption? The page cache code changed majorly in 2.3.7 (and
with warning from Linus in the announcement and after 10 pre-patches).
Even in 2.3.8, I am still getting some file corruption when messing with
large files. Once this gets worked out, the other file systems will be
fixed. Read my quote above from /usr/src/linux/README

> 5. Modules. I get unresolved symbols every other kernel. Does nobody
> try to compile everything in once and everything as module once
> before releasing a kernel? Yes, that is work. But how can you
> tell Ziff Davis that Linux is a professional enterprise-ready
> operating system if blunders like these happen?

No, Linus does not. The main testing is done by the people who run it.
Which is why pre-patches exist (see the thread Linux versioning scheme).
I understand that this may be iffy running on a production system, so have
a devel machine that closely mirrors a production system. Come up with a
way to test against it which simulates the actual load experienced by your
servers to see if you have any problems.

> Management summary: stuff like this sucks. I am but a programmer with a
> SMP box that likes to run the latest kernel. And yes, I expect all the
> kernels to compile out of the box. I don't think that this is too much
> to ask.

Then don't change .config. Linus tests the compile with his .config and
then is nice enough to even include it for you :) Seriously though, 2.2.X
is pretty stable on x86 (I didn't experience any of this weird stuff that
some people have and which is not being easily reproducible in 2.2.9/10).
But, this is what open development is all about. Linux developers don't
have huge test labs of machines that they test everything on before a
release (like some os developers) and it runs on a HUGE range of hardware,
not a small set like some other operating systems.

> Thus, I propose these changes:
>
> 1. No More Function Renaming.
> People are actually using your test kernels.
> Yes, you warned them, but what does that help the user?
> I, for example, have to use a 2.3 kernel, because the 2.2 kernel
> just freezes when I NFS-mount my laptop.

Why does it freeze? Do you have an oops? Have you reported this to l-k.
A bug fix is not a reason to use 2.3.x; it should be backported (and
figured out what actually fixed it if this is not known). The function
renaming in some cases is a "good thing" because it prevents greater
breakage by something compiling and therefore being in the "should work
fine" category.

> 2. No more incompatible changes.
> The whole idea with the VFS was that you could develop a file
> system modularly. If I wrote a file system (and I thought about
> this for a while), I would be royally pissed if I had to spend my
> development time working around changes in the kernel.
> Of course, there is the necessity to change things from time to
> time. But if you rename(!) a function in the kernel, is it too
> much to ask to do a global search-and-replace in the rest of the
> kernel?

See my comments above

> 3. Create some decent abstractions.
> You can change all the semaphore stuff you want when you give the
> programmers some macros to create a consistent, non-changing
> interface.

This has some merit, but it also leads to possibility of breakage for the
exact same reasons stated above.

> 4. When you do a change in the kernel, DOCUMENT IT.
> Let's create a new directory, Documentation/incompatible-changes.
> There you just create a new file for each change that says "if you
> get an undeclared symbol foo_bar then edit this file and rename it
> in your code to bar_foo". Then a driver developer can just grep
> there and be done. You don't actually expect a driver developer to
> read all the kernel patches and see if the same problem hit someone
> else before, do you?

Okay, I'll agree that some sort of ChangeLog like that wouldn't be a
horrible thing to have. But, the question gets into what changes need to
be documented there? And that's a whole other barrel of questions.

Jeremy

-- 
Jeremy Katz
http://linuxpower.org

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/