Re: kernel structures in 2.0.29->2.0.30

Theodore Y. Ts'o (tytso@MIT.EDU)
Fri, 25 Apr 1997 14:53:30 -0400

After watching the various flames and not-so-flamacious messages fly back
and forth, I have to conclude that the real problem is that people have
differing definitions of "stable". Stable can refer to at least one of
the following things:

* Not changing. There should be very few releases of the 2.0
"stable" series.
* Bug free. The latest 2.0 kernel should be as bug-free as
possible; the assumption is that people using the 2.0
kernel want to get work done, not mess with the latest
experimental code.
* Stable interfaces. This can refer to two things:
- The user-mode API --- system calls and libc
- The interfaces for kernel modules, with either or both of:
- source-level compatibility
- binary-level compatibility

Different people on this list have had fundamentally different
expectations of what the 2.0 "stable" release should have.

Although a few people have claimed that there simply shouldn't have been
as many releases of 2.0 as there have been, that's pretty silly. There
will always be bugs in any software package, and as you find them, if
you can make a low-risk fix to remove a bug, you should do so. Users
might not bother taking the release if the bug happens not to affect
them, but there should be no risk in taking the new release.

Some people might think that Linux is unusual for having so many patches,
but the main difference is that we're up-front about them. For example,
Microsoft will quietly make changes to their Windows 95 or Windows
NT products, so that very few people will realize that the Windows 95
they buy in 1997 may have quite a few changes from the Windows 95 that
they bought in 1995. Marketing is everything. As another example, the
version of Solaris 2.4 that I'm running has 68 patches, including some
"jumbo patches" where a "jumbo patch" might obsolete a hundred or more
smaller patches. (On a Solaris machine, type "showrev -p" and see what
you get!)

As far as stable interfaces are concerned, the main problem is that the
goal that Linus and the other 2.0 kernel developers have apparently had
has been:

* Binary-level compatibility for user-mode programs
* Source-level compatibility for kernel modules
(with an attempt to preserve binary-level compatibility,
but not a strong one)

This has caused people like Derek who distribute a binary-only module
(for whatever reason --- in his case, he doesn't have a choice, since
Transarc keeps the code proprietary) to complain, because they would
prefer to have binary-level compatibility.

The real issue here is that unlike the syscall interface, where extreme
pains are taken to keep that interface stable and always backwards
compatible, we don't have that guarantee today at the kernel module
interface. There are a number of reasons why we haven't done this; one
is simply that it would require work that we haven't found people
willing to do. The other is that there's a cost to putting in a strong
abstraction layer, and that cost is performance.

There are a number of things that we could try to do so that in the
future, the kernel module interface is a bit more stable and backwards
compatible; however, people should realize that there are no easy
answers. Fundamentally, the 2.0 kernel has to be useful, and for many
ISPs, SYN attacks had become so frequent that a Linux kernel without
ways of resisting them simply wasn't useful.

I also think that Derek has significantly overstated his case in terms
of how much effort it takes to build a new libafs module. It really
isn't all that hard; and if he doesn't have the time to do it, perhaps
what's needed is some more people at MIT to help build new libafs modules
as necessary. After all, the changes to the 2.0 kernels are such that
most of the time a recompile is all that's necessary, and that's not
really that much extra work.

As far as the suggestions of having three (!) kernel streams go, the main
disadvantage of doing this is the work involved in putting out quality
releases on yet another kernel series. It also really confuses people
in terms of what kernel they should run, and it will confuse application
programmers when they are trying to figure out which system they should
test against. I'm not really convinced that it solves the problem;
rather, it appears to add another rug to sweep the dust under.
- Ted