Re: Advanced Linux Kernel/Enterprise Linux Kernel

From: Lars Marowsky-Bree (lmb@suse.de)
Date: Tue Nov 14 2000 - 02:49:49 EST


On 2000-11-13T13:56:16,
   Josue Emmanuel Amaro <Josue.Amaro@oracle.com> said:

Good morning Josue,

I hope your certification matrix hasn't driven you mad yet ;-)

> While I do not think it would be productive to enter a discussion whether
> there is a need to fork the kernel to add features that would be beneficial
> to mission/business critical applications, I am curious as to what are the
> features that people consider important to have.

This is in fact the valuable subpart of the discussion.

I work for SuSE on High Availability, especially in the "enterprise" segment.
By that I mean systems running databases (mostly Oracle, surprise) and
ERP systems, but also systems providing services (NFS, Samba, firewalls) in
such an environment.

I personally need features which allow me to keep on running, to shut down as
gracefully as possible if an error occurs, and, if an error occurred, to
diagnose it out in the field.

This means: ECC memory, hotpluggable everything, proper error handling and
reporting in the kernel. Yes, Christmas and Easter do occur on the same day in
the real world, unfortunately.

This can best be summarised as "robustness".

If an error occurred, I need to be able to fully diagnose it without having to
reproduce it - no, I do not wish to reproduce the error by crashing my
critical server on purpose, nor is "The error appears to have gone away, we
have no clue what it was" an acceptable answer. (kdb, LKCD, oopsing to the
network etc. And they must be part of the default kernel as far as possible,
so they stay in sync and get widespread testing.)

But also scalability: the 2TB limit is a problem for me in some cases, and
32bit just doesn't cut it all the time - but I need to circumvent the storage
problem even on a 32bit system. And adding disks to the system while running
is desirable.
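
As a rough illustration of where a 2TB ceiling can come from: the sketch below
assumes the limit stems from a 32-bit sector index combined with 512-byte
sectors. Neither number is stated in this mail, and the function names are
made up for the example.

```python
# Assumptions for illustration only (not stated in the mail):
SECTOR_SIZE = 512   # bytes per sector
INDEX_BITS = 32     # width of the sector index


def max_device_bytes(index_bits=INDEX_BITS, sector_size=SECTOR_SIZE):
    """Largest device size addressable with the given index width."""
    return (2 ** index_bits) * sector_size


def to_tib(num_bytes):
    """Convert bytes to TiB (2**40 bytes)."""
    return num_bytes / 2 ** 40


if __name__ == "__main__":
    print(to_tib(max_device_bytes()), "TiB with a 32-bit sector index")
    print(to_tib(max_device_bytes(index_bits=64)), "TiB with a 64-bit index")
```

Under these assumptions the ceiling is set by the width of the index, not by
the word size of the CPU, which is why getting around it on a 32bit system is
at least conceivable.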

Cluster awareness, again mostly referring to storage: Yes, there is more than
one system accessing my SCSI bus, my FCAL RAID, and the error handling should
be architected in a way that they do not start reset wars.

The LVM should safeguard against multiple nodes changing the metadata. (OK,
this can be solved in userspace too.) LVM must be transactional, so a crash on
a node doesn't corrupt the data.
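
To make "transactional" concrete, here is a minimal userspace sketch of an
all-or-nothing metadata update. It is not how LVM stores its metadata; it
assumes the metadata lives in an ordinary file on a POSIX filesystem, and
commit_metadata and the JSON layout are hypothetical. It also does nothing
about several nodes writing at once, which would need locking on top.

```python
import json
import os
import tempfile


def commit_metadata(path, metadata):
    """Replace the file at `path` with `metadata`, all or nothing.

    The new version is written to a temporary file, flushed to disk, and
    then renamed over the old one. A crash before the rename leaves the
    old file untouched; a crash after it leaves the new file complete.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".meta-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(metadata, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())      # data is on disk before the rename
        os.replace(tmp_path, path)      # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "vg0.meta")
        commit_metadata(target, {"seqno": 1, "volumes": ["data"]})
        commit_metadata(target, {"seqno": 2, "volumes": ["data", "log"]})
        with open(target) as f:
            print(json.load(f))
```

The point of the pattern is that a reader only ever sees the old version or
the new version, never a half-written one.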

Basically, the talks in Miami (The Second Annual Linux Storage Management
Workshop) gave a great overview of everything I need.

And: I need all of this as Open Source. Period. No binary kernel modules do me
any good and I will pointedly ignore them.

Oh, and by the way - if any hot kernel hacker not yet working on this full
time feels inspired to make this happen, contact me. Or any other Linux
company, as long as the job gets done. We'll be glad to make you a fulltime
kernel slave^Whacker! ;-)

> Another problem is how people define Enterprise Systems. Many base it on the
> definitions that go back to S390 systems, others in the context of the 24/7
> nature of the internet. That would also be a healthy discussion to have.
           _
24/7 * 99.99% mission/business critical services with "medium to high" load.
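
For a feel of what such a figure allows, the sketch below converts an
availability percentage into permitted downtime. It takes the number at face
value as four nines and assumes a 365-day year; the function name is made up
for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability_percent,
                             period_minutes=MINUTES_PER_YEAR):
    """Downtime budget, in minutes, for a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * period_minutes


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% -> {minutes:.1f} minutes per year")
```

At 99.99% that is a bit under an hour per year, planned maintenance
included.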

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>
    Development HA

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl



