Re: oops on SUSE LES9-SP2-smp on dual EM64T processor system

From: Jesper Juhl
Date: Fri Oct 21 2005 - 19:07:51 EST


On 10/22/05, Emmett Lazich <elazich@xxxxxxxxxxxxxxxx> wrote:
> Thanks for your thoughts Jesper. Things are now in progress with
[snip]
>
> I also thought the age of the kernel version in SUSE Enterprise 9 was a bit
> old. I assume that SuSE maintain their own branch of the 2.6 kernel for E9
> and they will port their fixes into the mainstream kernel at some time
> before they pick a newer kernel as the base source for their next major
> release. A lot of work, but I guess they need control in order to offer paid
> support.
>
I don't run SuSE nor use their kernels, so I can't really say anything
about that, but I assume you are right that they maintain their own
tree and backport fixes.


> Can you explain for me: With these kernel oops messages, we can see
> register contents and a stack call trace, but it does not (seem) to state
> what actually went wrong.

My Oops reading skills are not the best you can find. Often you can
tell from the oops if it was a null pointer deref or something else
equally obvious. From your oops message I'm only able to see the call
chain and which function eventually caused the oops - you'll need
someone more skilled than me to actually decipher what exactly went
wrong - sorry.
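
As a rough illustration of the "equally obvious" cases, here is a
minimal Python sketch that scans oops text for a few well-known
message fragments. The pattern list and the name classify_oops are
assumptions made for this example, not a complete catalogue, and a
match only narrows things down; it does not replace reading the trace.

```python
import re

# Message fragments commonly printed by x86 kernels of this era.
# Illustrative assumption only, not an exhaustive list.
PATTERNS = [
    (re.compile(r"NULL pointer dereference"), "null pointer dereference"),
    (re.compile(r"Unable to handle kernel paging request"), "bad kernel address"),
    (re.compile(r"kernel BUG at"), "BUG() assertion"),
    (re.compile(r"general protection fault"), "general protection fault"),
]


def classify_oops(text):
    """Return a rough label for the first recognisable line, or None."""
    for line in text.splitlines():
        for pattern, label in PATTERNS:
            if pattern.search(line):
                return label
    return None
```

If classify_oops returns None, the oops probably only shows registers
and the call trace, which is the situation described above.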


> Am I right? If yes, then how can someone resolve
> the fault - particularly after the machine had to be rebooted?
>
I'm pretty sure someone more knowledgeable than me can tell more
specifically what went wrong based on that oops.


> As you might guess, I am a little bit nervous about ever getting a fix for
> this one. Believe me I considered running the newest generic kernel on this
> machine. I was going to use /proc/config.gz then compile from source. But

That's definitely what I'd do.
SuSE/Novell support aside, reproducing the oops with a recent kernel
(taking /proc/config.gz as a base for "make oldconfig") will surely
result in much better response from LKML since very few people care
about problems with as old a kernel as 2.6.5 - but if the bug is
reproducible with the latest stable kernel or the most recent -git
snapshot (or similar) then it's sure to raise eyebrows and get some
attention - and in the end such a bug report (and eventual fix) helps
all of us.
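
The "take /proc/config.gz as a base" step can be sketched in Python as
below. The function name seed_config and its default destination are
assumptions for this example. /proc/config.gz only exists when the
running kernel was built with CONFIG_IKCONFIG_PROC.

```python
import gzip
import shutil


def seed_config(src="/proc/config.gz", dest=".config"):
    """Decompress the running kernel's config into dest and return dest."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Run it from the top of the new kernel source tree, then run
"make oldconfig" there so you are only asked about options that did
not exist in the old configuration.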


> knowing how SuSE 9E holds together I figured I might be asking for trouble,
> so I shall try it on the model (dev/test) machine first. Even if it works,
> we will then have no support from Novell.

But at least you'll have a working kernel.

Be aware that you may need to upgrade other stuff besides the kernel.
Things change and stuff regularly needs updating to keep up. Take a
look at Documentation/Changes in a recent kernel source (the one you
build) to see the minimum requirements for the most basic tools. A new
kernel with out-of-date tools that are known not to work won't win you
any bug reporting prizes.
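
Checking tools against those minimums amounts to a version comparison,
roughly as in the Python sketch below. The names version_tuple and
outdated are made up for this example, and both dictionaries are
supplied by you: the minimums must be read from Documentation/Changes
in the tree you build, and the installed versions from each tool's own
version output.

```python
import re


def version_tuple(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def outdated(installed, minimum):
    """Return sorted names of tools that are missing or below the minimum.

    Both arguments map a tool name to a version string.
    """
    stale = []
    for name, needed in sorted(minimum.items()):
        have = installed.get(name)
        if have is None or version_tuple(have) < version_tuple(needed):
            stale.append(name)
    return stale
```

Comparing tuples of integers rather than raw strings avoids the usual
trap where "3.9" sorts after "3.10".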


> So I do not know what to think.
> Management as usual need to pay someone for support and to take
> responsibility. But probably get better stability with "community support".
>
I'd say
a) try the most recent stable kernel.org kernel, see what results you get
b) see what answer Novell/Suse come up with
Then decide what to do.

Don't be a slave to "only doing it the vendor's way" - if the most
recent kernel.org kernel solves the problem but Novell/Suse do not,
then personally I know what I'd use.
If Novell/Suse come up with a fix that keeps your support contract
valid, then that's probably what you want...

In the end it's your call.


--
Jesper Juhl <jesper.juhl@xxxxxxxxx>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html