Re: Oops assist...

Manfred Spraul (manfreds@colorfullife.com)
Thu, 06 May 1999 13:42:52 +0200


Philipp Rumpf wrote:
>
> > i.e. the Oops is dumped to the first (after the header) sector of the
> > swap file. The computer is restarted automatically. I don't
> > think that it a good idea to continue after an Oops:
> > if you kill a kernel thread, then you have memory corruptions,
> > lost spinlocks, lost semaphores: the computer will crash in a few
> > seconds anyway.
>
> *Bzzzt*. It is actually good we do not die on oopses. they are a convenient
> way to test drivers, and the most common oops cause (dereferenced NULL pointer)
> does not cause memory corruption.

This oops does not cause a memory corruption, it is often caused by a
previous
memory corruption...

I don't agree because:
a) spinlocks & semaphores are still a big problem. This problem gets
bigger
as we further deserialize the kernel.
b) if you test drivers, it would be possible to use the old behaviour:
the problem is that if you have a oops on a production machine, then
you don't know what happened, i.e. internal ext2 structures could be
corrupted, and if you don't restart immediately, then you risk that
the harddisk gets corrupted further, or wrong data could be sent over
the network. Look at a journaling filesystem: it can recover quickly,
because it knows that all incomplete writes are described in the log:
if one sector write is lost, and the computer crashes 60 seconds later,
then the fsck will not be able to detect the error by examining the log.

--
	Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/