Re: Oops assist...

David Weinehall (tao@acc.umu.se)
Thu, 6 May 1999 08:30:20 +0200 (MET DST)


On Thu, 6 May 1999, Manfred Spraul wrote:

> Philipp Rumpf wrote:
> >
> > > i.e. the Oops is dumped to the first (after the header) sector of the
> > > swap file. The computer is restarted automatically. I don't
> > > think that it is a good idea to continue after an Oops:
> > > if you kill a kernel thread, then you have memory corruptions,
> > > lost spinlocks, lost semaphores: the computer will crash in a few
> > > seconds anyway.
> >
> > *Bzzzt*. It is actually good we do not die on oopses. They are a convenient
> > way to test drivers, and the most common oops cause (dereferenced NULL pointer)
> > does not cause memory corruption.
>
> This oops does not cause a memory corruption, but it is often caused by a
> previous memory corruption...
>
> I don't agree because:
> a) spinlocks & semaphores are still a big problem. This problem gets
> bigger as we further deserialize the kernel.
> b) if you test drivers, it would be possible to use the old behaviour:
> the problem is that if you have an oops on a production machine, then
> you don't know what happened, i.e. internal ext2 structures could be
> corrupted, and if you don't restart immediately, then you risk that
> the harddisk gets corrupted further, or wrong data could be sent over
> the network. Look at a journaling filesystem: it can recover quickly,
> because it knows that all incomplete writes are described in the log:
> if one sector write is lost, and the computer crashes 60 seconds later,
> then the fsck will not be able to detect the error by examining the log.

Wouldn't it be quite reasonable to make "Reboot on Oops" a config option
or a sysctl?
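
To make the idea concrete, here is a rough userspace sketch of such a switch.
It is not kernel code: CONFIG_REBOOT_ON_OOPS, struct oops_policy, and the
function names are all hypothetical, invented only for illustration. The
build-time option supplies the default, and a runtime knob (what a sysctl
would write to) can override it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time default; not an existing kernel option. */
#ifndef CONFIG_REBOOT_ON_OOPS
#define CONFIG_REBOOT_ON_OOPS 0
#endif

/* What the oops handler should do once the oops has been logged. */
enum oops_action { OOPS_CONTINUE, OOPS_REBOOT };

/* Hypothetical runtime policy; a sysctl handler would flip this flag. */
struct oops_policy {
    bool reboot_on_oops;
};

/* Start from the build-time default. */
static struct oops_policy oops_policy_default(void)
{
    struct oops_policy policy = { .reboot_on_oops = CONFIG_REBOOT_ON_OOPS != 0 };
    return policy;
}

/* Decide between the old behaviour (continue) and an immediate restart. */
static enum oops_action oops_decide(const struct oops_policy *policy)
{
    assert(policy != NULL);
    return policy->reboot_on_oops ? OOPS_REBOOT : OOPS_CONTINUE;
}
```

With the default left at 0, driver testers keep the current behaviour, while
a production machine could set the flag and restart as soon as the oops has
been recorded.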

/David Weinehall
_ _
// David Weinehall <tao@acc.umu.se> /> Northern lights wander \\
// Project MCA Linux hacker // Dance across the winter sky //
\> http://www.acc.umu.se/~tao/ </ Full colour fire </
