Re: Some very thought-provoking ideas about OS architecture.

shapj@us.ibm.com
Mon, 21 Jun 1999 12:50:50 -0400


Remi writes:

>I missed a few mails, so don't shoot me if this has already been said
>(or is completely stupid), but I can imagine the following situation:
> - An error happens (i.e. a mm bug in EROS)
> - EROS doesn't discover it and writes VM to harddisk.
> - A few minutes later the system crashes do to the mem-corruption.
> - After booting, the mem-corruption has already happened, so it will
> crash again.

Remi is certainly correct that bad state can be checkpointed. The scenario he
suggests can't happen, however.

Only a very limited amount of kernel state (the running thread list) is included
in the checkpoint, thus, a *kernel* memory error is not reinstated after a
restart.

Second, prior to a checkpoint a consistency pass is made across the system to
verify that all of the likely pointers make sense and that all modified objects
have actually been marked writable.

Third, the kernel implements a very limited number of data structures. This
facilitates the consistency check.

Finally, the kernel is not impacted by errors in user memory state. At worst
(and with low likelihood), the error will cause corruption of other user state.
In practice, we have not observed this to occur.

In practice, outside of early development kernels or periods when we have been
messing with the checkpoint logic, we have never observed an unrecoverable state
to go to the disk.

Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595

Remi Turk <coder@a2zis.com> on 06/21/99 10:33:17 AM

To: lkml <linux-kernel@vger.rutgers.edu>
cc: (bcc: Jonathan S Shapiro/Watson/IBM)
Subject: Re: Some very thought-provoking ideas about OS architecture.

Hello,
I missed a few mails, so don't shoot me if this has already been said
(or is completely stupid), but I can imagine the following situation:
- An error happens (i.e. a mm bug in EROS)
- EROS doesn't discover it and writes VM to harddisk.
- A few minutes later the system crashes do to the mem-corruption.
- After booting, the mem-corruption has already happened, so it will
crash again.

--
Advantages of Windows NT over Linux:
 * It's easier to explain the crash was not your fault.
 * You've to remember only one solution all to your problems: Reboot.
 * You can use your mouse to type an email-message.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/