Re: Core dumps & restarting

Riccardo Facchetti (fizban@mbox.vol.it)
Tue, 29 Oct 1996 10:42:24 +0100 (MET)


On Mon, 28 Oct 1996 lists-nicholas@binary9.net wrote:

>
> #
> # > What about the system save-state that SCO uses to recover from UPS initiated
> # > shutdown? They write out the machine state into the swap partition.
> # > When you restart your system, it picks up right where it left off. Nice.
> # >
> # > I have run an application which used over 300Mb virtual memory - wonder
> # > what the SCO box would have done with that beast <distributed simulation>.
> #
> # The big problem with freezing processes or machine state and restoring
> # it later is that the context gets partially lost like non-local network
>
> The really big problem is saving and restoring the state of hardware.
>

Yes. This is the main problem.
Software state can be saved more or less safely. You have to dump the
process image, possibly asking the memory manager to save the description
and data of every memory chunk the process owns. When reloading
(rerunning, resurrecting) the process, you just have to ask the memory
manager for each chunk the process owned, to restore its exact state. Of
course you have to dump in a position-independent way, so that the loader
of the dump can make the correct fixups to the code and data chunks. You
have to save the context too, because you have to restore all the CPU
registers as well as the memory dump. The memory dump without the context
is useless. Anyway, I think the only way to do something like this is to
have some kind of helper code in the kernel that can provide other
information, such as the inodes of all the open files and the position of
each file handle within its file. The operations of re-opening the files
and setting the correct values of all the file handles can be done by the
"resurrector", which will sys_clone() to restart the old dumped process.
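
As a rough illustration of the file-handle part only, here is a minimal
user-space sketch in C. The struct saved_fd and the functions save_fd()
and restore_fd() are hypothetical names, not an existing interface. It
also assumes the dumper knows each file's path: there is no portable way
to re-open a file by inode number from user space, which is part of why
some kernel help would be needed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record for one open file in the dump. */
struct saved_fd {
    char  path[256];  /* assumed known to the dumper (not derived from the inode) */
    int   flags;      /* open(2) flags to reuse on restore */
    off_t offset;     /* file position at dump time */
};

/* Record the current position of fd. Returns 0 on success, -1 on error. */
int save_fd(int fd, const char *path, int flags, struct saved_fd *out)
{
    off_t pos;

    assert(path != NULL && out != NULL);
    pos = lseek(fd, 0, SEEK_CUR);      /* query the position without moving it */
    if (pos == (off_t)-1)
        return -1;                     /* e.g. a pipe or socket: not seekable */
    if (strlen(path) >= sizeof out->path)
        return -1;                     /* path does not fit in the record */
    strcpy(out->path, path);
    out->flags = flags;
    out->offset = pos;
    return 0;
}

/* Re-open the file and seek back. Returns the new fd, or -1 on error. */
int restore_fd(const struct saved_fd *in)
{
    int fd;

    assert(in != NULL);
    /* Never create, truncate or exclusively create on restore. */
    fd = open(in->path, in->flags & ~(O_CREAT | O_TRUNC | O_EXCL));
    if (fd < 0)
        return -1;
    if (lseek(fd, in->offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

This only covers seekable regular files that still exist under the same
name. Pipes, sockets and deleted files would need something else.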

For the state of the hardware, I think there may be some real problems
only when the process to be reloaded has modified the state of the
hardware before the memory dump. I think this is a problem only on the
program side, because on the kernel side, if the kernel modifies the
hardware it usually does so at initialization (at kernel boot time).
If the program has modified the hardware registers of, say, the video
card, how can you force the reloaded process to go back to the
initialization code that needs to be executed before trying to use the
video hardware? Of course the process will not do it: it will assume
that it has already initialized the hardware, so you are very likely
to get some kind of crash.
An idea on how to solve the hardware problem may be to track all the
hardware registers modified by the user program (through more helper
functions, in libc perhaps?) and save the hardware state with all the
other information. The "resurrector" can then restore the hardware state
before clone()ing.
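
To make the tracking idea a bit more concrete, here is a minimal sketch
in C of a "shadow" copy of the registers. Everything in it is
hypothetical: struct device is just an array standing in for real I/O
ports or memory-mapped registers, and tracked_write(), replay_shadow()
and NUM_REGS are invented names. It remembers only the last value written
to each register, so it would not be enough for hardware that depends on
the order of writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 16   /* hypothetical size of the device's register file */

/* Stand-in for the device registers (real code would use I/O ports). */
struct device {
    uint8_t regs[NUM_REGS];
};

/* Shadow state saved with the process: last value written per register. */
struct reg_shadow {
    uint8_t value[NUM_REGS];
    uint8_t dirty[NUM_REGS];  /* 1 if the program ever wrote this register */
};

/* Every register write goes through this helper so that it is recorded. */
void tracked_write(struct device *dev, struct reg_shadow *sh,
                   size_t reg, uint8_t val)
{
    assert(reg < NUM_REGS);
    dev->regs[reg] = val;
    sh->value[reg] = val;
    sh->dirty[reg] = 1;
}

/* Replay the recorded values onto a freshly reset device.
 * Returns the number of registers restored. */
size_t replay_shadow(struct device *dev, const struct reg_shadow *sh)
{
    size_t n = 0;

    for (size_t reg = 0; reg < NUM_REGS; reg++) {
        if (sh->dirty[reg]) {
            dev->regs[reg] = sh->value[reg];
            n++;
        }
    }
    return n;
}
```

The "resurrector" would call something like replay_shadow() with the
saved shadow before restarting the process. Any write that bypasses the
helper is lost, which is the weak point of doing this in user space.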

The idea of dumping/reloading a running process is really appealing, but I
think Linus will never accept having all this crap in "his" kernel :)

Oh ... and yes, you can do most of the information collection with helper
functions in user space (say libc) instead of kernel space, but some of
it has to be kept in kernel space, so the problem is the same. A little
or a lot, it still looks like crap code in the kernel :)

Ciao,
Riccardo.