Re: Feature idea: restarting processes

Roger Espel Llima (espel@llaic.u-clermont1.fr)
Mon, 16 Feb 1998 15:14:27 +0100


Bradley Ward Allen wrote:
> On occasion, I'll want to save the state of a process for later use.
> This can happen if I'm running something important and I need to do
> something else with the system temporarily. Also, I may want to move the
> process to a different system.
>
> Other uses could be as a recovery method after some sort of system crash
> or problem or solution which requires a reboot.
>
> The main problem that comes to mind is all the absolute resources the
> kernel manages that may not be available when the process comes back.
> Two options come to mind off-hand: (1) just wait until that resource
> becomes available again, getting in line for that particular resource
> (say, a file descriptor # or what have you);

actually, a file descriptor # is not a problem in itself: by restoring
you're creating a new process, so you can assign its fds however you want.
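
To illustrate that point, here is a minimal userspace sketch in Python (for
brevity; the helper name restore_fd and its arguments are hypothetical, not
an existing kernel or library interface). It reopens a file and uses dup2 to
put it on the descriptor number the saved process expected:

```python
import os

def restore_fd(path, wanted_fd, offset, flags=os.O_RDONLY):
    """Reopen `path` and place it on descriptor number `wanted_fd`.

    Hypothetical helper: a real restore would run this in the freshly
    created process, before handing control back to the saved program.
    """
    fd = os.open(path, flags)
    if fd != wanted_fd:
        # dup2 silently closes whatever wanted_fd referred to before.
        os.dup2(fd, wanted_fd)
        os.close(fd)
    # Put the file pointer back where the saved process left it.
    os.lseek(wanted_fd, offset, os.SEEK_SET)
    return wanted_fd
```

This only works cleanly because the restored process is new, so nothing else
is relying on the descriptor number being overwritten.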

> (2) mapping the old resources
> to new resources. (2) seems a bit slow and prone to all sorts of problems.
> Perhaps a total system redesign would help, but that doesn't seem practical.
> However, (2) may be ok if it would also employ (1), connecting the mapped
> resource to the unmapped resource when it becomes available; things would
> slowly speed up as resource assignments shuffle.
>
> Other features would be to save a process and all children; to integrate
> this with the core dump code; and of course, you should have a way to restore
> this stuff.

This is more difficult than it appears; consider these cases:

. saved process had two ttys open; which ones do we open to restore it?
(can't really make it "the same ones", or you lose the ability to
restart a process over a telnet when you stopped it at the console).
In this case, you'd probably have to let userspace specify, making for
a complicated kernel interface.

. open or mmapped files: you can save the path, inode number and
pointer position, but what do you do on restore if the file exists but
has another inode number? or if it doesn't exist? (okay, fail)

. network connections: you can't save the state of a TCP connection,
because that would require the other side to do the same, and when you
restore it, it might not be on the same IP. when saving the process,
you really have to close any open connections, so when you try to
restore it, you have a problem. (so fail if there are any network
connections... there go all X11 programs).

. shared memory: you can't save a process that has shared memory segments,
unless you save all the processes that share them, together (whether
it's through SysV IPC or mmap).

. threads and processes: if the process has forked, or cloned, sometimes
you'll need to save them all, sometimes not. any process that manages
other processes will rely on their pids staying the same, so you can't
restore unless the pids are available.
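
The open-file case above can be sketched in userspace as follows. This is
only an illustration under stated assumptions: the SavedFile record and both
function names are hypothetical, the path has to be supplied by the caller
because a descriptor does not carry one, and the device number is recorded
as an addition of mine since inode numbers are only unique within one
filesystem. On restore it takes the "okay, fail" route:

```python
import os
from collections import namedtuple

# Hypothetical record of one open file at save time.
SavedFile = namedtuple("SavedFile", "path dev ino offset")

def save_file_state(fd, path):
    """Record the path, identity and pointer position of an open file."""
    st = os.fstat(fd)
    offset = os.lseek(fd, 0, os.SEEK_CUR)   # current pointer position
    return SavedFile(path, st.st_dev, st.st_ino, offset)

def restore_file(saved, flags=os.O_RDONLY):
    """Reopen the file, failing if it is gone or is a different inode."""
    fd = os.open(saved.path, flags)          # FileNotFoundError if missing
    st = os.fstat(fd)
    if (st.st_dev, st.st_ino) != (saved.dev, saved.ino):
        os.close(fd)
        raise RuntimeError("%s is no longer the same file" % saved.path)
    os.lseek(fd, saved.offset, os.SEEK_SET)
    return fd
```

Note that a matching inode number does not prove the contents are unchanged;
it only catches the "same path, different file" case described above.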

there are probably even more issues than these...

still, it would be nice to be able to save and restore simple number
crunchers that don't use anything fancy, just stdin/out and files. so
why not have a go at it :)
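
As a rough sketch of what "nothing fancy" could mean in practice, one might
check the descriptor types before agreeing to save a process. The function
name and the policy below are assumptions of mine, not a proposed interface:
regular files and character devices (such as ttys) are accepted, and
everything else (sockets, pipes, and so on) is reported as unsupported.

```python
import os
import stat

def unsupported_fds(fds):
    """Return the descriptors a simple save/restore could not handle.

    Hypothetical policy: only regular files and character devices
    (e.g. a tty on stdin/stdout) are considered safe to save.
    """
    bad = []
    for fd in fds:
        mode = os.fstat(fd).st_mode
        if not (stat.S_ISREG(mode) or stat.S_ISCHR(mode)):
            bad.append(fd)
    return bad
```

A saver built this way would simply refuse when the returned list is not
empty, which rules out the network and X11 cases listed earlier.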

-- 
Roger Espel Llima
espel@llaic.u-clermont1.fr, espel@unix.bigots.org
http://www.eleves.ens.fr:8080/home/espel/index.html
