Re: Remote fork() and Parallel Programming

Perry Harrington (pedward@sun4.apsoft.com)
Thu, 11 Jun 1998 22:02:06 -0700 (PDT)


>
> > Principle 2: Research. How do you expect not to completely fsck something
> > up if you live in a box? Don't just make up stuff, research what
> > people have done prior and determine what the heck is going on.
>
> Ok given #2 explain the questions below
>
> > First, checkpoint/restart, does anyone else support it? If they do, how? More
> > importantly, is it an integral architectural feature? How much work would have to go
> > into Linux to support such things as they are?
>
> See #2

I was suggesting this as an obvious question that *people* should ask.
I'm not asking it myself.

>
> > What about an abstraction layer? You could have processes run within an environment
>
> See #2

This is more of a hypothetical. I don't pretend to know how other
OSes do it. I'm simply trying to provoke some thought among the people
who care. Given what I've read about OSes and hacked on them, I would say
that an abstraction layer is necessary; where exactly that boundary lies
is the hard part. More importantly, Linux wasn't written to do this,
so do you want to put full checkpointing in the kernel, or change
some semantics and implement the rest as a userland or loadable-module
layer?

>
> > We all can agree that you cannot checkpoint "sockets" for later use, therefore their
>
> No we cant - see #2

Ok, let me kick into "Spock" mode: Can we agree that the following
scenario is impractical:

Server process sshd gets checkpointed. Client process ssh is connected
via TCP. The sshd process is restarted 29 hours later. You can only checkpoint
a TCP connection for the timeout interval; after that the connection
is deemed dead and the connection to the client is nonexistent.

UDP can be checkpointed, IP can be checkpointed, ICMP can be checkpointed,
TCP server bindings w/o open connections can be checkpointed, etc. As
Larry pointed out, stateful protocols are difficult.
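
The distinction above can be sketched in a few lines of Python. This is
only a toy model of the reasoning, not how any kernel decides anything:
the names SocketState and can_restore, and the two-hour peer timeout in
the usage note, are made up for illustration. The real timeout depends
on the peer's TCP stack and its settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketState:
    """Hypothetical summary of what a checkpoint would record for a socket."""
    protocol: str                      # e.g. "udp", "icmp", "ip", or "tcp"
    has_open_connection: bool = False  # only meaningful for TCP


def can_restore(sock: SocketState, downtime_s: float, peer_timeout_s: float) -> bool:
    """Return True if the socket could plausibly be brought back after downtime_s.

    Assumption: peer_timeout_s is how long the remote end keeps an idle
    connection before declaring it dead. It is a parameter here because
    the checkpointing host does not control it.
    """
    # Connectionless protocols carry no per-peer state that can expire.
    if sock.protocol != "tcp":
        return True
    # A TCP server binding with no open connections is just a local binding.
    if not sock.has_open_connection:
        return True
    # An established TCP connection survives only while the peer still
    # believes in it, i.e. if we come back within the timeout interval.
    return downtime_s < peer_timeout_s
```

With an assumed two-hour peer timeout, the sshd case above comes out as
expected: can_restore(SocketState("tcp", True), 29 * 3600, 2 * 3600) is
False, while the same call for SocketState("udp") or for a TCP socket
with no open connection is True.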

I was just pointing out that anyone who expects Linux to let them kill a
network process with 129 open TCP connections now, throw it on a DAT, and
"replay" the thing 6 months from now is barking up the wrong tree.
Some things just will not happen due to the inherent incompatibilities.

>
> Its all in the literature and OS manuals ;)
>

Yeah, and my point is that people who WANT to implement this should read
those books.

I'm not interested in implementing checkpointing, and I'm not going to
make authoritative statements about how to do it or about its semantics.

End of discussion, I'm shutting up now.

--Perry

-- 
Perry Harrington       Linux rules all OSes.    APSoft      ()
email: perry@apsoft.com 			Think Blue. /\
