Re: Remote fork() and Parallel Programming

Alan Cox (alan@lxorguk.ukuu.org.uk)
Sat, 13 Jun 1998 03:29:18 +0100 (BST)


> If we want to implement checkpoint/restore for the common case, then
> in my opinion the only useable way of doing this is to make applications
> that know how to create/restore snapshots of themselves. Which again
> produces interesting security issues - which luckily aren't of interest
> for most apps but still want to be solved.

For a lot of applications this issue is actually fairly trivial on the
application side. The secret is to push the problem into a transaction
server within or outside of a database. Then the application loop just
becomes

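    while ((x = next_job()) != 0) {
        if (transaction_completed(x))
            continue;       /* finished before the restart: skip it */
        if (transaction_began(x))
            rollback(x);    /* left half-done: undo the partial work */
        do_job(x);          /* run x as a fresh transaction */
    }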

What this really amounts to is pushing all the hard stuff into the
transaction server - cos you gotta replicate that too[1]. It makes
a very nice, if sometimes a bit heavyweight, approach to help end users
of such systems, and best of all it's all userspace and libraries.
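
As a rough sketch of how that loop behaves across a restart, here is a
toy version in C. Everything in it beyond the loop itself is an assumption
made for illustration: the in-memory txn_log array stands in for the
transaction server's persistent, replicated log; jobs are plain integers
1..NJOBS; do_job() is the "do the work" step of the loop (do itself is a
reserved word in C); and the crash_on parameter only exists to simulate a
process dying halfway through a transaction.

```c
#include <assert.h>

/* Hypothetical stand-in for the transaction server's log. A real one
 * would be persistent and replicated; this array merely outlives the
 * simulated "crash" because the process never actually exits. */
enum txn_state { TXN_NONE, TXN_BEGAN, TXN_COMPLETED };

#define NJOBS 3
enum txn_state txn_log[NJOBS + 1];  /* indexed by job id 1..NJOBS */
int work_done[NJOBS + 1];           /* committed results per job */
int partial[NJOBS + 1];             /* uncommitted, half-finished work */
int next_index;                     /* queue cursor, reset on restart */

/* Returns the next job id, or 0 when the queue is exhausted. */
int next_job(void) { return next_index < NJOBS ? ++next_index : 0; }

int transaction_completed(int x) { return txn_log[x] == TXN_COMPLETED; }
int transaction_began(int x)     { return txn_log[x] == TXN_BEGAN; }

/* Discard whatever an interrupted run left behind for job x. */
void rollback(int x)
{
    assert(txn_log[x] == TXN_BEGAN);
    partial[x] = 0;
    txn_log[x] = TXN_NONE;
}

/* Begin, perform and commit job x. Returns -1 if we "crash" after the
 * transaction has begun but before it commits. */
int do_job(int x, int crash_on)
{
    txn_log[x] = TXN_BEGAN;
    partial[x] = 1;
    if (x == crash_on)
        return -1;                  /* transaction left open */
    work_done[x] += partial[x];     /* commit */
    partial[x] = 0;
    txn_log[x] = TXN_COMPLETED;
    return 0;
}

/* One process lifetime. Returns how many jobs this run completed. */
int run_jobs(int crash_on)
{
    int x, completed = 0;

    next_index = 0;     /* a restarted process sees the whole queue */
    while ((x = next_job()) != 0) {
        if (transaction_completed(x))
            continue;
        if (transaction_began(x))
            rollback(x);
        if (do_job(x, crash_on) != 0)
            return completed;       /* simulated crash */
        completed++;
    }
    return completed;
}
```

Calling run_jobs(2) and then run_jobs(0) plays out a crash and a restart:
the first run completes job 1 and dies inside job 2, and the second run
skips job 1, rolls back and redoes job 2, then does job 3. Each job ends
up committed exactly once, which is the whole point of letting the
transaction log, rather than the application, remember what happened.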

Of course someone has to write the replicated transaction server, but there
is only one of them, and if it's right the apps should be ok.

Alan
[1] If you are serious about the fault tolerant side you can rule out shared
disk solutions, or shared anything. There are no real shortcuts on this part
of the game.
