Re: Remote fork() and Parallel Programming

Perry Harrington (pedward@sun4.apsoft.com)
Wed, 10 Jun 1998 23:05:24 -0700 (PDT)


It seems that a lot of people have completely overlooked some obvious things.

Principle 1: Simplicity leads to elegance.

Principle 2: Research. How do you expect not to completely fsck something
up if you live in a box? Don't just make stuff up; research what
people have done before and determine what the heck is going on.

Principle 3: Don't argue with people who can help you. You wouldn't very
well argue with Linus about how to write a kernel if you hadn't even read
a book on OS design! The people who *seem* like pains in the arse
are really your most valuable resource; they've been there and done
that. Show a little respect; you wouldn't argue the hell out of this
in person or on the phone, would you?

Principle 4: Accept that you're wrong. If you constantly believe that every
decision you make is right, you are deluded. Step outside of your
argument for a minute and think about what you're saying, then think
about what they're saying.

Principle 5: Be humble. Try to present ideas in a concise and unbiased manner.
If you start spouting stuff about distros, people will flame you. If
you bring up a feature of a distro and compare it to others, with a
complete, concise message, people will be less hostile. Don't make
statements about an architecture you admit you are unfamiliar with.
Leave it at the suggestion stage; don't harp on it.

Okay, now that I've said my piece about this ridiculous argument, I'd like to
point out some important things.

First, checkpoint/restart: does any other OS support it? If so, how? More
importantly, is it an integral architectural feature? How much work would have to go
into Linux, as it stands, to support such things?

What about an abstraction layer? You could have processes run within an environment
that controls their interaction with the world. The kernel has *no* smarts; it's
just a big state machine. You need a wrapper that can freeze a process and handle
the resources needed to transfer it to another machine.

We can all agree that you cannot checkpoint "sockets" for later use, so their
usefulness is limited. However, you could concoct a file descriptor passing
mechanism for a cluster. You could pass all of the socket meta info to another machine:
the socket has no *real* persistent connection, since the connection state lives only
at the endpoints and it's implemented on top of a datagram protocol (IP).
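
For reference, Unix already has a descriptor-passing primitive on a single host:
SCM_RIGHTS ancillary data over a Unix-domain socket. It does not cross machines,
so a cluster version would need its own transport, but it shows the shape of the
interface. A minimal sketch; send_fd and recv_fd are hypothetical helper names,
not an existing library API:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send the open descriptor fd_to_send over the Unix-domain socket chan.
 * Returns 0 on success, -1 on error. */
int send_fd(int chan, int fd_to_send)
{
    char byte = 'F';   /* one byte of ordinary data carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd_to_send >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd. Returns the new descriptor,
 * or -1 on error. */
int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open object
as the sender's. Across machines there is no shared kernel object to refer to,
which is exactly why the meta info would have to be shipped instead.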

Files: You need to develop locking semantics for moving checkpointed processes to another
machine. Either that or you need a sophisticated realtime version control
mechanism for syncing the files. File descriptors can be abstracted in the emulator
layer and managed transparently.
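
To make the file descriptor abstraction concrete, here is a rough sketch of what
the emulator layer might record per descriptor so it can be rebuilt after a
restart. It assumes the same path is visible on both machines (a shared
filesystem, say) and ignores locking entirely. struct vfd_record and the vfd_*
functions are made-up names for illustration:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define VFD_PATH_MAX 256

/* What the layer remembers about one open file. */
struct vfd_record {
    char  path[VFD_PATH_MAX];
    int   flags;
    off_t offset;
};

/* Open path and remember how it was opened. Returns the real fd or -1. */
int vfd_open(struct vfd_record *rec, const char *path, int flags)
{
    assert(rec != NULL && path != NULL);
    if (strlen(path) >= VFD_PATH_MAX)
        return -1;
    strcpy(rec->path, path);
    rec->flags = flags;
    rec->offset = 0;
    /* The mode argument is only consulted when O_CREAT is set. */
    return open(path, flags, 0600);
}

/* Checkpoint: record the current offset, then drop the real descriptor. */
int vfd_checkpoint(struct vfd_record *rec, int fd)
{
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1)
        return -1;
    rec->offset = pos;
    return close(fd);
}

/* Restart: reopen the file and seek back to the recorded offset.
 * Creation and truncation flags are stripped so the contents survive. */
int vfd_restart(const struct vfd_record *rec)
{
    int fd = open(rec->path, rec->flags & ~(O_CREAT | O_TRUNC | O_EXCL));
    if (fd < 0)
        return -1;
    if (lseek(fd, rec->offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

This only covers regular files. Pipes, devices and sockets have no path and
offset to replay, which is where the hard part starts.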

Shared memory and mmap: You would need to checkpoint the whole process group or disallow
migration for these processes. This can be a ratio decision: if the number of attaches
or maps is greater than the expense on this processor, don't checkpoint.
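
One possible reading of that ratio decision, as a sketch. The struct, the
function name and the idea of expressing "expense on this processor" as a
single number are all assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* How much a process shares with others (hypothetical bookkeeping). */
struct proc_sharing {
    unsigned shm_attaches;   /* attached shared memory segments */
    unsigned shared_maps;    /* shared mmap regions */
};

/* Returns 1 if the process may be checkpointed on its own, 0 if its
 * attaches plus maps exceed the local expense figure. */
int may_checkpoint(const struct proc_sharing *p, unsigned local_expense)
{
    assert(p != NULL);
    return (p->shm_attaches + p->shared_maps) <= local_expense;
}
```

How the expense figure would be measured is the open question; the comparison
itself is the easy part.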

The bottom line is that you will *not* be able to implement checkpointing with a
process that has no knowledge of it. You cannot be blind in userspace. It works
just like threads: they're an extension that doesn't "magically" speed up your
process; you must be aware of them. To rephrase that: libc would need to know
these things if the functionality were available; libc would be the userland
counterpart of checkpointing/restart/realtime migration.
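
As an illustration of a process that is aware of checkpointing, here is a
minimal cooperative sketch: a signal only sets a flag, and the process writes
its own state at a point where that state is consistent. The use of SIGUSR1,
the state layout and all function names are assumptions; a real libc-level
facility would look quite different:

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* The state this toy process needs in order to resume. */
struct app_state {
    long   iteration;
    double accumulator;
};

static volatile sig_atomic_t checkpoint_requested = 0;

/* The handler only sets a flag; the real work happens at a safe point. */
static void on_checkpoint_signal(int signo)
{
    (void)signo;
    checkpoint_requested = 1;
}

int install_checkpoint_handler(void)
{
    return signal(SIGUSR1, on_checkpoint_signal) == SIG_ERR ? -1 : 0;
}

/* Raw struct dump: fine for a sketch, not portable across architectures. */
int save_state(const char *path, const struct app_state *s)
{
    FILE *f = fopen(path, "wb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

int load_state(const char *path, struct app_state *s)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Do one unit of work, then honour a pending checkpoint request.
 * Returns 1 if a checkpoint was written, 0 if none was requested,
 * -1 if writing it failed. */
int step(struct app_state *s, const char *path)
{
    assert(s != NULL && path != NULL);
    s->iteration++;
    s->accumulator += 0.5;

    if (!checkpoint_requested)
        return 0;
    checkpoint_requested = 0;
    return save_state(path, s) == 0 ? 1 : -1;
}
```

The process decides when it is safe to be frozen, which is the point: something
in userland has to know.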

Larry has some good points (I spoke with him privately) and is on the right
track to get things like this supported under Linux. You have to remember
that Linux was not written to do this from day one; it's going to mean a lot
of architectural change to do it right.

Ultimately the question is: Does Linus want Linux to become this or should it focus
on one arena and do it WELL?

This is why I don't particularly like Perl: it tries to be too many things to
too many people; it's lost its focus on doing one thing well.

--Perry

-- 
Perry Harrington       Linux rules all OSes.    APSoft      ()
email: perry@apsoft.com 			Think Blue. /\

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu