>> We can already do things (generic unix'y speaking) like dump a
>> complete core image of ram onto a disk when we punt, and we have the
>> technology for multiple-initiator SCSI configurations and to make
>> that work.
>Messy. If you are doing this kind of shit for real (and I mean real as in
>the 'we have 300 seconds to get back on our feet or the furnace is
>6000ft into the upper atmosphere' type real) then you are taking snapshots
>of both the program and data onto remote machines. (That's quite easy to do
>with a log-based fs). You are also making synchronization points, and at each
>of these you mark the file log as 'believed coherent' so that a crash
>won't leave junk on the files to fool a restart.
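
To make the 'believed coherent' idea concrete, here is a rough sketch in
Python. It is only my illustration of the technique, not anybody's actual
implementation: the JSON-lines log format, the COHERENT marker name and the
three function names are all made up for the example. Records are appended
and fsync'ed; at a synchronization point a marker is written; on restart,
everything after the last marker (including a half-written record) is
thrown away.

```python
import json
import os

COHERENT = "COHERENT"  # hypothetical marker type, invented for this sketch


def append_record(path, record):
    """Append one JSON record as a line and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def mark_coherent(path):
    """Write a synchronization point: everything before it is trusted."""
    append_record(path, {"type": COHERENT})


def recover(path):
    """Return the records up to the last marker and truncate the rest."""
    if not os.path.exists(path):
        return []
    good, pending = [], []
    good_end = 0   # byte offset just past the last coherent marker
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            offset += len(raw)
            try:
                rec = json.loads(raw.decode("utf-8"))
            except (ValueError, UnicodeDecodeError):
                break  # torn write from a crash: stop reading here
            if isinstance(rec, dict) and rec.get("type") == COHERENT:
                good.extend(pending)
                pending = []
                good_end = offset
            else:
                pending.append(rec)
    # Drop anything written after the last synchronization point.
    with open(path, "r+b") as f:
        f.truncate(good_end)
    return good
```

A restart would call recover() first and only then resume appending, so
junk left by the crash can never be mistaken for valid state. Shipping the
log to a remote machine is a separate problem that this sketch ignores.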

IRIX 6.4 has two cool features (quoting from a doc I got):
High Availability
Allows two machines to act as backups for each other. If the primary system
fails, the backup will take over the primary system's IP address. In
addition, filesystems mounted on the primary system will be switched over to
the backup system in the event of a failure. IRIS FailSafe also has
the ability to restart certain applications by executing supplied or
site-specific shell scripts, with additional options to provide Netscape and
NFS failover capability.
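
(An aside of mine, not from the doc: the takeover decision itself can be
sketched in a few lines of Python. I have no idea how FailSafe does it
internally; the FailoverMonitor class, its timeout parameter and the
on_takeover callback are hypothetical names for illustration. The actual
work of claiming the IP address and remounting filesystems is
platform-specific and is left to the callback.)

```python
import time


class FailoverMonitor:
    """Runs on the backup; fires on_takeover once if the primary goes quiet."""

    def __init__(self, timeout, on_takeover, clock=time.monotonic):
        self.timeout = timeout          # seconds of silence tolerated
        self.on_takeover = on_takeover  # e.g. run site-specific scripts
        self.clock = clock              # injectable so it can be tested
        self.last_seen = clock()
        self.active = False             # True once we have taken over

    def heartbeat(self):
        """Call whenever a heartbeat from the primary arrives."""
        self.last_seen = self.clock()

    def poll(self):
        """Call periodically; returns True once the backup is in charge."""
        silent_for = self.clock() - self.last_seen
        if not self.active and silent_for > self.timeout:
            self.active = True
            self.on_takeover()
        return self.active
```

The backup would call heartbeat() from whatever receives the primary's
keepalives and poll() from a timer. A real setup also has to deal with
split-brain (both machines believing the other is dead), which this
deliberately does not attempt.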

Checkpoint Restart
SGI Checkpoint Restart is a set of user-transparent software management
tools that allow designated system administrators, operators, and users to
suspend one or more jobs in mid-execution and restart them later. The jobs
may be running on a single machine or an array of network-connected

Of course, you guessed it, this is marketing talk, but the concepts are
there. Having similar capabilities in Linux would be really cool, but then,
I also understand that it's less than trivial to set up and that there are
other priorities.
Eh, one can always dream :-)

