Re: Kernel-level checkpointing

From: Lars Marowsky-Bree (lmb@suse.de)
Date: Sun May 14 2000 - 16:22:24 EST


On 2000-05-14T13:54:21,
   Rayson Ho <ut_bookstore@yahoo.com> said:

> I want to develop a user-level application for
> fault-tolerant servers. Can someone tell me where I
> can get information about the kernel-level
> checkpointing (i.e., to write the image and state of a
> process to disk so that another computer can re-run
> that process)??
>
> APIs, kernel source, project URLs, etc would be very
> useful.

Kernel-level checkpointing hasn't been developed yet. All the solutions I have
seen so far implement checkpointing in the application itself, which saves its
state at regular intervals. I assume this will be the most efficient solution
in any case.
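
To make the application-level approach concrete, here is a minimal sketch of
a worker that saves its own state at regular intervals and resumes from the
last saved state after a failure. Everything in it is an assumption made for
illustration: the names (save_checkpoint, load_checkpoint, run), the JSON
state layout, the interval of 100 work items, and the idea that the
checkpoint file sits on storage that a second machine can also reach. It is
not taken from any existing project.

```python
import json
import os
import tempfile

# Hypothetical interval: checkpoint after every 100 processed work items.
CHECKPOINT_EVERY = 100


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename is atomic on POSIX filesystems, so a crash in the middle of
    a save leaves the previous checkpoint intact instead of a torn file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def run(path, n_items, crash_at=None):
    """Sum the integers 0..n_items-1, checkpointing along the way.

    crash_at is a test hook (an assumption of this sketch) that simulates
    the machine failing just before the given item is processed.
    """
    state = load_checkpoint(path)
    for i in range(state["next_item"], n_items):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += i
        state["next_item"] = i + 1
        if state["next_item"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If run() dies partway through, calling it again with the same path (on the
same machine or on another one that sees the same file) picks up at the last
checkpoint and redoes only the work done since then. The cost is that the
application has to know which of its state matters, which is exactly what a
generic kernel-level mechanism would avoid.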

A kludge I am toying with in my mind would be to take entire system snapshots
("suspend to swap") and restart those on another machine when one fails.

User Mode Linux may also be helpful here.

If you are looking for a serious, not-as-crazy starting point though, you may
want to look at MOSIX: for process migration, they are facing similar
issues.

Now, generic process checkpointing combined with MOSIX, that would be
an awesome HA HPC cluster framework...

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>
    Development HA

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl



