Re: checkpoint/restart ABI

From: Eric W. Biederman
Date: Thu Aug 28 2008 - 19:44:36 EST


"Serge E. Hallyn" <serue@xxxxxxxxxx> writes:

> Quoting Peter Chubb (peterc@xxxxxxxxxxxxxxxxxx):

>> Beefing up ptrace or fixing /proc to be a real debugging interface
>> would be a start ... when you can get at *all* the info you need,
>
> Except we don't really want to export all the info you need for a
> complete restartable checkpoint. And especially not make it
> generally writable.

That, and unless we get a lot of synergy from the authors of debuggers
and debugging code, it is a more general and slower interface for
no apparent gain.

> We have also started down that path using ptrace (see cryo, at
> git://git.sr71.net/~hallyn/cryodev.git).
>
> Right before the containers mini-summit, where the general agreement was
> that a complete in-kernel solution ought to be pursued, I had tried
> a restart using a binary format that read a checkpoint file and used
> cryo (userspace using ptrace) for the rest of the restart, only
> because there was no other reasonable way to set tsk->did_exec on
> restart.

Can we please describe this as the giant syscall approach, instead
of a complete in-kernel solution? There are things like filesystems
that should be checkpointed separately, or not checkpointed at all.

However, there is a large set of processes and process state that always
goes together, and that you always want when you checkpoint a container.

So building something that is roughly equivalent to a binfmt module
but that can save and restore multiple tasks with a single operation
looks like the right granularity.
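
To make the granularity concrete, here is a rough userspace sketch of
what "multiple tasks with a single operation" could look like as an
image format. Everything in it is an assumption for illustration: the
names ckpt_header, ckpt_task, ckpt_save and ckpt_restore, the magic
value, and the record layout are hypothetical and are not an existing
kernel interface. The only idea borrowed from binfmt handlers is that
the restore side recognizes the image by its magic and rejects anything
else.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CKPT_MAGIC     0x54504b43u  /* hypothetical magic number */
#define CKPT_MAX_TASKS 16           /* arbitrary limit for the sketch */

/* Hypothetical per-task record; a real image would carry far more state. */
struct ckpt_task {
    int32_t pid;
    int32_t ppid;
    uint8_t did_exec;   /* the kind of flag that is hard to set from userspace */
    uint8_t pad[3];
};

/* Hypothetical image header: magic, then the number of task records. */
struct ckpt_header {
    uint32_t magic;
    uint32_t ntasks;
};

/* Serialize all n tasks into buf in one call. Returns bytes used, 0 on error. */
size_t ckpt_save(const struct ckpt_task *tasks, uint32_t n,
                 void *buf, size_t len)
{
    struct ckpt_header h = { CKPT_MAGIC, n };
    size_t need = sizeof h + (size_t)n * sizeof *tasks;

    assert(buf != NULL && tasks != NULL);
    if (n > CKPT_MAX_TASKS || need > len)
        return 0;
    memcpy(buf, &h, sizeof h);
    memcpy((char *)buf + sizeof h, tasks, (size_t)n * sizeof *tasks);
    return need;
}

/* Parse an image and hand back every task at once. Returns count, -1 on error. */
int ckpt_restore(const void *buf, size_t len,
                 struct ckpt_task *out, uint32_t max)
{
    struct ckpt_header h;

    assert(buf != NULL && out != NULL);
    if (len < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    if (h.magic != CKPT_MAGIC)      /* not our format: refuse, as a binfmt would */
        return -1;
    if (h.ntasks > max || len < sizeof h + (size_t)h.ntasks * sizeof *out)
        return -1;
    memcpy(out, (const char *)buf + sizeof h, (size_t)h.ntasks * sizeof *out);
    return (int)h.ntasks;
}
```

The point of the sketch is only that save and restore each take the
whole set of tasks in one call, so state that always goes together is
never split across operations.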

>> Jeremy> Lightweight filesystem checkpointing, such as btrfs provides,
>> Jeremy> would seem like a powerful mechanism for handling a lot of the
>> Jeremy> filesystem state problems. It would have been useful when we
>> Jeremy> did this...
>>
>> And how! Saving bits of files was very time-consuming.
>
> Yes, we're looking forward to using btrfs' snapshots :)

Yep. And in the case of migration we don't even need to snapshot
a filesystem; we can just mount it on the target machine. Except for
the unlinked files challenge: a file that is still open but no longer
has a name cannot be reached through that mount.
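
For anyone who has not run into the unlinked files problem, a small
userspace sketch shows why mounting the same filesystem on the target
is not enough. The helper names and the /tmp template below are made up
for the demonstration; the calls themselves are ordinary POSIX. After
unlink() the data is still readable through the open descriptor, but no
path leads to it any more.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file, write data to it, then unlink it while keeping
 * the descriptor open. The former path is copied to path_out.
 * Returns the open fd, or -1 on error.
 */
int open_unlinked(const char *data, char *path_out, size_t path_len)
{
    char path[] = "/tmp/cr-demo-XXXXXX";    /* hypothetical template */
    size_t len = strlen(data);
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        return -1;
    }
    snprintf(path_out, path_len, "%s", path);
    return fd;
}

/* Link count of the file behind fd, or -1 on error. */
long link_count(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Returns 1 if the path can still be looked up by name, 0 otherwise. */
int reachable_by_path(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Read the file contents back through the descriptor, NUL-terminated. */
ssize_t read_back(int fd, char *buf, size_t len)
{
    ssize_t n;

    assert(len > 0);
    n = pread(fd, buf, len - 1, 0);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

A checkpoint that records only path names would lose this file, so its
contents have to be carried some other way.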

Eric