Re: C/R without "leaks" (was: Re: Creating tasks on restart:userspace vs kernel)

From: Greg Kurz
Date: Wed Apr 15 2009 - 18:43:20 EST


On Wed, 2009-04-15 at 23:56 +0400, Alexey Dobriyan wrote:
> > Again, so to checkpoint one task in the topmost pid-ns you need to
> > checkpoint (if at all possible) the entire system ?!
>
> One more argument to not allow "leaks" and checkpoint whole container,
> no ifs, buts and woulditbenices.
>
> Just to clarify, C/R with a "leak" is, for example, when a process has a
> separate pidns but shares, say, a netns with another process not involved
> in the checkpoint.
>
> If you allow this, you lose one important property of the checkpoint part,
> namely that almost everything is frozen. Losing this property means suddenly
> much more stuff is alive during dump and you have to account for more stuff
> when checkpointing. You are effectively checkpointing live data structures
> and there is no guarantee you'll get it right.
>
> Example 1: utsns is shared with the rest of the world.
>
> utsns content is modifiable only by tasks (current->nsproxy->uts_ns).
> Consequently, someone can modify utsns content while you're dumping it
> if you allow "leaks".
>
> Did you take precautions? Where?
>
> static int cr_write_utsns(struct cr_ctx *ctx, struct uts_namespace *uts_ns)
> {
> 	struct cr_hdr h;
> 	struct cr_hdr_utsns *hh;
> 	int domainname_len;
> 	int nodename_len;
> 	int ret;
>
> 	h.type = CR_HDR_UTSNS;
> 	h.len = sizeof(*hh);
>
> 	hh = cr_hbuf_get(ctx, sizeof(*hh));
> 	if (!hh)
> 		return -ENOMEM;
>
> 	nodename_len = strlen(uts_ns->name.nodename) + 1;
> 	domainname_len = strlen(uts_ns->name.domainname) + 1;
>
> 	hh->nodename_len = nodename_len;
> 	hh->domainname_len = domainname_len;
>
> 	ret = cr_write_obj(ctx, &h, hh);
> 	cr_hbuf_put(ctx, sizeof(*hh));
> 	if (ret < 0)
> 		return ret;
>
> 	ret = cr_write_string(ctx, uts_ns->name.nodename, nodename_len);
> 	if (ret < 0)
> 		return ret;
>
> 	ret = cr_write_string(ctx, uts_ns->name.domainname, domainname_len);
> 	return ret;
> }
>
> You should take uts_sem.
>
>
> Example 2: ipcns is shared with the rest of the world
>
> Consequently, the shm segment is visible outside and live. Someone has
> already shmat()ed to it. What will end up in the shm segment content? Anything.
>
> You should check the struct file refcount or something similar, and disable
> attaching while dumping.
>
>
> Moral: Every time you dump something live you get complications.
> Every single time.
>
>
> There are sockets and live netns as the most complex example. I'm not
> prepared to describe it exactly, but people wishing to do C/R with
> "leaks" should be very careful with their wishes.

They should close their sockets before the checkpoint and have some way
to reconnect afterwards. This implies some kind of C/R awareness in the code
to be checkpointed.
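
A minimal sketch of what that awareness could look like in an application,
assuming it is told somehow when a checkpoint is about to start and when it
has finished (this thread does not define such a notification). struct
app_conn, app_conn_quiesce() and app_conn_resume() are hypothetical names;
only the BSD socket calls are real.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-connection state kept by the application. */
struct app_conn {
	int fd;				/* -1 while quiesced */
	struct sockaddr_in peer;	/* remembered across the checkpoint */
};

/* After restart (or initially): connect to the remembered peer. */
static int app_conn_resume(struct app_conn *c)
{
	int fd;

	assert(c != NULL);

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&c->peer, sizeof(c->peer)) < 0) {
		close(fd);
		return -1;
	}
	c->fd = fd;
	return 0;
}

/* Before the checkpoint: drop the socket so there is nothing live to dump. */
static void app_conn_quiesce(struct app_conn *c)
{
	assert(c != NULL);

	if (c->fd >= 0) {
		close(c->fd);
		c->fd = -1;
	}
}
```

Any protocol state above the socket (sessions, in-flight requests) would
still have to be re-established by the application after the resume step.
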

--
Gregory Kurz gkurz@xxxxxxxxxx
Software Engineer @ IBM/Meiosys http://www.ibm.com
Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
Alan Moore.
