Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Gene Cooperman
Date: Thu Nov 04 2010 - 12:33:41 EST


Yes, we are working with Condor to have them validate DMTCP. Time will tell.
- Gene

On Thu, Nov 04, 2010 at 08:36:16AM +0100, Tejun Heo wrote:
> Hello,
>
> On 11/04/2010 02:47 AM, Nathan Lynch wrote:
> >> In this case whitelisting the allowed
> >> state by requiring special APIs for all I/O (or even just standard
> >> APIs as long as they are supported by the C/R lib you're linked against)
> >> is the more pragmatic, and I think faithful approach.
> >
> > I don't think users will go for it. They'll continue to use dodgy
> > out-of-tree kernel modules and/or LD_PRELOAD hacks instead of porting
> > their applications to a new library. I think a C/R library is an
> > "ideal" solution, but it's one that nobody would use - especially in
> > HPC, unless the library somehow provides better performance.
>
> I hear that there are plans to integrate one of the userland
> snapshotting implementations with an HPC workload manager. ISTR the
> combination being Condor + DMTCP, but I'm not sure. I think things like
> that make a lot of sense. Scientists writing programs for HPC
> clusters already work in given frameworks and what those applications
> do and how to recover are pretty well confined/defined. If you
> integrate snapshotting with such frameworks, it becomes pretty easy
> for both the admins and users.
>
> I'll talk about other issues in the reply to Oren's email.
>
> Thanks.
>
> --
> tejun