It can be, but there's a substantial problem with mmap. You have to
make sure that things that are mmapped get remapped to the same place
when the process is restarted. The Condor stuff does this by dumping
all mmapped segments for the checkpoint. These are then mapped back
in from the data dump when the process is restarted. Mmaps can occur
from malloc (depending on the implementation) but the bigger headache
is that shared libs are mmapped in, meaning that a) all shared libs
used by the process are dumped, and b) after restart, the process is
effectively statically linked since it's now using its own copies of
the shared libs.
It'd be nice if you could remap the original libs instead of dumping
them. That could substantially speed up dumping and restarting, and
cut memory use by the restarted binaries (they'd share the library
pages again instead of each carrying a private copy), especially if
there are lots of them. The Condor people argued against this, but their
argument only applies when you want to migrate checkpointed apps
across machines (systems might have different versions of the libs or
might have them in different locations, etc.). It doesn't really
apply if you only want to checkpoint and later restart on the same
machine.
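
For the same-machine case, something along these lines might be enough:
at checkpoint time record which file backs the mapping, and at restart
map the original file back only if it still looks like the same file.
This is only a sketch under my own assumptions. struct lib_id and the
lib_* functions are hypothetical names, and comparing device, inode,
size and mtime is a heuristic, not a guarantee that the contents match.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Identity of a library file, recorded at checkpoint time. */
struct lib_id {
    dev_t  dev;
    ino_t  ino;
    off_t  size;
    time_t mtime;
};

/* Fill in *id for the file at path. Returns 0 on success, -1 on error. */
int lib_id_record(const char *path, struct lib_id *id)
{
    struct stat st;
    assert(path != NULL && id != NULL);
    if (stat(path, &st) != 0)
        return -1;
    id->dev   = st.st_dev;
    id->ino   = st.st_ino;
    id->size  = st.st_size;
    id->mtime = st.st_mtime;
    return 0;
}

/* Returns 1 if the file at path still matches the recorded identity,
 * 0 if it is missing or differs. */
int lib_id_unchanged(const char *path, const struct lib_id *id)
{
    struct lib_id now;
    if (lib_id_record(path, &now) != 0)
        return 0;
    return now.dev == id->dev && now.ino == id->ino &&
           now.size == id->size && now.mtime == id->mtime;
}

/* At restart: map the original file back at the recorded address, but
 * only if it is unchanged. Returns MAP_FAILED otherwise, in which case
 * the caller would have to fall back to a dumped copy. */
void *lib_remap(const char *path, const struct lib_id *id,
                void *addr, size_t len, off_t off, int prot)
{
    if (!lib_id_unchanged(path, id))
        return MAP_FAILED;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(addr, len, prot, MAP_PRIVATE | MAP_FIXED, fd, off);
    close(fd);                  /* the mapping stays valid after close */
    return p;
}
```

Note this only covers pages that are still identical to the file. Any
page of a lib that the process has written to (its private data, for
instance) would still have to be dumped and restored separately.
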
Another problem with the Condor stuff is that they don't distribute
the source code (at least the last time I checked).
-- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il