Re: Process Migration on Linux - Impossible?

Fabio Olive Leite (leitinho@akira.ucpel.tche.br)
Mon, 29 Sep 1997 20:03:43 -0300 (EST)


Hi!

> I would suggest that you take a look at the many, many papers on cache
> affinity. All of these papers are saying one thing, and one thing only:
> process migration is a very, very bad idea. It's the best way imaginable
> to make a system of computers slow to a crawl.

I guess I'll have to study some more, then. I know it will crawl, but a
Grad Project doesn't have to be state of the art. Even if it has a
negative speedup :), I will certainly learn a lot from doing it, and
that's the ultimate goal of a Grad Project, at least at my Univ. :)

> Every SMP vendor would be happy (or not so happy) to tell you about their
> efforts in process migration on SMP hardware, where you can imagine it is
> substantially less painful to move the process. Less painful in terms of
> OS code, but extremely painful in terms of hardware efficiency.

Certainly shared memory makes things a lot easier. :)

> Good OS engineers learn to mentally view the idea of process migration as
> a very, very exceptional event, one that you want to do only under duress.

I think this project will teach me that. I just want to get it done,
anyway.

> If you really insist on this, you are simply reinventing the wheel. The
> system that you describe has been done, 100% at user level, and has been
> around for years. Condor does this. Jobs get sent to the "network" and
> land on some computer that is idle. When the computer is used again (i.e.,
> the mouse moves or some non-remote job app starts up) the process is
> checkpointed and moved.

Just like Sprite, except that there it's done at the kernel level.
Processes don't have to be aware of migration, and developers don't need
to use any special library. I know that PVM, P4, Condor, DIPC and all the
other related packages help in some way or another; it's just that they
are, simply, packages. If I have a machine with ten users using up all
their time slices with some simple but intensive software of their own,
they certainly won't be interested in having to recode/port it to some
package so that it may get faster, given that they'd then have to worry
about messages, pvm_spawn, god_knows_what.

I think things like this have to be transparent to the user, and that's
precisely why I think it has to be done on kernel level, not on user
level.

I know that a net of Linux boxes is not the best place to have migration,
but I don't have the time nor the resources to get hold of a cluster of
tightly coupled "superstations" with some proprietary OS that already does
it, and neither do the people who don't even dream about migration, yet
administer a highly idle net of Linux boxes, with users complaining that
box A is extremely loaded while the others sit idle.

I just think there has got to be some way of helping in _that_ case.

> I think you are missing a fundamental optimization. If you "migrate"
> only at exec(2) time, then you get the best of both worlds and the worst
> of neither.

Migrate at exec time, and then fork while the file with the code still
sits on the source machine. Boom, there goes any optimization you might
have had. You have to take into account that the system in question is a
network of independent stations.

This is getting better! I like discussing this subject, because it helps
me get some ideas. :)

[]!
Fabio
( Fabio Olive Leite leitinho@akira.ucpel.tche.br )
( Computing Science Student http://akira.ucpel.tche.br/~leitinho/ )
( )
( LOADLIN.EXE: The best Windows95 application. [Debian GNU/Linux] )