Re: Process Migration on Linux - Impossible?

Larry McVoy (lm@cobaltmicro.com)
Tue, 30 Sep 1997 14:35:01 -0700


: I must take exception to Larry's comment that migration is a bad idea
: but remote execution is great. I agree that remote execution
: (parallel make etc.) is wonderful. But just because migration is
: difficult doesn't mean it shouldn't be done.

I don't think process migration is such a bad idea; I just think that the
benefits don't outweigh the costs. I'm not saying that it can't be done,
I'm saying that it shouldn't be done, especially not at the expense of
normal OS performance, reliability, and maintainability (adding code to
the kernel is at odds with all of those in almost all cases).

: I had a full-functioned
: implementation of transparent process migration in Sprite almost 10
: years ago.

So here's a question: if it worked and was useful, why has that facility
not made it into any successful commercial or free operating systems? Was
it ahead of its time or is there some inherent reason?

My definition of an architect is someone who knows the difference between
what could be done and what should be done. I'm trying to comment on the
architecture, not the engineering. I happen to think that full blown process
migration is something that can be done but should not be done.
Past experience has shown that process migration is costly in terms
of code required to make it work, time required to do it, and cache
utilization (both processor and file system).

Since I despise nay-sayers that don't bother to offer a better answer,
here's my better answer: cluster small SMP machines. Allow full-blown
"migration" from one CPU to another CPU on an SMP, but not across machine
boundaries. This gives some degree of dynamic load balancing, which is
the second-order throughput term. Use static load balancing at exec()
time to get the first-order term.
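
To make the exec()-time idea concrete, here is a minimal sketch of static
placement across a cluster of small SMP nodes. It is an illustration only:
the Node class, the per-CPU load metric, and place_at_exec() are hypothetical
names, not part of any existing system. The process is placed once, at start,
on the least loaded node, and is never moved across machines afterwards.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One small SMP machine in the cluster (hypothetical model)."""
    name: str
    cpus: int
    runnable: int = 0  # processes currently placed on this node

    def load(self) -> float:
        # Assumed load metric: runnable processes per CPU.
        return self.runnable / self.cpus


def place_at_exec(nodes):
    """Pick the node with the lowest per-CPU load and charge the new process to it.

    The choice is made once, at exec() time. Any later balancing happens
    only between CPUs inside the chosen node, not between nodes.
    """
    target = min(nodes, key=lambda n: n.load())
    target.runnable += 1
    return target


nodes = [
    Node("a", cpus=4, runnable=6),
    Node("b", cpus=4, runnable=2),
    Node("c", cpus=2, runnable=2),
]
chosen = place_at_exec(nodes)
print(chosen.name)
```

A real placement policy would also want to consider memory and where the
files live, but the shape is the same: decide where to run before the
process has any state worth moving.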

I happen to believe that such a system would keep up with, and in some
cases outperform, large SMP systems. I have a fair amount of real world
experience from Sun and SGI SMP systems that suggests that this approach
is better.

Here, we are discussing process migration across machine boundaries,
something that is certainly much more expensive than migration from one
CPU to another CPU within an SMP, right? It is safe to say, is it not,
that if you can't migrate well within an SMP then it is going to do
nothing but get worse if you try to migrate to a different machine.
I've worked on 128-512 processor systems at SGI with 100% cache-coherent
memory and file systems (done in hardware; memory latency was 300-800ns
depending on where you were). We couldn't make page migration work well
on those machines.

Let me say that again. A company with SGI's resources, with many full-time
experienced engineers who would stack up with the best the research
community has to offer, couldn't get page migration to work. By "work"
I mean come up with a self tuning policy that results in better throughput
than just allocating the pages and leaving them where they were allocated
and leaving the process near them. /All/ attempts to improve performance
by using migration resulted in lower system throughput. The cost of moving
the process context and the pages outweighed any performance benefit.
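
The cost/benefit comparison can be written as a rough break-even check.
This is a sketch under stated assumptions, not the policy anybody at SGI
used: the function name, the per-page copy cost, and the context cost are
made up for illustration. Only the 300ns and 800ns latencies come from the
range quoted above.

```python
def migration_pays_off(pages, copy_ns_per_page, context_ns,
                       future_remote_accesses,
                       remote_ns=800, local_ns=300):
    """Return True if moving the pages is expected to save more time than it costs.

    cost    = time to move the process context plus time to copy every page
    benefit = latency saved on each access that would otherwise be remote
    All inputs other than the two latencies are hypothetical.
    """
    cost = context_ns + pages * copy_ns_per_page
    benefit = future_remote_accesses * (remote_ns - local_ns)
    return benefit > cost


# Illustrative numbers only: 1000 pages at an assumed 20us per page copy.
few = migration_pays_off(pages=1000, copy_ns_per_page=20_000,
                         context_ns=100_000, future_remote_accesses=10_000)
many = migration_pays_off(pages=1000, copy_ns_per_page=20_000,
                          context_ns=100_000, future_remote_accesses=100_000)
print(few, many)
```

The hard part is not the arithmetic. A self-tuning policy has to guess
future_remote_accesses before the accesses happen, and a wrong guess pays
the full copy cost for nothing.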

It would be easy for people to just say "Well, those SGI engineers are
stupid". Heck, I could say it, I didn't work on the migration stuff,
I thought it was a bad idea before they started working on it, so it
isn't like I have any skin in the idea, quite the opposite. But the
SGI engineers are top notch, at least in this area. I challenge the
minds out there to come up with a migration policy that actually
improves performance under any realistic workload, not some toy
benchmark. Show me a process migration system that makes TPC-C
run better. Or Fortran jobs. Or make. Or web. Or NFS. Anything
that customers will pay money to get.

The point is that yes, you can do it. But that is an academic point,
not anything that is actually useful. Not in my experience. I'm happy
to be proven wrong, but I'm unhappy with letting people go down a
proven rathole.