Re: MOSIX and kernel mods.

Richard Gooch (rgooch@atnf.csiro.au)
Fri, 5 Mar 1999 08:34:55 +1100


Kamran Karimi writes:
> Richard Gooch wrote:
>
> >Going hopefully more on-topic, I don't think MOSIX matters anyhow. I'm
> >not convinced of the merit of the idea in the first place. If you want
> >"nice" clustering support, you're probably better off with DIPC
> >anyway, keeping more of it in user space (but still not all).
>
> In DIPC's case, the kernel already handles the IPC on a single
> computer. If you want a very high degree of transparency, you have
> to offer the new distributed functionality from the kernel as well.
>
> Although one may be able to use link libraries to move some stuff
> out of the kernel, DIPC's Distributed Shared Memory has to remain
> there, as it requires direct manipulation of the processor's Memory
> Management Unit (tools like mprotect() don't do what DIPC needs).

Distributed shared memory is a fundamentally flawed idea. It pretends
to be SMP when it really isn't. With SMP I can have two CPUs writing
to the same page (different cache lines) at the same time and it's
fast.

With DSM, you've got MMU tricks to allow only one machine at a time
to write. If one CPU writes and then another reads, you've got to
shift the whole page across the network.

With SMP, a lock is cheap. With DSM, a lock is expensive.
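
To see why, consider the standard user-space trick for this sort of
thing (a hypothetical sketch, not DIPC's actual code): write-protect
the page, take a SIGSEGV on the first write, "fetch" ownership over
the network, then unprotect and retry. Every ownership change is a
page fault plus a network round trip:

/* Sketch of the classic mprotect()-based DSM write trap.
 * Hypothetical: real DSM (including DIPC's) is far more involved,
 * and mprotect() isn't formally async-signal-safe, but this is the
 * basic mechanism.
 */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;        /* the "distributed" page */
static size_t page_size;

/* Stand-in for fetching page ownership from the current owner.
 * On a real network this is a round trip: milliseconds, versus
 * nanoseconds for an SMP cache-line transfer.
 */
static void acquire_page_ownership(void)
{
    /* network traffic would happen here */
}

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    if ((char *)si->si_addr < page ||
        (char *)si->si_addr >= page + page_size)
        _exit(1);         /* a real crash, not our page */
    acquire_page_ownership();
    /* allow local writes until another node wants the page */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    page = mmap(NULL, page_size, PROT_READ,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    page[0] = 42;  /* faults, "fetches" the page, then retries */
    printf("wrote %d after taking ownership\n", page[0]);
    return 0;
}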

> >And I'm not even convinced about the DIPC model either. I think trying
> >to pretend a networked cluster of machines is really a single happy
> >machine with lots of nodes is fundamentally flawed.
>
> Interesting. This is like saying it is wrong to let the OS do
> task-switching and create the illusion that more than one task is
> running, while in reality there is only one CPU. Why not make the
> programmers deal with such issues themselves? They have to make sure
> their programs stop at the specified time intervals so that other
> apps get a chance to run.

Your analogy is broken. Multitasking has hardware support and is a
cheap operation. DSM has no hardware support and is an expensive
operation.

Even on machines which do have hardware support (SGI's Origin
series), it's my understanding that getting page migration to work
properly has been a problem.

> And why stop here? Why not let them handle their own memory
> management? After all, they know more about their memory needs than
> the OS! Let them use overlaying to save parts of their memory space
> they _know_ will not be used in the near future out to a hard disk.
> This will improve performance because the OS is too dumb to make a
> good decision in many cases. The possibilities are limitless here.

Yeah, right. Let's rewrite the kernel in Java: it provides such nice
abstractions.

Please.

> The trend in computer science has been to add more and more
> abstraction layers, so as to reduce the complexity of application
> development and the time spent on maintenance.

My point is that DIPC/DSM is the *wrong* abstraction. Same goes for
MOSIX.

> Performance seems to have been a secondary issue.

Sure, in *computer science*. CS labs around the world have been
producing new operating systems for decades, each one more abstract
than the one before. Now tell me how many of them are actually used
in the real world?

> >The right way
> >(IMNSHO) is to provide a flexible and lightweight API to support
> >communications (in user space), and write your application *knowing*
> >you're running on a networked cluster.
>
> IMNSHO, _most_ people prefer ease of programming to performance.
> Now, I understand if an embedded-systems developer has to deal with
> code speed and size issues, but it seems very selfish to me for
> such a person to insist that everyone else stop using high-level
> development tools like Visual Basic because they create very big
> and slow code.

You don't need DSM to provide a nice and easy programming environment.
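
Something as small as this (a hypothetical sketch; msg_send() and
msg_recv() are names I'm making up, not a real library) already gives
you reliable message passing over a plain socket, entirely in user
space:

/* Minimal length-prefixed message passing over a socket.
 * Hypothetical sketch of a "lightweight user-space API".
 */
#include <arpa/inet.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly len bytes, coping with short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, coping with short reads. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte network-order length, then payload. */
int msg_send(int fd, const void *buf, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (write_all(fd, &hdr, sizeof hdr) < 0)
        return -1;
    return write_all(fd, buf, len);
}

/* Receive one message into buf (at most max bytes); returns its
 * length, or -1 on error or if the message is too big. */
int msg_recv(int fd, void *buf, uint32_t max)
{
    uint32_t hdr;
    if (read_all(fd, &hdr, sizeof hdr) < 0)
        return -1;
    hdr = ntohl(hdr);
    if (hdr > max || read_all(fd, buf, hdr) < 0)
        return -1;
    return (int)hdr;
}

No kernel changes, no MMU games, and the network cost stays visible
at every call site.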

> Apparently you have no problem with automatic task switching and
> memory management; how come you think adding another abstraction
> layer would be wrong? My guess (and I could be wrong) is that you
> are used to employing message passing for distributed
> programming. You have no problem with that, and adopting an easier
> programming model (like DIPC) seems unnecessary to you.

Let me tell you what I've done for my distributed volume-rendering
application. First, I wrote it in a message-passing paradigm,
because I had a message-passing machine. Then I rewrote it to use
threads, because I had SMP machines. Later, I rewrote it to use
message-passing for remote nodes and threads for local nodes.

The task is divided into jobs which are scheduled
appropriately. Scheduling locally is much faster (tens of
microseconds) than scheduling remotely (milliseconds). I need to be
aware of that. I'm down to the level where if I spend another one or
two milliseconds waiting for data to come over the network, I start
losing a few percent here and there. Total rendering time is 0.5
seconds.
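
The dispatch path boils down to something like this (a hypothetical
sketch of the structure, not my actual renderer code):

/* Hybrid dispatch: local jobs go to worker threads via a queue
 * (tens of microseconds), remote jobs go over a socket
 * (milliseconds). Hypothetical sketch, not the real application.
 */
#include <pthread.h>
#include <unistd.h>

struct job { int id; /* ... rendering parameters ... */ };

struct node {
    int is_local;   /* worker on this machine? */
    int sock_fd;    /* valid when !is_local    */
};

static void render_job(const struct job *j) { (void)j; /* work */ }

/* Local queue shared with the worker threads on this machine. */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;
static struct job queue[64];
static unsigned int qhead, qtail;

static void enqueue_local(const struct job *j)
{
    pthread_mutex_lock(&qlock);
    queue[qtail++ % 64] = *j;
    pthread_cond_signal(&qcond);  /* wake one worker thread */
    pthread_mutex_unlock(&qlock);
}

/* Worker thread: pull jobs off the local queue and run them. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        struct job j;
        pthread_mutex_lock(&qlock);
        while (qhead == qtail)
            pthread_cond_wait(&qcond, &qlock);
        j = queue[qhead++ % 64];
        pthread_mutex_unlock(&qlock);
        render_job(&j);
    }
    return NULL;
}

/* The point: the scheduler *knows* which path is cheap. A real
 * version must handle short writes and byte order on the socket.
 */
void dispatch(struct node *n, const struct job *j)
{
    if (n->is_local)
        enqueue_local(j);                 /* microseconds */
    else
        write(n->sock_fd, j, sizeof *j);  /* milliseconds */
}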

I'm getting 73% parallel efficiency on a network of 5 machines:
    dual PII 400
    dual PII 266
    dual PPro 180
    dual Pentium 133
    dual Pentium 133

I have techniques in mind to increase efficiency. Just one alone
should take it to 76%. It's hard work getting high efficiency on a
network of machines with such a wide range of performance, especially
when you can't predict how much work is involved in each job.
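
(By efficiency I mean the usual ratio of actual to ideal speedup,
E = T_serial / (N * T_parallel); with a mixed bag of CPUs like this,
the ideal speedup has to be weighted by each machine's relative
speed rather than a raw CPU count, or the Pentium 133s would skew
the figure.)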

If I were to take my basic threaded code and run it with DIPC/DSM or
MOSIX, I'd consider myself lucky to get 10% efficiency.

> >Any abstraction that attempts to hide the immutable fact that
> >inter-node latency is high and bandwidth is low is a *bad*
> >abstraction. It lulls the programmer into thinking they're writing
> >efficient code. Local nodes and remote nodes *should* be treated
> >differently, because they *are* different.
>
> What hiding? No matter what programming interface you use (explicit
> message passing like PVM, or implicit message passing like DIPC),
> you have to deal with the fact that crossing the network is
> slow. This is obvious. DIPC does not hide this fact. It lets the
> programmer do data exchange in some familiar ways, so he or she
> won't have to learn a new set of APIs. Execute that DIPC program on
> a single computer, and it runs at full speed. Run it on a network,
> and things will probably slow down. In short: We run in a cruel
> world. Do we want to make things easier on ourselves or not??

DIPC (as in: strict message-passing) is one thing. Distributed shared
memory is another. Two nodes banging on the same page just isn't going
to run well with DSM.
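
Here's the pathological case in a few lines (a hypothetical demo,
not a benchmark): two threads incrementing different words of the
same page. On SMP this costs at most a little cache-line traffic;
under page-granularity DSM every write can bounce the whole page
across the network.

/* Two threads writing different words of the SAME page. Fine on
 * SMP (512 bytes apart, so they don't even share a cache line);
 * pathological under page-granularity DSM, where each write may
 * force a whole-page transfer between nodes. "volatile" keeps the
 * compiler from collapsing the loops.
 * Build: gcc -O2 -pthread pingpong.c
 */
#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000L

/* Both counters in one small object, 512 bytes apart. */
static volatile struct { long a; char pad[512]; long b; } shared;

static void *writer_a(void *arg)
{
    long i;
    (void)arg;
    for (i = 0; i < ITERS; i++)
        shared.a++;
    return NULL;
}

static void *writer_b(void *arg)
{
    long i;
    (void)arg;
    for (i = 0; i < ITERS; i++)
        shared.b++;
    return NULL;
}

int main(void)
{
    pthread_t ta, tb;
    pthread_create(&ta, NULL, writer_a, NULL);
    pthread_create(&tb, NULL, writer_b, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    printf("a=%ld b=%ld\n", shared.a, shared.b);
    return 0;
}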

> Your statement above is like someone advocating the exclusive use
> of assembly language nowadays. In spite of all the advances in
> networking, distributed programming has not become popular because
> of the lack of an easy-to-use programming model. Only scientists
> and some other "power users" have ruled here. Given a simpler
> programming method, I think these people will become a small
> minority among distributed programmers.

No, go back and read what I said. DSM is the *wrong* abstraction.
DIPC without the DSM is OK (as long as the kernel isn't bloated with
it: do it in the library).

Regards,

Richard....
