RE: MOSIX and kernel mods.

Chris McClellen (Chris.McClellen@SciAtl.COM)
Fri, 5 Mar 1999 17:13:12 -0500


> -----Original Message-----
> From: owner-linux-kernel@vger.rutgers.edu
> [mailto:owner-linux-kernel@vger.rutgers.edu]On Behalf Of Kamran Karimi
> Sent: Thursday, March 04, 1999 11:42 PM
> To: Richard Gooch
> Cc: linux-kernel@vger.rutgers.edu
> Subject: Re: MOSIX and kernel mods.
>
>
> On Fri, 5 Mar 1999, Richard Gooch wrote:
>
> > As I said before, if it's just for you and your friends, and you
> > understand the pitfalls, then fine (of course, then patches to the
> > official kernel aren't appropriate).
> >
> > If you're providing this for others, then I think you've doing them a
> > disservice, because you're giving something that looks really nice,
> > but has too many hidden traps.
>
> Let me explain: People can use DIPC to write a distributed program, and
> yes, they may even use shared memory locks (DIPC's documentation
> discourages people from doing such things, and proposes better ways).
> The application will run, even if it is slow. _IF_ the programmer thinks
> that he needs more speed-up, then he can start changing the application.
>

Why code a parallel application to be slow? If you are diving into
parallelism of any kind, the assumption is that you are looking for some
speed. For instance, you have decided that you have some tasks that
can take place at the same time, i.e. a calculation that can be distributed.
Why put all the effort into parallelization, which implicitly means
dealing with classical critical-section problems, deadlocks, etc., for a
nice big slowdown? "Parallel for free" isn't good if it runs worse than
a non-parallel program. It would also stink for someone to have to start
over because the way they used an abstraction turned out to be horrible,
even though it seemed sensible at the time.

To get a decent performing application, you have as much as admitted
that the programmer must be aware of the pitfalls of DIPC. So then why
is this abstraction truly easier than, say, user-level message passing?

Is it so that you can say something like:

lock();
memcpy(sharedp, ...);
unlock()?

Instead of :

someapi_sendmessage(pgroup, sharedp, size)?

Yeah, in one implementation, it looks like any other pointer, and true,
perhaps there really is a shared pool. But gosh, it seems like it would be
slow to keep banging on sharedp in the first case. Caching doesn't seem
like a great idea if sharedp is updated frequently.

In another, sharedp may actually live in each address space, and perhaps
each program updates the contents of its local copy of sharedp upon receipt
of a message. Again, frequent changes could cause a message storm.

However, it seems to me that if you are writing a parallel program that
goes beyond SMP (i.e., parallel across a cluster of heterogeneous computers),
there are a lot of issues to deal with, and if you are capable of
breaking the problem up correctly, you are equally capable of using
some other API that gives the same result without doing what amounts
to operator overloading.

On the other hand, if message passing APIs, etc., are too complicated
for people, I am going to go out on a limb here: they probably shouldn't
be writing distributed applications. If they understand the issues,
the APIs are not *that* complicated.

[snip]
> Many of the current distributed applications were designed with the goal
> of getting as much speed up as possible. This is my belief (and hope)
> that system like DIPC will open the doors for other classes of
> distributed applications. Other than that, a DIPC program is
> multi-processor-ready, meaning that by targetting an MP architecture the
> programmer can get a distributed application for almost free.
>

As I said, the only "free" here may be a slow program. Programs that
aren't distributed to begin with may not need to use IPC at all, least
of all DIPC.

> This is all about the way we think about computing. For me, people
> should optionally have a unifying software layer that not only hides
> things like I/O programming and memory managment, but also the existance
> of a network in the system. programmers can then decide if they want to
> use this high-level layer, or access the lower ones. Without the option,
> there is no choice.
>

Unfortunately, you have to know what you are dealing with. Local memory
is faster than memory across the network. Fear the people who feel they
can't deal with that difference. If you want performance, you want
to know what's expensive and what's not. People have to design stuff all
the time that works efficiently over slow networks vs. fast networks.
They need to know what the target is because the software architecture will
differ depending on what you are dealing with (i.e., you may need caching
in one instance, but not in the other, to get reasonable performance).
This is a reality of life. In the future, when all networks have
uniform bandwidth, then this will be irrelevant :-)

> One can also think of a psychological reason as to why some people don't
> like concepts like DSM. Imagine you have been writing distributed apps
> for years "the hard way" (and getting payed very well for it). Now what
> happens if suddenly everybody can do about the same thing? Of course an
> expert assembly language programmer always has some advantages over
> someone who only uses JAVA, but hey, the JAVA programmer can do a lot of
> work, and finish the projects faster too! It is quite understandable if
> our imaginary assembly programmer starts bashing JAVA (and all other high
> level languages, for that matter) by labeling them "ineficient", "hiding
> the reall way the hardware works (flawed?)", etc.
>

The "easy way" could turn out to be more expensive when people who
don't understand what they are doing start using DSM, and a company
has to pay for the software twice: once for the person doing DSM in
a "sensible" way, and once for the rewrite in which someone else comes
in and uses something else, or fixes the program so that the program KNOWS
about HOW DSM operates. Having to know HOW DSM operates (and thus
know the pitfalls) kills this notion that it makes things easier.

I have to agree with Richard. I don't think Linux needs DSM support.
Of course, that's just my opinion.

On the other hand, if using DSM in a "sensible" way weren't really any worse
than the typical APIs, there would be much more of an argument for it.
