Re: Remote fork() and Parallel programming

mshar@vax.ipm.ac.ir
Mon, 15 Jun 1998 01:43:54 +0330


Hi,

yodaiken@chelm.cs.nmt.edu wrote:

>So this is a fundamental problem in distributed system design and explains
>why you have the wrong opinion about DSM as well [...]

What wrong opinion about DSM? Please refer to DIPC to see my opinions
about DSM.

> [...] The OS has a simple
>algorithm to schedule tasks on a single machine, but nobody has
>convincingly proposed a good general purpose algorithm for distributed
>task scheduling. Similarly for virtual memory and DSM. Furthermore,
>time tradeoffs are different between distributed and unified systems.
>The cost of a couple of extra context switches can be a significant
>factor in a unified system, but DSM/process migration has a forced
>communication overhead that can easily absorb an extra context switch.

For process migration, a simple load measurement will do for the first
implementations. All computers are polled periodically, and jobs are
migrated if some thresholds are exceeded on a machine. Because of the hint
mechanism, the application programmer can inform the system not to migrate
processes that are short-lived, or that use many local resources.

This help from the application programmer will go a long way toward
preventing the system from making the wrong decisions.
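
Something like the following could serve as a starting point (only a
sketch; poll_load(), LOCAL_MACHINE, the NO_MIGRATE flag and struct task
are made-up names for illustration, not existing interfaces):

    struct task { int hints; };

    #define NO_MIGRATE      0x1     /* hypothetical hint: never migrate   */
    #define LOAD_THRESHOLD  5       /* migrate only above this local load */
    #define LOCAL_MACHINE   (-1)

    extern double poll_load(int machine);  /* assumed: load of machine i */

    /* Called periodically; returns the least loaded machine, or -1 to
       keep the task where it is. */
    int pick_migration_target(struct task *t, int nmachines)
    {
            double local, lowest;
            int i, best = -1;

            if (t->hints & NO_MIGRATE)      /* programmer said: stay local */
                    return -1;

            local = poll_load(LOCAL_MACHINE);
            if (local <= LOAD_THRESHOLD)    /* no local pressure */
                    return -1;

            lowest = local;
            for (i = 0; i < nmachines; i++) {
                    double l = poll_load(i);
                    if (l < lowest) {
                            lowest = l;
                            best = i;
                    }
            }
            return best;
    }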

>Where is the gain from incorporating in the kernel? As a general
>rule, something should be in the kernel only if there is a
>compelling reason.

I completely agree with this general rule. But in Linux, process migration
will need some support from the kernel (to get and set run-time states),
so some hooks in the kernel will be needed. DIPC also uses this method.

>x[0] = 1;
>if( remote_fork())
> while(x[0] == 1); /* where x points to distributed shared memory */
>else x[0] = 0;
>
>Works one way for real shared memory, another way for DSM. How do
>you fix it?

It works perfectly when using DSM with strict consistency, but it could be
slow. Like many OS textbooks, I'd tell the programmer to use semaphores
instead of busy waiting. I don't think such situations arise very frequently
in practice.
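
For example, with System V semaphores (the primitives DIPC distributes),
the busy wait above becomes a blocking wait. A sketch, with error checking
left out; remote_fork() is the hypothetical call from the example above:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    extern int remote_fork(void);   /* hypothetical, as quoted above */

    void example(void)
    {
            /* On Linux a new semaphore starts at 0; set it explicitly
               with semctl(semid, 0, SETVAL, ...) to be portable. */
            int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
            struct sembuf wait_op = { 0, -1, 0 };  /* P: block until posted */
            struct sembuf post_op = { 0, +1, 0 };  /* V: wake the waiter    */

            if (remote_fork())
                    semop(semid, &wait_op, 1);  /* sleeps; no DSM traffic  */
            else
                    semop(semid, &post_op, 1);  /* one message, not a spin */
    }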

>Answer: memory channel. DSM on standard networks cannot work.

Yes it can. Its performance can improve if the programmers are a bit
careful. As I said before, the use of synchronization methods like
semaphores is strongly suggested even when programming a single computer.

>Or worse:
>P1:
> look through buffer for structures marked free
> put new data in free structures

>P2:
> look through buffer for structures marked full
> consume and mark free

Again, P1 and P2 can use two semaphores to inform each other of a new
development. If it helps to convey more information than the news of a
cell becoming free or full, then they can use messages that contain the
needed info (like a cell index).
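
Sketched with System V semaphores again (counting semaphores; free_sem
must start at the number of cells and full_sem at 0, e.g. via semctl()
with SETVAL):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    static struct sembuf dec = { 0, -1, 0 };    /* P operation */
    static struct sembuf inc = { 0, +1, 0 };    /* V operation */

    /* P1: wait for a free cell, fill it, announce a full cell. */
    void producer(int free_sem, int full_sem)
    {
            for (;;) {
                    semop(free_sem, &dec, 1);  /* block until a cell is free */
                    /* ... put new data in a free structure ... */
                    semop(full_sem, &inc, 1);  /* tell P2 a cell is full */
            }
    }

    /* P2: wait for a full cell, consume it, announce a free cell. */
    void consumer(int free_sem, int full_sem)
    {
            for (;;) {
                    semop(full_sem, &dec, 1);  /* block until a cell is full */
                    /* ... consume the data and mark the structure free ... */
                    semop(free_sem, &inc, 1);  /* tell P1 a cell is free */
            }
    }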

Please note that such things are not needed for the correct operation of
a DSM system having strict consistency. They just help improve the
performance of programs using it.

>This is an excellent mechanism on real shared memory and works terribly
>on DSM. The problem is that you want to advertise something you can't
>deliver.

Yes, synchronizing via DSM can kill a program's performance. A bit of
programming discipline is all that is needed to mitigate such problems. I
believe the advantages of DSM by far outweigh the disadvantages.

>The question is whether it is good to give the programmer an illusion that
>the OS cannot sustain.

RPC is not an illusion. It comes very close to the single-computer
semantics of procedure calls (the syntax is the same). It is not perfect,
but it is still very useful.

>Only if the OS can properly make the cost tradeoff calculation. How
>does it do that?

It depends on many factors, like the network speed, the other computers'
speeds, their current loads, etc. The algorithms that decide on migration
should be tuned gradually until they produce acceptable results.

I know this won't be perfect, but we have a valuable source of information
in the form of hints from the application itself. If the programmer (or,
better, the user, via a command-line argument to the program) informs the
kernel that migration should be performed, then the cost-tradeoff
calculation will be very reliable.

If the user sees a drop in performance after allowing process migration in
one run of the program (an error in the migration-decision algorithm), then
the user can make the program run only locally the next time. This works
because most programs are executed many times.
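
Just to illustrate the idea (migration_hint() is a made-up system call,
not something that exists in Linux or DIPC):

    #include <string.h>

    extern int migration_hint(int allow);   /* hypothetical kernel hook */

    int main(int argc, char *argv[])
    {
            int i, allow = 0;

            for (i = 1; i < argc; i++)
                    if (strcmp(argv[i], "--migrate") == 0)
                            allow = 1;      /* user opted in for this run */

            migration_hint(allow);          /* tell the kernel our choice */

            /* ... the rest of the program ... */
            return 0;
    }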

>> Synchronization problems have been investigated by "grey haired architects"
>> before you and I were born. Read some OS books.
>
>You are avoiding the question. DSM turns out to be messages with some
>extra junk on top because most networks support messages, not shared
>memory.

You really think so? _Any_ elementary OS textbook informs its readers that
using shared memory for synchronization is bad even on a single computer.
This is called busy waiting. So the answer is: use mechanisms like semaphores
and messages for this.

I thought this was too obvious to need writing.

-Kamran Karimi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu