Re: Remote fork() and Parallel programming

Andrej Presern (andrejp@luz.fe.uni-lj.si)
Tue, 16 Jun 1998 12:05:58 +0200


mshar@vax.ipm.ac.ir wrote:
> "Theodore Y. Ts'o" <tytso@MIT.EDU> wrote:
>
> > >It's not the bandwidth that's the issue, it's the latency. Bandwidth is
> > >easy. Latency is hard. DSM systems /all/ die because of latency issues.
> >
> > The main argument of anti-DSM people has always been the band-width (that
> > message passing can better use the bandwidth because the programmer has
> > total control over the transfers and can tune the program's behaviour), but
> > fortunately that is no longer important.
> >
> >Would you like to say more about that? I would think that this is very
> >important, and not being able to handle this case well would result in
> >really bad performance.
>
> Message passing systems require the programmer to explicitly transfer data
> to/from other computers. The programmer knows the application's data needs,
> so he can bring only the needed data, and at the time it is actually needed.

Actually, capability-based systems are natural message passers. Since a
capability designates both the object and the action to be performed on
it, it normally suffices to communicate only the (small) capability
reference to get the job done. Because capability-based systems have
strong boundaries between objects (which is also what makes them so
secure), and because a capability has to be explicitly invoked (i.e.
crossing the boundary happens on explicit request), the kernel might
just as well (transparently) forward the invocation to a remote node.
Application programmers then write their software the same way for
single-node and distributed environments.
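
To illustrate (the interface below is invented for the example - cap_t,
cap_invoke() and OP_READ are not any real kernel's API): the application
only ever hands the kernel a small opaque reference plus the operation it
wants performed, so the same source works whether the object happens to
live locally or on another node:

/* Hypothetical capability interface; names are made up for illustration. */
#include <stddef.h>

typedef struct { unsigned long id; } cap_t;  /* small opaque reference */

enum { OP_READ = 1 };

/* Invoke operation 'op' on the object the capability designates.  Whether
 * that object lives on this node or on a remote one is the kernel's
 * problem; only the small reference and the arguments need to travel. */
long cap_invoke(cap_t cap, int op, void *buf, size_t len);

void read_something(cap_t file_cap, char *buf)
{
	/* identical source for single-node and distributed operation */
	cap_invoke(file_cap, OP_READ, buf, 4096);
}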

The point being: message passing systems do NOT necessarily require the
programmer to explicitly transfer data to/from other computers. (That
Linux is not a capability-based system is a different issue.)

> This results in efficient bandwidth usage.

Wasting 4092 out of every 4096 bytes transferred (see other messages in
this thread) is hardly efficient bandwidth usage.

> The kind of DSM I am talking about (transparent operation, as in DIPC), does
> not use explicit commands from the application to do its work. The CPU's
> memory management unit informs the DSM manager that a process is referring
> to some part of the memory that is not present, or is trying to write to a
> read-only address. That is all the information a DSM manager gets. It then
> works to make it possible for the application to perform its intended
> action, possibly by transferring the needed data from another computer. In
> DIPC's case, each transfer involves a multiple of 4K bytes.
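
(For concreteness, the fault-driven mechanism described above amounts to
roughly the following. This is a hand-waved user-space sketch built on
mprotect()/SIGSEGV with a made-up dsm_fetch_page(); it is not DIPC's
actual code, and it ignores error handling and signal-safety details.)

#include <signal.h>
#include <string.h>
#include <sys/mman.h>

#define DSM_PAGE 4096UL

/* made up: ask whichever node owns the page to send its contents */
extern void dsm_fetch_page(void *page, size_t len);

/* The MMU fault is all the information the DSM manager gets: it only
 * learns which page was touched, so it ships the whole 4K page. */
static void dsm_fault(int sig, siginfo_t *si, void *ctx)
{
	void *page = (void *)((unsigned long)si->si_addr & ~(DSM_PAGE - 1));

	mprotect(page, DSM_PAGE, PROT_READ | PROT_WRITE);
	dsm_fetch_page(page, DSM_PAGE);
}

void dsm_init(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = dsm_fault;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);
}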

Have you ever tried to construct a robust application parts of whose
address space can vanish into nothing (permanent storage such as a disk
is not 'nothing'), when the application doesn't even know which parts
CAN disappear and so cannot take preventive measures? You can't look at
a distributed environment as if it were a single virtual machine. A
remote node cannot be treated as permanent storage whose data will
persist, because the node can go down. See what happens when parts of
the disk that we use to store programs go bad - the whole system can go
down.

You can't even rely on the remote node to give you the pages that you
really want - a DSM cluster is the same as one with no security at all,
because an intruder can feed you any (trojan) code/data he feels like.
If any node in the cluster is compromised, all nodes are compromised.
Now imagine how secure a big DSM cluster (say 10k nodes) can be.

DSM is bad for performance, it's bad for resources, it's bad for
security and it's bad for robustness. The only thing it seems to be good
for is ease of use for lazy (or incompetent) programmers.

Andrej

-- 
Andrej Presern, andrejp@luz.fe.uni-lj.si
