Re: Help in DSM design

From: Jeff V. Merkey (jmerkey@timpanogas.com)
Date: Tue Mar 07 2000 - 13:57:43 EST


Stephen,

You make a good argument. We are partially funded by the same folks who
own Dolphin and SCI (Scalable Coherent Interface) (Amerscan). COMA and
all these shared-memory systems are interesting, but we abandoned pure
shared memory development in favor of a message passing scatter/gather
model due to issues of fault tolerance. It's almost impossible to make
a resilient, fault tolerant system using shared memory designs (you have
no idea who owns what memory on what systems). SCI does, however, scale
very well in shared memory configurations due to its point-to-point
link and switch design (plus the fact that the baseline technology is
made with Gallium Arsenide instead of bi-CMOS -- though CMOS
implementations exist and are cheaper and slower -- it's questionable as
to whether you can still father children after standing near one of these
devices while it's operating). SCI has been shown to scale up to 8000
processors in shared memory configurations -- however, providing fault
tolerance on such a system is almost impossible to achieve. We had done
some work on ccNUMA style cards -- Intel has been hesitant to open up
their memory bus to folks -- DG has such a system.

Interestingly, we use SCI adapters for clustering development, and they
are the best message passing interface we have seen for low latency
communications (you create shared memory windows between systems, and
pass messages through these windows). DSM software style shared memory
designs all seem to really suck beyond three nodes -- I think it's
probably a waste of someone's time, but may be useful for legacy SSI
applications.
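
To make the "windows" idea concrete, here is a rough sketch in C of a
one-way mailbox laid out in a shared window. It is not SCI driver code:
the struct layout, the names window_send/window_recv, and the sizes
SLOT_COUNT and SLOT_BYTES are all made up for illustration, and the
window here is ordinary process memory standing in for a region mapped
through the adapter. It assumes exactly one sender and one receiver per
window.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 8u   /* hypothetical ring size, must be a power of two */
#define SLOT_BYTES 64u  /* hypothetical fixed message slot size */

/* Layout of one shared window. On real hardware this would live in
 * memory mapped through the adapter; here it is plain memory. */
struct window {
    _Atomic uint32_t head;  /* count of messages written by the sender */
    _Atomic uint32_t tail;  /* count of messages consumed by the receiver */
    unsigned char slot[SLOT_COUNT][SLOT_BYTES];
};

void window_init(struct window *w)
{
    atomic_init(&w->head, 0);
    atomic_init(&w->tail, 0);
    memset(w->slot, 0, sizeof w->slot);
}

/* Sender side: returns 0 on success, -1 if the ring is full or the
 * message does not fit in one slot. */
int window_send(struct window *w, const void *msg, size_t len)
{
    uint32_t head = atomic_load_explicit(&w->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&w->tail, memory_order_acquire);

    if (len > SLOT_BYTES || head - tail == SLOT_COUNT)
        return -1;
    memcpy(w->slot[head % SLOT_COUNT], msg, len);
    /* Release store: the payload is visible before the new head is. */
    atomic_store_explicit(&w->head, head + 1, memory_order_release);
    return 0;
}

/* Receiver side: copies len bytes of the oldest message into buf.
 * Returns 0 on success, -1 if the ring is empty or len is too large. */
int window_recv(struct window *w, void *buf, size_t len)
{
    uint32_t tail = atomic_load_explicit(&w->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&w->head, memory_order_acquire);

    if (len > SLOT_BYTES || head == tail)
        return -1;
    memcpy(buf, w->slot[tail % SLOT_COUNT], len);
    /* Release store: the slot is free for reuse only after the copy. */
    atomic_store_explicit(&w->tail, tail + 1, memory_order_release);
    return 0;
}
```

The point of the layout is that the only shared state is the two
counters, each written by one side only, so a node that dies leaves at
worst a stale mailbox rather than a half-updated shared data structure.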

Remember MACH? Who even uses it today, or Chorus for that matter?

Jeff

Aman Singla wrote:
>
> > Not just that. On SMP, and even more on NUMA, you only get scalable
> > performance by going out of your way to avoid unnecessary sharing of
> > data. It's just not transparent. And when you _do_ want to pass data
> > from X to Y, you only want to do it once, which requires extra
> > synchronisation to make sure that the receiver of the data doesn't start
> > peeking at the data structures until the data is known to arrive.
> >
> > With message passing, that synchronisation is implicit. With shared
> > memory, about all you can do is spin waiting for the data to arrive,
> > unless you add a separate synchronisation mechanism somewhere to do
> > locking outside the shared memory channel.
> >
> > It's not just granularity. The synchronisation hit is bad, on _all_
> > forms of DSM up to and including SMP. (The latest Linux development
> > kernel includes "big reader" locks which allow a read-mostly data
> > structure to be locked on each CPU separately with no inter-CPU memory
> > traffic at all, simply because even on such tightly coupled systems the
> > cost of passing even one cache line of shared memory traffic between
> > CPUs is too high.)
>
> The argument here is essentially that shared-memory as a programming
> model doesn't scale as well as message passing can. DSM just makes
> it more evident.
> But the lure of the easier programming model is substantial; and
> applications come in all flavors from will scale on anything, to
> won't scale on shared-memory but do OK on message-passing, to won't
> scale on message-passing either - it boils down to the extent of
> state sharing and synchronization inherent in the application.
> There's been considerable research in shared-memory protocols which
> have mechanisms to (try to) bring the communication/synchronization
> down to the 'implicit in the application' minimum, as a msg-passing
> program would. They seem to work for some applications but I wouldn't
> say they are general enough to be practical.
> But they mostly are explicit communication primitives within the
> context of coherence protocols and consistency mechanisms. I think
> one could reap most of the 'ease of programming' benefits of
> shared-memory programming by just having a global-address space and
> build 'put'/'get' kind of primitives to communicate explicitly on
> the top - kinda like msg-passing.
>
> :a
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
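
The closing suggestion in the quoted message, a global address space
with explicit 'put'/'get' primitives on top, could look roughly like the
C sketch below. Everything in it is an assumption made for illustration:
the names gas_put/gas_get, struct gaddr, and the sizes NODE_COUNT and
SEGMENT_BYTES are invented, and the "remote" segments are just local
arrays so the example can run on one machine. There is no coherence
protocol at all; data moves only when the program says so.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_COUNT 4u        /* hypothetical cluster size */
#define SEGMENT_BYTES 4096u  /* hypothetical per-node exported segment */

/* Stand-in for each node's exported memory. A real system would reach
 * remote segments over the interconnect, not through a local array. */
static unsigned char segment[NODE_COUNT][SEGMENT_BYTES];

/* A global address names a node and an offset inside its segment. */
struct gaddr {
    uint32_t node;
    uint32_t offset;
};

/* True if [offset, offset + len) lies inside an existing segment. */
static int gaddr_ok(struct gaddr a, size_t len)
{
    return a.node < NODE_COUNT && len <= SEGMENT_BYTES &&
           a.offset <= SEGMENT_BYTES - len;
}

/* Explicitly copy len bytes from local memory to a global address.
 * Returns 0 on success, -1 if the target range is out of bounds. */
int gas_put(struct gaddr dst, const void *src, size_t len)
{
    if (!gaddr_ok(dst, len))
        return -1;
    memcpy(&segment[dst.node][dst.offset], src, len);
    return 0;
}

/* Explicitly copy len bytes from a global address to local memory.
 * Returns 0 on success, -1 if the source range is out of bounds. */
int gas_get(void *dst, struct gaddr src, size_t len)
{
    if (!gaddr_ok(src, len))
        return -1;
    memcpy(dst, &segment[src.node][src.offset], len);
    return 0;
}
```

A caller would do something like gas_put(addr, &value, sizeof value) on
one node and gas_get(&value, addr, sizeof value) on another. The sketch
leaves out the synchronisation the thread is arguing about: the reader
still needs some separate signal that the put has completed.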




This archive was generated by hypermail 2b29 : Tue Mar 07 2000 - 21:00:23 EST