Re: Remote fork() and Parallel Programming

Alan Cox (linker@nightshade.ml.org)
Wed, 17 Jun 1998 12:49:05 -0400 (EDT)


Very nice..

However, I would add that DIPC (DSM) is not like C. It's like Pascal:
overly safe and difficult to extract good performance from. With C and a
good compiler you get near-assembly speed (unless you compare C to really
tricked-out assembly).

If we agree that MPI is like assembly, then perhaps we need to put a lot
more attention into making MPI faster. You make it faster on a single
computer (like an SMP box) by using shared memory. You make it faster on
clusters through shortcut network stacks and optimized protocols. You
make it faster by bringing basic core functions into the kernel where
needed. You make hand-optimized assembly versions for various CPUs.

Then you build DIPC on top of MPI.

Perhaps there needs to be a third alternative: a more user-friendly
wrapper on MPI, like C is on assembly.
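
To make that concrete, here is a minimal sketch of such a wrapper. The
cluster_put()/cluster_get() names are made up for illustration, not any
existing API; real code would also handle datatypes and errors:

    #include <mpi.h>
    #include <stdio.h>

    /* hide MPI's communicator, tag and datatype arguments */
    int cluster_put(int peer, void *buf, int len)
    {
            return MPI_Send(buf, len, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
    }

    int cluster_get(int peer, void *buf, int len)
    {
            MPI_Status st;
            return MPI_Recv(buf, len, MPI_BYTE, peer, 0,
                            MPI_COMM_WORLD, &st);
    }

    int main(int argc, char **argv)
    {
            int rank, n = 42;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            if (rank == 0) {
                    cluster_put(1, &n, sizeof(n));  /* node 0 sends */
            } else if (rank == 1) {
                    cluster_get(0, &n, sizeof(n));  /* node 1 receives */
                    printf("got %d\n", n);
            }
            MPI_Finalize();
            return 0;
    }

The underlying MPI calls stay available to anyone who needs full control,
just as inline asm does under C.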

On Wed, 17 Jun 1998 mshar@vax.ipm.ac.ir wrote:

> Hi,
>
> Distributed programming has traditionally been a very specialized and
> "private" branch of computer science. Usually only very big organizations
> or universities are/were fortunate enough to have access to custom-designed
> hardware suitable for distributed programming. Such hardware is/was
> very expensive and mostly used to solve big problems (that is how the
> costs are justified). Getting good performance is of paramount importance
> here; ease of programming is placed very low on the list of priorities.
> These institutions are willing to hire very talented programmers and spend
> good money on training and program development. Programs are tuned for
> special hardware, and redesigning them is usually very hard.
>
> Most of the current tools and mentality for distributed programming stem
> from these environments. We can call this E1 (environment 1). In E1,
> distributed programs are nearly always parallel programs. Here, developing
> and maintaining applications is very costly and hard. The number of
> programmers willing and able to enter this environment is not very high.
> Ease of programming can be neglected because these programmers can be
> trained for the job. The programmers are _expected_ to work a lot on
> distributed application development.
>
> I think the E1 mentality cannot be applied to the rapidly changing (read:
> improving) world of distributed computing.
>
> Think about all those networked PCs and workstations all over the world.
> They are used in offices, universities, and even homes (call them E2). Each
> of these computers is usually used by a single person, and most of the time
> they have nothing to run, which means they waste resources. They mostly
> use state-of-the-art hardware and networking technology (many people
> frequently upgrade their PCs, and gigabit Ethernet is coming fast). This is
> an _enormous_ source of computing power! It would be very interesting if we
> could use their collective power by forming clusters.
>
> If this is done, then suddenly thousands of developers can start writing
> distributed programs. Then we can hope for scores of new, useful
> applications; something we cannot have if distributed programming stays in
> the domain of a minority.
>
> A good analogy can be found in the realm of programming languages. At one
> time there was a very small number of programmers in the world, and they
> programmed computers in machine language or assembly. But look at the
> situation now. Do you think there would be this many programmers and
> useful software if we had continued to use assembly instead of higher
> level languages? How many people were able to learn and use assembly? How
> much harder would it have been to develop some of the very big
> applications we use today?
>
> Assembly language gives programmers a lot of control over their programs'
> behaviour, and so they can write very efficient programs. Higher level
> languages (like Pascal and Prolog) provide a higher level of abstraction,
> and create a view of the system that can be very different from the actual
> hardware. This does result in some inefficiency: a compiled high level
> program is almost always less efficient than one written directly in
> assembly. In spite of all this, the trend has been to _add_ to the
> abstraction. Look at Delphi and Visual Basic as examples.
>
> Programmers still _can_ use assembly if they think they have to. Providing
> higher levels of abstraction does not prohibit us from using the blocks
> that were employed to build those abstractions. There is nothing mutually
> exclusive here.
>
> The move from normal systems to distributed clusters is very similar. Do
> we want applications that use more than one computer in a cluster to
> solve a problem, or not? If not, there is little sense in building a
> cluster in the first place and the discussion ends; but if the answer is
> yes, then we have to think about a suitable programming model, because
> the presence of a network introduces many new complications into
> programming. Common sense tells us that the closer a model is to what
> people are used to, the easier it is to use.
>
> Most of us are trained to think of shared memory as the main mechanism of
> data exchange. We use global variables or the arguments to procedures to
> let different parts of our application receive data and later return
> results. I know anybody can learn other methods of programming, but
> considering the number of people we are talking about here, it is better
> to keep the conventional techniques of programming as much as possible.
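>
> DIPC itself is an example of keeping the conventional interface: the
> familiar System V calls are unchanged. A sketch (error handling omitted;
> the IPC_DIPC flag marking the segment as distributed is defined only by
> a DIPC-patched kernel, and on a stock system you would drop it):
>
>     #include <sys/types.h>
>     #include <sys/ipc.h>
>     #include <sys/shm.h>
>     #include <stdio.h>
>
>     int main(void)
>     {
>             int shmid;
>             int *counter;
>
>             /* same calls as on a single machine; IPC_DIPC marks the
>              * segment as cluster-wide (DIPC-patched kernels only) */
>             shmid = shmget((key_t)1234, sizeof(int),
>                            IPC_CREAT | IPC_DIPC | 0666);
>             counter = (int *)shmat(shmid, NULL, 0);
>
>             (*counter)++;   /* processes on other nodes see this too */
>             printf("counter = %d\n", *counter);
>
>             shmdt(counter);
>             return 0;
>     }
>
> A programmer who knows ordinary System V shared memory already knows how
> to write this.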
>
> Some of what we achieve by doing so is:
>
> *) Application programmers don't need to learn new programming mentalities
> and techniques.
>
> *) We can continue to use many of the conventional algorithms.
>
> *) The source code of normal and distributed applications will be very
> similar. This will be a great help in debugging and maintaining
> distributed programs (and we are thinking of thousands of such
> applications).
>
>
> Object oriented programming is one model to use for distributed
> application development. It has some rather nice properties, but here we
> are talking about Linux and its thousands of applications. Linux is not
> object oriented, so we can leave it out of consideration. That brings us
> to the message passing vs DSM argument. Here is what we observe:
>
> *) For distributed programming to make sense, we _have_ to transfer data
> over the network. We _have_ to tolerate the difference in speed between a
> network and a local computer bus. This is an inherent property that has
> _nothing_ to do with the programming model we use.
>
> *) Networks are becoming faster every day, and they do reach our PCs.
> Even terabit networks _do_ exist. Latency is becoming the dominant factor
> in transfer times; in other words, the time of the actual data transfer is
> starting to become negligible.
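>
> A back-of-the-envelope figure (my numbers, purely illustrative): on
> 100 Mbit/s Ethernet a 1 KB message occupies the wire for about
> 1024 * 8 / 100,000,000 s = 80 microseconds, while the software latency of
> a TCP/IP round trip is commonly hundreds of microseconds. Make the
> network ten times faster and the wire time drops to 8 microseconds, but
> the latency barely moves.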
>
> *) Messages (in the sense of TCP/IP packets) can be considered the
> primitive method of data transfer in a network. Other methods, like PVM's
> messages or DIPC's shared memory, are implemented on top of this
> mechanism. PVM and DIPC both use TCP/IP to provide abstractions of the
> underlying hardware. PVM's messages are not very different from TCP/IP's
> packets in the way programmers use them (both are essentially routines
> that take some arguments and transfer data), but PVM's messages offer many
> services that TCP/IP does not. Examples include allowing the application
> to use logical computer addresses instead of IP addresses, or converting
> the data to a representation format suitable for the receiver. This eases
> the work of a distributed application programmer a lot. It is no wonder
> that so many people prefer systems like PVM and MPI to raw TCP/IP.
> Obviously a PVM message still has to cross the network, which means the
> program initiating the transfer has to tolerate the network latency and
> the time required to transfer the data.
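>
> For instance, a PVM sender packs its data and addresses a logical task
> id; PVM routes the message and (with PvmDataDefault) converts the data
> through XDR if the receiver's byte order differs. A minimal sketch, with
> error handling omitted and the message tag chosen arbitrarily:
>
>     #include <pvm3.h>
>
>     /* send an array of ints to another PVM task */
>     void send_ints(int dest_tid, int *data, int n)
>     {
>             pvm_initsend(PvmDataDefault); /* XDR-encode the buffer */
>             pvm_pkint(data, n, 1);        /* pack n ints, stride 1 */
>             pvm_send(dest_tid, 7);        /* logical task id, tag 7 */
>     }
>
>     /* receive them on the other side */
>     void recv_ints(int src_tid, int *data, int n)
>     {
>             pvm_recv(src_tid, 7);         /* block until tag 7 arrives */
>             pvm_upkint(data, n, 1);       /* unpack to local format */
>     }
>
> Compare this with opening a socket to an IP address and doing the byte
> order conversions by hand: the calls look similar, but PVM is doing much
> more of the work.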
>
> A message passing programmer has to worry about what the application is
> supposed to do in the first place. He also has to make sure that the data
> needed for a computation is transferred to the right computer at the
> right time. He has to do such transfers explicitly (as in the sketch
> above), meaning that most probably the source code will be very different
> from the case where no such explicit data movements were needed.
>
> DSM is another step up the abstraction ladder. It completely hides the
> message passing requirements of a cluster, and allows the programmer to
> develop a distributed application the same way as a normal application.
> The developer continues to use shared memory to transfer data between
> different parts of the application, and can mostly forget about the
> presence of a network. I say "mostly forget" because the fact that the
> programmer does not see a network will not make it vanish. The same costs
> of transferring data to other computers are still there, so certain
> techniques, like busy waiting, will result in poor performance: though
> the programmer cannot see it, the application is generating a lot of
> network traffic. So it is very clear that DSM cannot relieve the
> programmer of all concerns about distributed programming, but it is still
> useful, because it makes the average programmer's job very easy and also
> produces more readable and maintainable programs.
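>
> To make the busy waiting pitfall concrete, here is a sketch of my own
> (System V semaphores, which DIPC also distributes, stand in for any
> blocking primitive):
>
>     #include <sys/types.h>
>     #include <sys/ipc.h>
>     #include <sys/sem.h>
>
>     /* Bad under DSM: every read of *flag may pull the page across
>      * the network, so the "invisible" traffic can be enormous. */
>     void wait_busy(volatile int *flag)
>     {
>             while (*flag == 0)
>                     ;       /* spin */
>     }
>
>     /* Better: sleep on a (distributed) semaphore; a single message
>      * wakes the process when a peer does a +1 semop(). */
>     void wait_blocking(int semid)
>     {
>             struct sembuf op = { 0, -1, 0 }; /* P() on semaphore 0 */
>
>             semop(semid, &op, 1);
>     }
>
> On one machine the two versions look almost interchangeable; on a
> cluster the difference in generated traffic is enormous.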
>
> DSM is to message passing as a high level language is to assembly.
> Assembly language has its own advantages, but because higher level
> languages are easier to learn and program with, most people prefer them.
> In fact, the trend has always been to provide even higher level
> languages. Write a small program in Delphi or Visual Basic and you get a
> _very_ big compiled program. For most people it does not matter much that
> their executable is well over 100KB when, written in assembly, it would
> be a few hundred bytes.
>
> Still, no one can be forced to use a high level language. If someone
> thinks assembly language is the answer to his application's needs, he can
> very well use it. The same is true for DSM. You want to simulate an
> atomic explosion and need every bit of resource you've got? No problem:
> pay a lot of money to build custom hardware and hire good programmers,
> then wait till the application is designed and implemented.
>
> But not all the people with access to a cluster of PCs or workstations
> need all the performance their computers can give them. Most can live
> with some overhead and get ease of programming in return. In fact, in
> many cases, if there is no easy way to program a cluster, there will be
> no cluster at all.
>
> If we offer only the harder programming models, then we are depriving
> ourselves of a lot of useful applications that could be developed by
> thousands of programmers who can't or don't want to learn unfamiliar
> programming models. This is a waste of both programming talent and
> computer equipment (no program, no resource usage).
>
> The arguments about the merits of transparent process migration follow
> from the same way of thinking: applications should not have to worry
> about where they are running. The programmer should not have to get
> involved in checkpointing and restarting his application. Such issues add
> to the visible complexity of a distributed program, and that makes
> developing distributed applications harder; something I and a lot of
> other people don't like at all.
>
> No method I know of is perfect, but some methods are preferable to others
> according to one's priorities. For me, it is ease of programming.
> Experience in the computer industry has shown that one need not worry too
> much about hardware performance and its rate of improvement. It is
> software development that needs consideration, because a hard-to-use
> programming model does not improve every two years or so. As evidence,
> just consider that distributed programming is a rather old branch of
> computer science, but it has not yet found widespread use among
> application programmers. It will probably remain that way as long as
> people consider distributed programming something that _should_ require
> the active involvement of the programmer in making the application work
> in a distributed environment.
>
> I think "we" having thousands of distributed applications available for our
> PC clusters that are not %100 efficient in their resource usage is better
> than "they" having a few very efficient and high performance distributed
> programs running in national laboratories.
>
> Everything is up to the Linux community.
>
>
> -Kamran Karimi
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu