Re: Lets get this right (WAS RE:MOSIX and kernel mods)

G.W. Wettstein (greg@wind.enjellic.com)
Sun, 7 Mar 1999 16:16:35 -0600


On Mar 6, 9:37pm, "Michael Loftis" wrote:
} Subject: Re: Lets get this right (WAS RE:MOSIX and kernel mods)

Good day to everyone following this thread.

> Which is exactly why itd be good to start work now. Nearby is a lab
> w/ 100Mbit ethernet and Giga-Ethernet. Both are very fast, and
> aside from using a Paragon backplane, the Giga is about as fast as
> it gets. This is *now* in 5 years Giga Ether will be consumer
> product. Imagine a 16-node Dual P-II Linux cluster on a switched
> full-duplex gigabit network... Right now pretty spendy (if you
> forgive the fact that there isn't free clustering in the kernel) but
> it *is* possible. Flash forward five years when the P-II can be
> bought at a garage sale and a Gigabit Ether switch would cost $50.

Actually I have 2 clusters like this sitting right underneath my
office. If I keep my project schedule on track and Nortel/Bay
delivers the blades for the A12's like they are supposed to we may be
able to light this creature up later this week.

I find this discussion of clustering quite interesting in light of
what my research associate has been telling me the last week. We are
currently running one of our research clusters over switched Fast
Ethernet. My RA and one of our campus researchers have been working
on parallelizing a 3-dimensional soils model using C++ and MPI. They
are already beginning to see the affects of network latency and how
this influences the overall timing of production runs. Clusters raise
very interesting issues WRT networking and a host of other issues.
The whole field is a study in very subtle issues.

That being said I think that we as a community need to consider the
whole issue very carefully. I have been working with Linux for a long
time (Hi Larry) as some of the 'old-timers' will attest. We need to
forge ahead with innovation and excellence since that is what has
taken us to the rampart of challenging one of the mightiest behemoths
in technology (MS).

This is going to require that we deal with issues in a manner which is
much different than in the past. If something gets put into the
kernel, even as a compile option, the suggestion is that this has been
given the imprimatur of Linus and the community. Both entities which
have gained a lot of respect and credibility in the last several
weeks/months. Developers and others will take the presence of code in
the 'official' tree as an affirmation of the features usefulness and
will begin building application dependency on it. Once this happens
there is no going back so we must make fundamental architecture
decisions with great prudence.

With the globalization of Linux development made possible by the
INTERNET the addition of clustering technology or other very specific
features is probably best supported as addendums to the basic kernel.
The future success of Linux is probably best exploited by careful
subscription to Darwinistic theories with respect to code inclusion.
By only allowing development to proceed along pathways which have
proven themselves in production environments or by measurable
performance indices we will insure that the sins of the past are
avoided.

Linux has attracted people, like Larry, since it provides what many
commercial OS shops have lacked. A development process where the
'best' ideas and technology win, rather than a process which gets
influenced by marketing demands or who hangs around best by the water
cooler or speaks the best at code review meetings.

What Larry is asking for is not unreasonable, proof by performance
measurement. I don't think that anyone would suggest for a minute
that Linus would not include something that amply demonstrated
superiority in a test cluster. Until that occurs new innovation
probably needs to occur outside the kernel. This can actually be
looked at as an advantage since we are rapidly flying toward a
situation where the ability to change things once they get into the
kernel becomes markedly reduced.

Well enough of this. For those of you who have read this far here is
the configuration of our 2 research clusters:

MIDAS: 16 node cluster
266Mhz Dual-PII
128Mbyte RAM
4 x 16Gbyte IDE drives (1 Terrabyte composite).
Network Fabric 1: 100Mbit Switched (Bay 450T) Fast Ethernet
Network Fabric 2: Switched (Fore) OC3 ATM
Network Fabric 3: 1Gbit Switched (Accellar) Ethernet

HURD: 16 node cluster
450Mhz Dual-PII
512Mbyte RAM
9Gbyte IDE drive.
Network Fabric 1: 100Mbit Switched (Bay 450T) Fast Ethernet
Network Fabric 2: 1Gbit Switched (Accellar) Ethernet

We have a multi-link trunking arrangement between the switches so that
we can insure 3Gbit/sec. interconnects between the switch backplanes.
Any node in either cluster can thus 'see' 3Gbit/sec to the rest of the
nodes on the switch backplane.

To give Larry's concerns about latency some extra credence it was
interesting that when we designed the two clusters the network
engineers worked with us were concerned that we wouldn't run into
applications that would be thwarted by the switch latency times.

To sort of put my money where my mouth is if there are any groups that
want to pursue 'Darwinism' in the field of kernel cluster development
let me know and I can provide testbed time for measurements and tests.

Have a pleasant start of the week everyone.

Greg

}-- End of excerpt from "Michael Loftis"

As always,
Dr. G.W. Wettstein Enjellic Systems Development - Specialists in
4206 N. 19th Ave. intranet based enterprise information solutions.
Fargo, ND 58102 WWW: http://www.enjellic.com
Phone: 701-281-1686 EMAIL: greg@wind.enjellic.com
------------------------------------------------------------------------------
"The greatest pleasure in life is doing what other people say you cannot do."
-- W. Bagehot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/