The Linux Letter for August 2, 1999
Hello again, and welcome to another Linux
Letter. You may have missed last week's letter, and for that all
I can do is offer my humble apologies. As a starving,
non-traditional college student, I was caught up in the frenzy
of summer term finals in Multivariable Calculus and just didn't
have the time to write the column. So I guess that this week I'd
better do something good!
Besides the Internet and the general
hullabaloo over open source software, one of the most promising
directions for Linux is supercomputing. While Linux
easily scales to multiple processors on a single computer,
many organizations are finding that tremendous performance can
be had by combining many single-processor computers into
clusters of systems that behave as a single computer. The common
term for this is distributed computing.
Distributed computing breaks data up into
chunks small enough for a single computer to handle, and each
computer in the cluster processes its own chunks. Generally, a master computer
controls communications within the cluster, assigns the data
to the individual systems, and collects the processed results
for presentation to the user.
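To make the master's role concrete, here is a minimal Python sketch. It is not PVM or any real cluster package. Threads stand in for the separate computers, and the names `split_into_chunks`, `master`, and `sum_of_squares` are hypothetical. On a real cluster the chunks would travel over the network instead of being handed to a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(data: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Split data into n_chunks contiguous pieces whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(data), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def master(data: Sequence[T], worker: Callable[[Sequence[T]], R], n_nodes: int) -> List[R]:
    """Hand one chunk to each 'node' (a thread here), then gather the results."""
    chunks = split_into_chunks(data, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # pool.map yields results in submission order, so reassembly needs no bookkeeping
        return list(pool.map(worker, chunks))


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Hypothetical CPU-bound job that each node runs on its own chunk."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    partials = master(list(range(10)), sum_of_squares, n_nodes=4)
    print(partials, sum(partials))
```

Splitting ten numbers across four nodes gives chunks of 3, 3, 2, and 2 items, and adding the four partial results reproduces the answer a single machine would compute.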
While you may not have actually used a
cluster of computers, Linux or not, chances are that you've
already heard of distributed computing. Distributed.net is using
the idea on a grand scale to attempt a brute-force crack of the
RC5-64 encryption challenge. SETI@Home
also uses distributed computing to process data from the Arecibo
radio telescope in Puerto Rico in its search for
extra-terrestrial intelligence. Here's how it works:
In the case of SETI@Home,
a group of servers stands by to deliver relatively small packets
of data to a client program that runs on another computer,
perhaps yours. When the client contacts a server via the
Internet, the server sends it a packet of data, then
closes the connection. The client software then processes the
data. When it finishes, the client opens a new connection
to the server, uploads its results, then downloads another packet
to process. This cycle repeats for as long as the
client system is running. At the same time, many other clients
are running the same software, each downloading its own packets
for processing. In this manner, many thousands of packets
can be processed simultaneously, distributing the computing
work among many computers.
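The loop each client runs can be sketched in a few lines of Python. This illustrates the cycle described above; it is not SETI@Home's actual client or protocol. The `WorkServer` class is a hypothetical in-process stand-in for the project's servers, method calls stand in for Internet connections, and `analyze` is a placeholder for the real signal analysis.

```python
import queue
import threading


class WorkServer:
    """Hypothetical stand-in for the project's servers: hands out packets, collects results."""

    def __init__(self, packets):
        self._pending = queue.Queue()
        for packet_id, payload in enumerate(packets):
            self._pending.put((packet_id, payload))
        self._results = {}
        self._lock = threading.Lock()

    def get_packet(self):
        """Return the next (packet_id, payload), or None when no work is left."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def submit(self, packet_id, result):
        """Record a finished packet's result (thread-safe)."""
        with self._lock:
            self._results[packet_id] = result

    def results(self):
        with self._lock:
            return dict(self._results)


def analyze(payload):
    """Placeholder for the real analysis: here, just report the strongest sample."""
    return max(payload)


def run_client(server):
    """Client loop: fetch a packet, process it offline, upload the result, repeat."""
    processed = 0
    while True:
        work = server.get_packet()        # stands in for contacting the server for work
        if work is None:
            break
        packet_id, payload = work
        result = analyze(payload)         # no server involvement while processing
        server.submit(packet_id, result)  # upload before asking for the next packet
        processed += 1
    return processed


if __name__ == "__main__":
    server = WorkServer([[3, 9, 1], [4, 4, 8], [7, 2, 6], [5, 0, 5]])
    counts = []
    clients = [threading.Thread(target=lambda: counts.append(run_client(server)))
               for _ in range(3)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    print(server.results(), "packets processed:", sum(counts))
```

Because each client only talks to the server between packets, adding more clients adds capacity without any coordination among the clients themselves.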
The results can be breathtaking. In the
case of SETI@Home,
roughly 38,000 years' worth of data processing has been done in
just a few months. Yet the cost of the project has been minimal,
because the data is processed using the spare CPU cycles of many
thousands of computers instead of a single dedicated
supercomputer.
On a smaller scale, Linux supports
distributed computing with clusters. A cluster of as few as four
computers can process certain data as much as five times faster than
a single computer. Very CPU-intensive work, such as
graphics rendering, lends itself well to this type of
processing.
Currently, the most popular clustering
software for use with Linux is PVM (Parallel Virtual Machine).
This is messaging software that uses rsh to start its processes
on each of the computers in the cluster. It coordinates the
passing of data among the network nodes and provides APIs that
let programmers take advantage of the clustering abilities of
Linux.
As networking hardware becomes less
expensive, faster connections between nodes are within reach of
more clusters, and cluster performance is improving. 100Mbps switches
that used to be virtually unaffordable are becoming a
commonplace way of connecting network nodes, and even hubs
perform well enough to be considered for clustering.
Another method of clustering uses a
software package called Mosix. Instead of running programs as
separate processes on remote computers, Mosix makes the cluster
of computers appear to be a single computer to the user. It
distributes the workload by examining the performance of the
individual computers in the cluster, so that higher-performance
computers take on more work and lower-performance computers
take on less, and each computer spends roughly the same
amount of time processing.
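The proportional split described above can be sketched in Python. This only illustrates the idea in the paragraph; it is not how Mosix is implemented (Mosix balances load by migrating running processes between nodes). The node names and relative speeds are made up for the example.

```python
from typing import Dict


def allocate_work(total_units: int, speeds: Dict[str, float]) -> Dict[str, int]:
    """Give each node work proportional to its speed, using largest-remainder rounding."""
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not speeds or any(s <= 0 for s in speeds.values()):
        raise ValueError("need at least one node, and every speed must be positive")
    total_speed = sum(speeds.values())
    exact = {node: total_units * s / total_speed for node, s in speeds.items()}
    shares = {node: int(x) for node, x in exact.items()}  # floor of each exact share
    leftover = total_units - sum(shares.values())
    # Hand the remaining units to the nodes with the largest fractional parts
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


def finish_times(shares: Dict[str, int], speeds: Dict[str, float]) -> Dict[str, float]:
    """Time each node needs, in arbitrary units: work assigned divided by speed."""
    return {node: shares[node] / speeds[node] for node in shares}


if __name__ == "__main__":
    # Hypothetical nodes with relative speeds in a 3:2:1 ratio
    speeds = {"p2-450": 450.0, "p2-300": 300.0, "p166": 150.0}
    shares = allocate_work(90, speeds)
    print(shares)
    print(finish_times(shares, speeds))
```

With speeds in a 3:2:1 ratio, 90 work units divide into 45, 30, and 15, so every node finishes after the same amount of time.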
If you're interested in pursuing
distributed computing, the best opportunity is probably through SETI@Home
or Distributed.net. In fact, The NOSPIN Group maintains a team
on SETI@Home, so
you can actually see how your processing contributes to the
performance of a very large "virtual" computer.
SETI@Home: http://setiathome.ssl.berkeley.edu
Distributed.net: http://www.distributed.net
MOSIX: http://www.mosix.cs.huji.ac.il
PVM: http://www.epm.ornl.gov/pvm/pvm_home.html
Extreme Linux: http://www.extremelinux.org