Beowulf is the legendary hero of the Old English poem of
the same name. In this epic tale, Beowulf saves the kingdom of the Danish king
Hrothgar from two man-eating monsters and later slays a dragon in his own land.
Beowulf is used today as a metaphor for a new strategy in
high-performance computing that exploits mass-market technologies to overcome
the oppressive costs in time and money of supercomputing, freeing people to
devote themselves to their respective disciplines. Ironically, building a
Beowulf cluster is so much fun that scientists and researchers eagerly roll up
their sleeves to undertake the many tedious tasks involved, at least the first
time they build one.
The concept of Beowulf clusters originated at the Center of
Excellence in Space Data and Information Sciences (CESDIS), located at the NASA
Goddard Space Flight Center in Maryland. The goal in building a Beowulf cluster
was to create a cost-effective parallel computing system from mass-market,
commodity off-the-shelf components that would satisfy specific computational
requirements of the earth and space sciences community.
Clusters of commodity computing systems
The
first Beowulf cluster was built from 16 Intel DX4 processors connected by
channel-bonded 10 Mbps Ethernet, and it ran Linux. It was an instant success,
demonstrating that a commodity cluster could serve as an alternative for
high-performance computing (HPC). After the success of the first Beowulf
cluster, CESDIS built several more using successive generations and families
of processors and network interconnects.
At Supercomputing '96, a supercomputing conference
sponsored by the Association for Computing Machinery (ACM) and the Institute of
Electrical and Electronics Engineers (IEEE), both NASA and the US Department of
Energy demonstrated clusters costing less than $50,000 that achieved sustained
performance greater than one gigaflops (one billion floating-point operations
per second). As of November 1999, the fastest Beowulf-class cluster, Cplant
(Computational Plant) at Sandia National Laboratories, ranked 44th on the
Top 500 list of supercomputers.
While many commercial supercomputers use the same processor,
memory, and controller chips employed by Beowulf clusters, they also
integrate proprietary glue technologies (for example, interconnection
networks, special I/O subsystems, and advanced compiler technologies) that
greatly increase cost and development time. Beowulf clusters, on the other
hand, use only mass-market components and are not subject to the delays and
costs of custom parts and proprietary design.
Logical view of a Beowulf cluster
A Beowulf cluster uses a multicomputer architecture: a
parallel computing system that usually consists of one or more master nodes
and one or more compute nodes (also called cluster nodes) interconnected via
widely available networks. All of the nodes in a typical Beowulf cluster are
commodity systems (PCs, workstations, or servers) running commodity software
such as Linux.
The master node acts as a Network File System (NFS) server
and as a gateway to the outside world. As an NFS server, the master node
provides user file space and other common system software to the compute nodes
via NFS. As a gateway, the master node gives users access to the compute
nodes. Usually, the master node is the only machine that is also connected to
the outside world, using a second network interface card (NIC).
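
To make this concrete, a minimal sketch of such an NFS setup follows. The
/home and /usr/local paths, the 10.0.0.0/24 subnet, and the master-node
address 10.0.0.1 are assumptions chosen for illustration, not a prescribed
layout; the exact syntax varies slightly across NFS versions.

    # /etc/exports on the master node (paths and subnet are assumed)
    # Read-write user file space, read-only shared software
    /home        10.0.0.0/255.255.255.0(rw)
    /usr/local   10.0.0.0/255.255.255.0(ro)

    # /etc/fstab entries on each compute node (master assumed at 10.0.0.1)
    10.0.0.1:/home       /home       nfs  defaults  0 0
    10.0.0.1:/usr/local  /usr/local  nfs  defaults  0 0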
The sole task of the compute nodes is to execute parallel
jobs. In most cases, therefore, the compute nodes do not have keyboards, mice,
video cards, or monitors. All access to the compute nodes is provided via
remote connections from the master node. Because compute nodes do not need to
access machines outside the cluster, nor do machines outside the cluster need
to access compute nodes directly, compute nodes commonly use private IP
addresses, such as those in the 10.0.0.0/8 or 192.168.0.0/16 address ranges.
From a user's perspective, a Beowulf cluster appears as a
Massively Parallel Processor (MPP) system. Most commonly, users access the
master node either directly or through telnet or remote login from their
personal workstations. Once on the master node, users can prepare and compile
their parallel applications and spawn jobs on a desired number of compute
nodes in the cluster.
Applications must be written in a parallel style and use the
message-passing programming model. The jobs of a parallel application are
spawned on compute nodes, which work collaboratively until the application
finishes. During execution, compute nodes use standard message-passing
middleware, such as the Message Passing Interface (MPI) or Parallel Virtual
Machine (PVM), to exchange information.
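
To give a flavor of this model, the short C program below has every process
send its rank to a master process over MPI. It is a minimal sketch, not code
from any particular cluster, and assumes an MPI implementation (such as MPICH
or LAM/MPI) is installed on the nodes.

    /* hello_mpi.c: each process sends its rank to process 0,
       which prints a greeting for every message received.
       A minimal sketch of the message-passing model. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Status status;

        MPI_Init(&argc, &argv);                /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

        if (rank != 0) {
            /* worker processes send their rank to the master process */
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else {
            int i, who;
            printf("Master running with %d processes\n", size);
            for (i = 1; i < size; i++) {
                MPI_Recv(&who, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("Greeting received from process %d\n", who);
            }
        }

        MPI_Finalize();                        /* shut down MPI */
        return 0;
    }

Such a program would typically be compiled on the master node (for example,
with mpicc) and launched across the compute nodes with a command such as
mpirun -np 16 hello_mpi; the exact commands depend on the MPI implementation.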
Applications for Beowulf clusters
Since a Beowulf cluster is an MPP system, it suits
applications that can be partitioned into tasks that a number of processors
execute concurrently. These applications range from high-end,
floating-point-intensive scientific and engineering problems to commercial
data-intensive tasks. Examples include ocean and climate modeling for
prediction of temperature and precipitation, seismic analysis for oil
exploration, aerodynamic simulation for motor and aircraft design, and
molecular modeling for biomedical research.
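
As a toy illustration of how such partitioning looks in practice, the sketch
below estimates pi by numerical integration, splitting the integration slices
evenly across MPI processes and combining the partial sums on the master. The
program and its parameters are illustrative assumptions, not an application
drawn from the article.

    /* pi_mpi.c: estimate pi as the integral of 4/(1+x*x) over [0,1].
       Each process integrates a strided share of the slices; the
       partial sums are combined on process 0 with MPI_Reduce. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const int n = 1000000;   /* total number of integration slices */
        int rank, size, i;
        double h, x, local = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        h = 1.0 / n;
        for (i = rank; i < n; i += size) {   /* every size-th slice */
            x = h * (i + 0.5);               /* midpoint of slice i */
            local += 4.0 / (1.0 + x * x);
        }
        local *= h;

        /* sum the partial results onto process 0 */
        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.12f\n", pi);

        MPI_Finalize();
        return 0;
    }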
Design choices for building a Beowulf cluster
Beowulf is not a special software package, a new network
topology, or the latest Linux kernel hack. Beowulf is a concept: clustering
commodity computers to form a parallel, virtual supercomputer. You can build
your own Beowulf cluster from the components you consider most appropriate.
No single Beowulf cluster design is general enough to satisfy everyone's
needs; there are numerous design choices to make. The bottom line is that a
Beowulf cluster must be built to run your applications as well as possible
(see the box "Design considerations for Beowulf clusters").
The next steps for Beowulf clusters
Beowulf-class clusters have successfully penetrated the HPC
market, which has traditionally been dominated by expensive "big iron"
boxes from IBM, Cray, Silicon Graphics, and others. Beowulf clusters
provide an alternative for building cost-effective, high-performance parallel
computing systems.
We expect Beowulf to continue its rapid deployment to fulfill
the need for technical computation. We also anticipate that more commercial
parallel applications, such as those for computational finance and data
mining, will find their way onto Beowulf clusters.
Jenwei Hsieh is a lead engineer on the Internet
Infrastructure Technologies team and Cluster Development
Group at Dell. Reprinted with permission from "Dell Power Solutions."