Beowulf:  The Art of Clusters

                     by:  Stan Cook               


Overview:

The Beowulf Project was started in early 1994 at CESDIS (the Center of Excellence in Space Data and Information Sciences), which is operated for NASA by USRA. In the summer of 1994 the first 16-node Beowulf cluster was constructed for the Earth and Space Sciences (ESS) project at the Goddard Space Flight Center (GSFC). The project quickly spread to other NASA sites, other R&D labs, and universities around the world, and its scope and the number of Beowulf installations have grown over the years. The Beowulf Project is now hosted by Scyld Computing Corporation, which was founded by members of the original Beowulf team with a mission to develop and support Beowulf systems in the larger commercial arena.

One goal of the original Beowulf was to determine the applicability of massively parallel computers to problems in the Earth and space sciences. The first Beowulf was built to address the large data sets that are often involved in Earth and space science applications. The cost-effectiveness of PC-class machines, together with Linux support for high-performance networking, has enabled the construction of balanced systems built entirely from COTS (commodity off-the-shelf) technology.

Access to a large machine often means access to only a tiny fraction of its resources, shared with many other users. Building a cluster gives a group a system it controls completely and whose resources it can use fully, often resulting in a more effective, higher-performance computing platform.

The Beowulf Project grew from the first Beowulf machine, and likewise the Beowulf community has grown from the NASA project. Like the Linux community, the Beowulf community is a loosely organized confederation of researchers and developers. Each organization has its own agenda and its own set of reasons for developing a particular component or aspect of the Beowulf system. As a result, Beowulf-class cluster computers range from a few nodes to several hundred nodes. Some systems have been built by computational scientists and are used in an operational setting, others have been built as test-beds for systems research, and others serve as an inexpensive platform for learning about parallel programming.

Most people in the Beowulf community are independent do-it-yourselfers. Since everyone is doing their own thing, the notion of central control within the Beowulf community just doesn't make sense. The community is held together by the willingness of its members to share ideas and discuss successes and failures in their development efforts. The mechanisms that facilitate this interaction are the Beowulf mailing lists, individual web pages, and the occasional meeting or workshop.

Distinguishing a Beowulf-class cluster computer from a NOW (Network of Workstations):

First, the nodes in the cluster are dedicated to the cluster. This eases load-balancing problems, because the performance of individual nodes is not subject to external factors. Also, since the interconnection network is isolated from the external network, the network load is determined only by the application being run on the cluster. This eases the problems associated with unpredictable latency in NOWs. All the nodes in the cluster are within the administrative jurisdiction of the cluster. For example, the interconnection network is not visible from the outside world, so the only authentication needed between processors is for system integrity; on a NOW, one must be concerned about network security. Another example is the Beowulf software that provides a global process ID. This gives a process on one node a mechanism to send signals to a process on another node of the system, all within the user domain; this is not possible on a NOW. A cluster also needs only one monitor and keyboard, as opposed to one per workstation on a network. Finally, operating system parameters can be tuned to improve performance. For example, a workstation should be tuned to provide the best interactive feel (instantaneous responses, short buffers, etc.), but in a cluster the nodes can be tuned to provide better throughput for coarser-grain jobs because they are not interacting directly with users.
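Message-passing libraries such as PVM and MPI are what applications typically use on top of this dedicated interconnect. The following is a minimal, illustrative MPI sketch (not taken from any Beowulf distribution; the file name ping.c is made up): process 0 sends a greeting to process 1, and all of the traffic stays on the cluster's internal network.

/* Minimal MPI sketch (illustrative): one process sends a message to
 * another over the cluster's internal interconnect.
 * Build:  mpicc -o ping ping.c
 * Run:    mpirun -np 2 ./ping
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    char msg[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id     */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Get_processor_name(host, &len);     /* which node we run on  */

    if (rank == 0 && size > 1) {
        strcpy(msg, "hello from node 0");
        /* The send travels over the cluster's private network only. */
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 on %s received: %s\n", host, msg);
    }

    MPI_Finalize();
    return 0;
}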

Performance of Beowulf clusters:

Hard-core parallel programmers are first and foremost interested in high-performance computing applied to difficult problems. At Supercomputing '96 both NASA and DOE demonstrated clusters costing less than $50,000 that achieved greater than a gigaflop/s of sustained performance. A year later, NASA researchers at Goddard Space Flight Center combined two clusters, for a total of 199 P6 processors, and ran a PVM version of a PPM (Piece-wise Parabolic Method) code at a sustained rate of 10.1 Gflop/s. In the same week (in fact, on the floor of Supercomputing '97), Caltech's 140-node cluster ran an N-body problem at a rate of 10.9 Gflop/s. This does not mean that Beowulf clusters are supercomputers; it just means one can build a Beowulf that is big enough to attract the interest of supercomputer users.
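The quoted Gflop/s figures came from full application codes (PPM and an N-body solver), but the bookkeeping behind an aggregate sustained rate is simple: total floating-point work divided by the wall-clock time of the slowest process. The rough MPI sketch below illustrates only that bookkeeping; the arithmetic loop is a stand-in, not a real application kernel.

/* Rough, illustrative sketch: estimating an aggregate sustained
 * floating-point rate across a cluster with MPI.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const long n = 50 * 1000 * 1000;      /* multiply-adds per process */
    double a = 1.000000001, x = 1.0;
    double t0, t1, local_time, max_time, flops, total_flops;
    int rank;
    long i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);          /* start everyone together */
    t0 = MPI_Wtime();
    for (i = 0; i < n; i++)
        x = x * a + 0.000000001;          /* 2 flops per iteration */
    t1 = MPI_Wtime();

    local_time = t1 - t0;
    flops = 2.0 * (double)n;

    /* Aggregate rate: total work done, divided by the slowest process's time. */
    MPI_Reduce(&flops, &total_flops, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local_time, &max_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("aggregate: %.1f Mflop/s (x=%g)\n",
               total_flops / max_time / 1e6, x);

    MPI_Finalize();
    return 0;
}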

Uses of a Beowulf cluster:

Johns Hopkins Medical School uses its cluster system primarily for molecular dynamics simulations as part of its research. The lab uses computer calculations to explore the connections between the microscopic world and the world of experimental measurements, and is currently focused on protein-lipid interactions and their implications for tertiary structure and binding.

The University of California at Irvine uses a Beowulf cluster for such applications as:

  High Energy Physics simulations

  Numerical Quantum Field Theory

  High Energy data reduction

  Light scattering and surface physics modeling

  Astrophysical Simulations

  High temperature superconductivity

  Plasma Physics simulations

  Applied Mathematics research

  Computational Fluid Dynamics

  Climatic Research and Atmospheric Chemistry


Useful links:

www.beowulf.org         Chronicle of Higher Education        Clemson's Grendel

Clustering company       Johns Hopkins Medical Center        USM's Wiglaf

University of California at Irvine