
Welcome to Mad Dog - A Beowulf system

If you are accessing this page, you might also want to see my list of U.K. Beowulf sites.

What is Mad Dog

Mad Dog is a cluster of 12 dual-processor Linux machines. Seven of these are Pentium Pro 166 machines (192 MB of memory each); the remaining five are PII 400 machines with 500 MB of memory each. It is a Beowulf system, based on the original work done by Don Becker and collaborators at NASA, and strongly influenced by the Loki and Naegling machines from Los Alamos and Caltech. It is built from commodity components and free software, which makes it a very cheap alternative to commercial HPC systems.
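As a quick sanity check on the figures above, the aggregate size of the cluster can be worked out with a few lines of Python. This is only a sketch: the table and the function name `cluster_totals` are made up for illustration, and the numbers are simply the ones quoted in the paragraph above.

```python
# Node groups as described above: (node count, CPUs per node, memory per node in MB).
NODE_GROUPS = {
    "Pentium Pro 166": (7, 2, 192),
    "PII 400": (5, 2, 500),
}


def cluster_totals(groups):
    """Return (nodes, cpus, memory_mb) summed over all node groups."""
    nodes = sum(count for count, _, _ in groups.values())
    cpus = sum(count * per_node for count, per_node, _ in groups.values())
    memory_mb = sum(count * mem for count, _, mem in groups.values())
    return nodes, cpus, memory_mb


nodes, cpus, memory_mb = cluster_totals(NODE_GROUPS)
print(nodes, cpus, memory_mb)  # 12 24 3844
```

So the machine as a whole offers 24 processors and a little under 4 GB of memory, spread unevenly over the two generations of nodes.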

Configuration

Each of the original machines that make up Mad Dog is a dual Pentium Pro 166 MHz machine on a Gigabyte motherboard, with 192 MB of memory and a 4.3 GB IDE hard disk. The root machine contains two Ethernet cards (3COM XL), and all machines are connected through a Fast Ethernet switch (Superstack II 3000) for interprocess communication and access to the outside world. They have a simple video card and are connected to a mechanical keyboard/monitor switch. The new nodes (called englishmen, get it?) are dual PII 400 machines on a Tyan Tiger motherboard, with 500 MB of RAM. The new network cards are DLINK 630TX ones.

Each of the machines runs Red Hat Linux version 6.0. Supported software includes gcc, g77, pvm and mips. The compute nodes boot with their root file systems NFS-mounted from the head (root) machine. The disks on these nodes are at the moment only used for local storage and swap space. IP numbers are assigned using RARP, for ease of reconfiguration.
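To illustrate why RARP makes reconfiguration easy: the server only needs a table that maps each node's Ethernet (MAC) address to a host name, conventionally kept in the format of /etc/ethers (one "MAC hostname" pair per line). The sketch below generates such lines in Python. The MAC addresses, the host names and the function name `ethers_lines` are all hypothetical; the actual table used on Mad Dog is not shown on this page.

```python
import string


def ethers_lines(nodes):
    """Format (mac, hostname) pairs as /etc/ethers-style lines."""
    lines = []
    for mac, hostname in nodes:
        parts = mac.lower().split(":")
        # A MAC address is six octets, each written as two hexadecimal digits.
        valid = len(parts) == 6 and all(
            len(p) == 2 and all(c in string.hexdigits for c in p) for p in parts
        )
        if not valid:
            raise ValueError(f"bad MAC address: {mac}")
        lines.append(f"{':'.join(parts)} {hostname}")
    return lines


# Hypothetical addresses and names, for illustration only.
NODES = [
    ("02:00:00:00:00:01", "node01"),
    ("02:00:00:00:00:02", "node02"),
]

print("\n".join(ethers_lines(NODES)))
```

Replacing a node or its network card then comes down to changing one line in this table, with nothing to edit on the node itself.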

Links to information etc.

Minus points

One of the problems is maintaining such systems; since I already have quite a few Linux PCs in my group, that is not a major additional stumbling block. More important is the speed, and especially the latency, of network connections using ordinary TCP/IP on Fast Ethernet. There exist off-the-shelf solutions to this (such as Myrinet and SCI), but these are expensive.
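Latency of this kind is usually measured with a "ping-pong" test: send a very small message, wait for it to come back, and time the round trip. The Python sketch below does this over TCP. It is an illustration only, not the benchmark used on Mad Dog: the function names are made up, and it runs both ends on the loopback interface, so the numbers it prints say nothing about Fast Ethernet itself. To measure a real link, the echo side would have to run on a second node.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(data)


def measure_round_trips(rounds=200):
    """Return a list of round-trip times in seconds for 1-byte TCP messages."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    times = []
    with socket.create_connection(listener.getsockname()) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            client.sendall(b"x")
            client.recv(1)  # block until the echoed byte arrives
            times.append(time.perf_counter() - start)
    server.join(timeout=5)
    listener.close()
    return times


if __name__ == "__main__":
    rtts = sorted(measure_round_trips())
    print(f"median round trip: {rtts[len(rtts) // 2] * 1e6:.1f} microseconds")
```

Half the median round-trip time gives a rough one-way latency. For parallel codes that exchange many small messages, this figure matters more than the raw bandwidth of the link.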

There have been some attempts to write special protocols for exchanging information with lower latency. At the moment it seems that VIA (pushed by Intel and Microsoft, amongst others) is the new way to go. The SCI systems have their own software. If I am not mistaken there is a cluster at RAL that implements this technology.

Future

It might be interesting to experiment with special protocols for inter-process communication. To that end I will probably add an additional Ethernet card to each machine, and connect these to a fast hub. This link can then be used for standard IP traffic not related to the parallel processing (NFS, telnet, rsh, etc.). It might be smart to look at Linux/Alpha, since Alpha PCs are a lot faster than Intel ones. A final interest is in parallel file systems, which have recently been implemented for Linux PCs.



Last Edited 18 June 1999. Niels Walet / Niels.Walet@umist.ac.uk
