Author: Arto Teräs
Status: Final, version 1.13
Date: 2005-07-28
This guide describes how to install the Rocks cluster distribution, including CSC customizations, on the M-grid clusters. The guide is brief and concentrates on the M-grid specific details; for a more detailed general description of each step, refer to the Rocks Users Guide.
The physical installation of the clusters, which should be done by the supplier (HP), is described in a separate document titled Installation and Configuration Instructions for the M-grid Clusters.
The preferred installation method is a network install from the CSC distribution server rocks.csc.fi. For this you only need a small network boot image, which should be burned on a cd. The fallback method is installing from a cd set: burn a complete Rocks installation cd, plus each roll on a separate cd. All cd images are available from
http://rocks.csc.fi/install/downloads/.
The server is not accessible from everywhere. Contact CSC before installation and provide your IP address (or address range) so that we can open access for you.
NOTE 1! We made an error in the installation instructions sent to HP, indicating that the first ethernet interface in the admin server and frontend should be connected to the public network. However, Rocks uses the first interface for the compute network; the second interface is for the public network. You will probably have to swap the network cables of the first two interfaces before installation to fix this.
NOTE 2! When an additional network card is installed (front ends and admin nodes), it is somewhat difficult to predict which card is detected first by the Linux kernel. In the mini cluster at CSC, the supplementary card was detected first in the admin server (DL145), but the internal card was first in the frontend (DL585). Therefore the first and second interfaces in the front end are the ports integrated on the motherboard, while in the admin server they are the ports on the supplementary card. This should be taken into account when connecting the cables. If the order of detection is reversed for some reason, moving the cables to another port may be necessary for the installation to succeed.
Install the administration server first, then the cluster front end. The basic installation procedure is the same for both. The network installation is described first.
If you are installing from the cd set, the procedure is almost the same, but you won't need to configure network settings at the beginning. Instead, you'll need to feed in the roll cds one at a time when asked to do so, and then again during the package installation.
Computing nodes are installed by running a daemon on the front end which listens to bootp requests, and then powering up the nodes one by one. It is not necessary to wait for each installation to finish before starting the next node. However, it is a good idea to first install one node to completion and verify that it works properly before moving on to the other nodes. For example, if partitioning or installing the bootloader fails for some reason, the nodes end up in a state where one needs to manually wipe the hard disks or force booting from the network (and later change the boot order back).
Before starting the node installation, download a partition schema file which suits your hard disk setup. See the disk partitioning document for more details. The following options are available:
The partitioning schema file should be copied to /export/home/install/site-profiles/3.2.0/nodes/replace-auto-partition.xml on the cluster front end. After that, change to the directory /home/install and run the command rocks-dist dist. Then move on to the actual installation of the nodes:
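The steps above can be sketched as shell commands on the front end; my-schema.xml is a placeholder for whichever partitioning schema file you downloaded:

```shell
# my-schema.xml is a placeholder name for the schema file you downloaded
cp my-schema.xml \
   /export/home/install/site-profiles/3.2.0/nodes/replace-auto-partition.xml
cd /home/install
rocks-dist dist   # rebuild the distribution so the nodes pick up the new schema
```

These commands only take effect on nodes installed (or reinstalled) afterwards.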
NOTE! Rocks assigns names and ip addresses to the nodes in successive order: the first node will be called compute-0-0 and have ip 10.11.1.254, the second will be compute-0-1 with ip 10.11.1.253, etc. This should correspond to the labels on the computers. If there is a broken node which doesn't boot, exit the insert-ethers program before powering up the next one. Relaunch insert-ethers with the --rank parameter, indicating which number the next node should get. For example, if the sixth node (labeled compute-0-5) doesn't boot, type insert-ethers --rank=6 and then power up the seventh node.
Later you can reinstall nodes using the shoot-node command. See the Rocks manual for details.
Move the /tmp directory to /var/tmp so that there is no risk of users' temporary files filling up the root partition. This can be done by creating the directory /var/tmp, copying the contents of /tmp there, moving /tmp to /tmp-old, creating a symbolic link /tmp (ln -s /var/tmp /tmp) and finally removing /tmp-old. (If the system isn't happy with /tmp disappearing for a while, you can also boot from a rescue cd and perform the operation from there.)
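The steps above can be rehearsed as a short shell script. This sketch operates on a throwaway sandbox directory rather than the real root filesystem; to do it for real, set ROOT to be empty and run the commands from a rescue environment.

```shell
#!/bin/sh
# Rehearsal of the /tmp -> /var/tmp move in a sandbox directory.
# For the real operation set ROOT= (empty) and run from a rescue cd.
set -e
ROOT=$(mktemp -d)                        # sandbox standing in for /
mkdir -p "$ROOT/tmp" "$ROOT/var"
echo scratch > "$ROOT/tmp/example.txt"   # stand-in for existing temp files

mkdir -p "$ROOT/var/tmp"                 # 1. create /var/tmp
cp -a "$ROOT/tmp/." "$ROOT/var/tmp/"     # 2. copy the contents of /tmp there
mv "$ROOT/tmp" "$ROOT/tmp-old"           # 3. move /tmp aside
ln -s var/tmp "$ROOT/tmp"                # 4. symlink /tmp -> /var/tmp
rm -rf "$ROOT/tmp-old"                   # 5. remove the old copy

cat "$ROOT/tmp/example.txt"              # prints "scratch": files survive the move
```

Note that the sandbox version uses a relative symlink target (var/tmp) so the link resolves inside the sandbox; on the real system the absolute target /var/tmp from the text is what you want.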
Check that the cluster is up and running properly by running links http://localhost/ and selecting the cluster status from the list. It should show the total number of CPUs, with the frontend CPUs included.
Configure the keyboard by running redhat-config-keyboard and selecting Finnish latin1 (if you don't happen to use a us keyboard).
Configure X by running redhat-config-xfree86, which should auto-detect the display card (and hopefully your mouse too). This step can be omitted, but if you plan to work locally on the front end, having X available makes many things nicer. For example, you can follow the progress of node reinstallations in xterms.
Configure the firewall settings. A default configuration is provided by CSC in the file /etc/rc.d/rc.firewall. This file should have different contents on the admin server and the frontend, the admin server one being stricter. Add your own modifications primarily to /etc/rc.d/rc.firewall.local. You will probably want to do at least the following:
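As an illustration, site-specific additions to /etc/rc.d/rc.firewall.local might look like the following. The address range and the choice of services here are made-up examples, not part of the CSC defaults; substitute your own site's networks and needs.

```shell
# Hypothetical additions to /etc/rc.d/rc.firewall.local.
# 192.0.2.0/24 is a placeholder for your own site's address range.
iptables -A INPUT -p tcp -s 192.0.2.0/24 --dport 22 -j ACCEPT   # ssh from your site
iptables -A INPUT -p icmp -s 192.0.2.0/24 -j ACCEPT             # allow ping for monitoring
```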
Configure the mail server, see a separate guide.
Add local user accounts.
Contact CSC (if you haven't already done so). Proceed to acceptance tests.
Update 2005-08-28: The guide is already a bit outdated: a freshly installed cluster requires a number of updates which are not part of the installation package.
2005-08-27 Version 1.13. Added link to mail server configuration guide, and a mention that a reinstalled server needs updates after the installation procedure.
2004-09-29 Version 1.12. Removed mention of the (nonexistent) frontend roll.
2004-09-28 Version 1.11. Added a note about configuring the keyboard, and added the rocks-dist command to the partitioning schema instructions. (AJT)
2004-09-28 Version 1.1. Added partitioning schema files. (AJT)
2004-09-27 Version 1.01. Added a link to the disk partitioning guide, a mention of software RAID on the admin server, and instructions for moving /tmp to /var/tmp. (AJT)
2004-09-23 Version 1.0 published.