M-grid: Cluster Maintenance Project
Guide to Install User Programs
Version 0.93
2005-03-08
OlliS/CSC
This report sets common guidelines for how to install user supported programs (users' own programs) and, for local administrators, how to install locally supported programs on the M-grid clusters. These programs include self-programmed and self-compiled programs, group shared programs and level 1 supported programs. See the separate document M-Grid_Maintenance_SystemSoftware.html for a description of the support levels. Also given are instructions on how to name installation directories for uniform Grid use, and how to package locally supported programs as rpm packages for inclusion in the automated installation rolls.
category | disks local to | visibility | typical example (for installation) | possibly overwritten during install/boot | typical person preparing the installation |
1 | the front end | visible only there | /opt/ | yes | csc-admin (local admin) |
2 | a compute node | visible only there | /opt/ | yes | csc-admin/local admin (or user) |
3 | the front end | visible to the compute nodes (NFS mounted) | /home/opt/ | no | local admin (or user) |
3 | the front end | visible to the compute nodes (NFS mounted) | /home/<user>/ | no | any local user (quite freely) |
The directories of the cluster nodes have different visibilities according to what is NFS mounted and where (see the above table).
Most system directories are in category 1 or 2. Most of the temporary working disk space is in category 2, although there is working space on the front end node too (category 1).
The user home directories are the typical example of category 3, i.e. they reside on the common shared "disk server" disks and are NFS mounted so that they are visible to all nodes. Unfortunately the NFS mounting mechanism makes disk access suboptimal, so performance-wise it is better to have heavily used software (especially very large binaries) installed separately on each compute node. In a Rocks cluster environment this is handled automatically by the Rocks installation mechanism, but it requires some central co-ordination and careful planning.
Because of the way the Rocks distribution installs the nodes automatically (and possibly overwrites them), and because of the centralized and partly automated maintenance and installation of the front end node, the root partition of the front end node and the local disks of the compute nodes are not safe places for permanent installation of software, unless the software is explicitly included in the automated rolls. Furthermore, from the users' perspective, the compute node disks are meant mainly for temporary working storage used during calculations.
Hence, generally, the only safe place for the permanent direct installation of user programs is the shared home directory /home/ and its subdirectories on the front end node (or the "disk server"). All other places can in principle be overwritten without further notice. Furthermore, the home directory is visible to all nodes.
However, there are mechanisms discussed below to ensure that programs can be installed so that they are automatically re-installed when the unsafe disk areas are overwritten and initialized.
In addition, even the "unsafe" places can be used when testing and debugging a software installation, so that it can later be included in the automatic installation rolls. But in these cases the installation is a temporary one, and could (will) be overwritten during the next install. [N.B. The front end computer is not re-installed (overwritten) very often, but the local disks of the compute nodes might get overwritten quite often.]
In addition to choosing a good place (directory) to install the program, one should choose the name of the directory, and possibly the name of the executable, carefully. When the program is self-compiled, it is good practice to indicate the compiler used to compile the program and the bit-depth (64 or 32) used when compiling. If the compiler and bit-depth are indicated with an additional subdirectory level (.../<comp><bits>/), this leads to the following possibilities (when installing to the common user level directory /home/opt/):
/home/opt/software_name/pgi64/ (64-bit, compiled with the Portland Group compiler)
/home/opt/software_name/pgi32/ (32-bit, compiled with the Portland Group compiler)
/home/opt/software_name/gnu64/ (64-bit, compiled with the GNU compiler)
/home/opt/software_name/gnu32/ (32-bit, compiled with the GNU compiler)
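As a concrete illustration, such a prefix can be composed from the software name, a compiler tag and the bit-depth. The sketch below uses the document's own placeholder names; the variables are illustrative, not a prescribed interface:

```shell
# Sketch (assumed names): compose the recommended installation prefix
# <root>/<software_name>/<compiler><bits> for a self-compiled program.
root=/home/opt        # shared, NFS-visible installation root
name=software_name    # the program being installed
comp=pgi              # compiler tag: pgi (Portland Group) or gnu
bits=64               # bit-depth used when compiling: 64 or 32
prefix="$root/$name/$comp$bits"
echo "$prefix"        # -> /home/opt/software_name/pgi64
# A typical autoconf-style build would then be configured with
# something like: ./configure --prefix="$prefix"
```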
Of course it is not necessary to have all these versions of all programs and libraries, but even if only one of the possibilities is required, it is good practice to indicate the compiler and bit-depth. Sometimes other information should also be included in the directory or executable name, e.g. dependency on a certain MPI library.
This naming convention is applicable only to those programs which are self-compiled. Many ready-made packages make several assumptions about the installation directory and naming conventions. If these defaults can be changed, one should follow the guidelines above, but in general changes to complex installation and compilation scripts should be made only when necessary and with caution.
In the Runtime Environment (RE) concept there is no need for directories (such as /home/opt/bin) with links to the application executables (to handle PATHs).
There are at least the following types of programs, classified according to the extent of usage and support (the program or library named hilvi is used as an example, with the added subdirectory /pgi64/ indicating the compiler and bit-depth used):
CSC supported Level 2 (supp) programs: These are installed by CSC, most of them to the system level directory /opt/, in this case /opt/hilvi/pgi64/. The directory /opt/ is reserved for optional programs; here an optional program is a program that is optional from the operating system's point of view, i.e. something which isn't included in the Rocks distribution or has been repackaged by CSC.
Level 1 (inst) programs installed by CSC and supported by one group: These are (at least potentially) used by other groups in other clusters, and also potentially used in the M-grid. They should be installed to the directory /opt/, too: /opt/hilvi/pgi64/. The installation is done by CSC after the responsible group has tested the installation procedure. The directory /opt could be overwritten, but if the directory /home/install contains the necessary rpm packages, they are then re-installed.
Personalware programs: These are self-developed Fortran or C programs, or programs obtained e.g. from colleagues in source or executable form. The characteristic feature here is that these are run by one person only. Here a safe installation directory is contained within the user's home directory, for instance /home/<user>/hilvi/pgi64/. Any user can quite freely install any desirable software into his or her own home directory /home/<user>/ for individual use.
Groupware programs: Either self-developed or obtained elsewhere, but used only by a group within one cluster. A safe place is again contained within the /home directory, but since the program is no longer used by one person only, a personal directory is not a good solution. Hence it is recommended that these programs are installed into some commonly agreed directory. Because the level 2 supported programs are installed into the system directory /opt, these user or locally supported programs are recommended to be installed into the directory /home/opt/; for instance /home/opt/hilvi/pgi64/. It is a local decision whether all local users can write to the local installation directory /home/opt/, or whether it is writable only by the local administrator.
If the program can be used so that there is no need for it to be visible to the compute nodes, it can still be installed into the directory /home/opt/; there is usually no harm even if it is visible to the compute nodes.
Some programs are installed on the front end node, either on the system disks in various system directories (only by CSC and only if included in the rolls!) or on the shared disks ("disk server disks"), mostly in the safe directory /home and its subdirectories. All these directories are visible on the compute nodes, i.e. they are exported to the nodes using NFS.
The installation of programs on the front end node (to subdirectories of the system level directory /opt/ or the user directories /home/ and /home/opt/), while relatively easy, is not optimal performance-wise. Hence it is recommended that heavily used programs are installed directly on the local disks of the compute nodes. The recommended directory is again named /opt/. Note that these opt directories are not mounted on other computers, so they are always visible only locally. And because the local disks of the nodes are automatically rewritten and filled with "new" contents at initialization time, this has to be done in a coordinated fashion.
The installation on the local node disks can be tried out and debugged at any time on any node, however. Just be aware that the installation will be wiped out the next time the node is reinstalled.
To prepare the installation on the local disk of the compute nodes, use the directory /opt/ (i.e. install the software under /opt/).
Note that there is absolutely no need to manually install the software to all compute nodes!
FIXME: We'll have to consider whether we want to do package installations, upgrades and other configuration changes by pushing the changes to running nodes, or whether we opt for the node reinstall mechanism. Testing should include at least reinstallation of one node to make sure that the package is properly included in the distribution.
The Grid environment itself does not require uniform installation of applications (a uniform directory hierarchy, for example), although in the M-grid clusters we aim for uniformity in order to maximize the synergy in software maintenance.
An application environment in NorduGrid is initialized using an RE script, which is just a regular shell script setting the environment variables, such as PATH and LD_LIBRARY_PATH, required by the application. The same script can be executed directly by local users, providing functionality similar to that of "use <module>", for example.
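As an illustration, a minimal RE script for the example application hilvi might look like the following sketch. The installation path follows the naming convention of this guide; the variable name HILVI_HOME is an assumption, not a prescribed interface:

```shell
# Minimal sketch of an RE script for the example application "hilvi".
# Sourcing (or executing) this sets up the paths the application needs.
HILVI_HOME=/home/opt/hilvi/pgi64
PATH="$HILVI_HOME/bin:$PATH"
LD_LIBRARY_PATH="$HILVI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export HILVI_HOME PATH LD_LIBRARY_PATH
```

A local user could source this script interactively to get the same environment a Grid job would see on a compute node.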
REs installed in the clusters are automatically advertised in the NorduGrid information system. They are used in job brokering to direct jobs to resources which meet the job requirements, such as installed application software or software libraries.
An RE can be viewed as an interface to the application. Every RE has a www RE Homepage describing the application, providing interface documentation for users and possibly installation instructions for site administrators. Links to all RE Homepages are collected into the Runtime Environment Registry maintained by CSC and NDGF.
FIXME: RE scripts are executed on the compute nodes prior to the actual jobs, so the most natural place for the scripts is in the shared filesystem, /home/runtimeenvironments.
For more information, see NorduGrid documentation and Grid Runtime Environment Registry proposal.
When adding programs to the installation system, they have to be transformed into standard (RedHat) rpm installation packages.
FIXME: Add simple packaging instructions.
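Until proper packaging instructions are written here, a minimal rpm spec file might look roughly like the following sketch. All names, versions and the file list are placeholders for the guide's hilvi example; a real package needs proper versioning, build sections and dependency information:

```spec
Name: hilvi
Version: 1.0
Release: 1
Summary: Example user application for M-grid compute nodes
License: see the application's own license terms
Group: Applications/Engineering
%description
Placeholder package installing the example application under /opt
on the compute nodes.
%files
/opt/hilvi/pgi64
```

A binary rpm built from such a spec (e.g. with rpmbuild -bb) can then be placed in the contrib directory of the Rocks distribution.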
Local admins can include rpms to be installed on the nodes according to the instructions in the Rocks manual. The directory in which to put the packages is /home/install/contrib/enterprise/3/public/x86_64/RPMS, and you'll need to write an XML file listing the packages in /home/install/site-profiles/3.2.0/nodes.
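For illustration, such an XML node file might look like the following sketch. The file name extend-compute.xml and the element names follow the usual Rocks 3.x conventions, but check the Rocks manual for your exact version:

```xml
<?xml version="1.0" standalone="no"?>
<kickstart>
  <!-- packages to add to the compute nodes; package names only,
       without version numbers -->
  <package>hilvi</package>
</kickstart>
```

After adding the rpm and the XML file, the distribution is typically rebuilt (in Rocks 3.x, with rocks-dist dist run in /home/install), and the change should be verified by reinstalling one node.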
The various support levels for programs are described in document M-Grid_Maintenance_SystemSoftware.html.
2005-03-08 Version 0.93. Added instructions where to put RPM packages to be installed to nodes (ArtoT).
2004-10-08 Version 0.92. Minor edits (OlliS).
2004-10-08 Version 0.92. Material about Grid Runtime Environments added (Juha Lento).
2004-09-25 Version 0.91. First published version.