Authors: Arto Teräs, Pekka Tolvanen (CSC)
Status: Final, version 1.2
Date: 2004-09-28
Cluster disks should be partitioned according to this document. The RAID configuration on the front end should already have been done by the HP person according to the installation and configuration guide. The file system type should be ext3 on all partitions except swap.
Location | Mount point | Usage | Size | Comment |
---|---|---|---|---|
system disk | - (swap) | Swap space | 2 GB | |
system disk | / | Root, OS (CSC use) | 10 GB | |
system disk | /opt | Programs not belonging to the core OS | 10 GB | |
system disk | /var | Logs, temporary files | rest | Combined with /tmp |
shared disk | /export/grid | Files related to Grid jobs | 10%, min. 50 GB | |
shared disk | /export | User home directories, data, local software | rest | |
Note that it is not possible to combine /var and /tmp partitions during the installation. The /tmp directory will initially reside on the / (root) partition. After the installation is completed, local admins should create a directory /var/tmp and a symbolic link /tmp pointing to it.
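The post-installation step described above could be scripted roughly as follows. This is only a sketch, not part of the installation package: the function name `relocate_tmp` and its `root` parameter are hypothetical, and the parameter exists only so that the procedure can be tried on a scratch directory before touching a real system. It assumes that nothing is actively using /tmp while it runs and that it is run with sufficient privileges.

```python
import os
import shutil
import stat
import tempfile


def relocate_tmp(root="/"):
    """Move <root>/tmp under <root>/var/tmp and leave a symlink behind.

    Hypothetical helper; 'root' lets the procedure be rehearsed on a
    scratch directory instead of the live file system.
    """
    old_tmp = os.path.join(root, "tmp")
    new_tmp = os.path.join(root, "var", "tmp")

    # Create /var/tmp. The mode is set explicitly because os.makedirs
    # honours the umask; 1777 is the usual world-writable, sticky mode.
    os.makedirs(new_tmp, exist_ok=True)
    os.chmod(new_tmp, 0o1777)

    # Nothing more to do if /tmp is already a symbolic link.
    if os.path.islink(old_tmp):
        return new_tmp

    # Move any existing files, then remove the old directory.
    if os.path.isdir(old_tmp):
        for name in os.listdir(old_tmp):
            shutil.move(os.path.join(old_tmp, name),
                        os.path.join(new_tmp, name))
        os.rmdir(old_tmp)

    # A relative target resolves to <root>/var/tmp from <root>/tmp.
    os.symlink(os.path.join("var", "tmp"), old_tmp)
    return new_tmp


if __name__ == "__main__":
    # Rehearsal on a throwaway directory tree, not on the real /tmp.
    scratch = tempfile.mkdtemp()
    os.makedirs(os.path.join(scratch, "tmp"))
    print(relocate_tmp(scratch))
```

On a real front end the call would be `relocate_tmp("/")`, preferably before any services that keep files open in /tmp have been started.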
Quotas are mainly a question of local policy. A small default could be set by CSC and sites can then manage their own quota requirements and sizes. FIXME: In the current version of the installation package quotas are not enabled by default. A reasonable amount of disk space for grid use is guaranteed by having a separate partition /export/grid for the temporary data files related to grid jobs.
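As a worked example of the "10%, min. 50 GB" sizing rule for /export/grid given in the table above, the split of the shared disk could be computed as in the sketch below. The interpretation (10% of the shared disk, but never less than 50 GB) is an assumption about how the rule is meant to be read, and the function name and the disk sizes used are purely illustrative.

```python
def split_shared_disk(shared_gb, grid_percent=10, grid_min_gb=50):
    """Return (grid_gb, export_gb) for a shared disk of shared_gb gigabytes.

    Assumed reading of the rule: /export/grid gets grid_percent of the
    disk, but at least grid_min_gb; /export gets the rest.
    """
    if shared_gb <= grid_min_gb:
        raise ValueError("shared disk too small for the minimum grid partition")
    grid_gb = max(shared_gb * grid_percent / 100, grid_min_gb)
    return grid_gb, shared_gb - grid_gb


# Illustrative disk sizes only, not taken from the hardware specification.
for size in (300, 1000):
    grid, export = split_shared_disk(size)
    print(f"{size} GB shared disk: /export/grid {grid:g} GB, /export {export:g} GB")
```

With these example sizes, the minimum applies to the 300 GB disk (50 GB for /export/grid), while the 1000 GB disk gets the 10% share (100 GB).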
Sites can also decide how to divide the disk space between the actual user home directories, optional separate shared data directories under /export, and the directory /export/home/opt, which is reserved for local software installations. Rocks also has a directory /export/home/install, where packages to be installed on the nodes are put. If the home partition fills up, it may not be possible to copy upgrades to this directory, which can be a problem. However, making a separate partition for it was considered unnecessarily complicated.
The admin server comes with two 80 GB system disks. They should be configured as software RAID-1 (mirror) by the local administrator. This can be done with Disk Druid in the installation program. The partitions should be as follows:
Location | Mount point | Usage | Size | Comment |
---|---|---|---|---|
system disk | - (swap) | Swap space | 1 GB | |
system disk | / | Root, OS, tmp | 10 GB | |
system disk | /export | Home directories, backups | rest | Possibly /var will be moved here too |
Note that there is no intention of exporting any admin server partitions using NFS. However, because the Rocks frontend installation is also used for the admin server, the home directory is placed under /export by default.
In general, it does not matter whether the disk size is 80 GB or 160 GB, but the different RAID configurations require different partitioning. Partitions are summarized in the following table:
Compute node with one disk:

Location | Mount point | Usage | Size | Comment |
---|---|---|---|---|
first disk | - (swap) | Swap space | 1 GB | |
first disk | / | Root, OS | 5 GB | |
first disk | /opt | Programs not belonging to the core OS | 10 GB | |
first disk | /tmp | Temporary files | rest | |

Compute node with two disks as RAID-1 (mirror):

Location | Mount point | Usage | Size | Comment |
---|---|---|---|---|
both disks | As in compute nodes with only one disk (identical partitions on both disks) | | | |

Compute node with two disks as RAID-0 (stripe):

Location | Mount point | Usage | Size | Comment |
---|---|---|---|---|
both disks | As in compute nodes with only one disk (identical partitions on both disks) | | | |

The difference to the RAID-1 scheme is that in this case the two /tmp partitions are combined using RAID-0 to form one big partition. Partitions / (root) and /opt are configured as RAID-1 (mirror).
Based on Mikael Johansson's <mikael.johansson@helsinki.fi> tests, a chunk size of 128 kB gives optimal performance for the RAID-0 setup when using the ext3 filesystem.
Some groups wondered whether the swap should be bigger; we do not really see a need for that. Running calculations larger than the physical memory of a cluster node is slow, and the nodes are not multi-user machines on which several idle processes (which could be moved to swap) are normally running. If a group still thinks it requires a large swap, it could modify the predefined partition table file, deducting the extra swap space from the /tmp partition. At least initially, groups will need to manually select the right partitioning schema file after the frontend installation.
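The effect of such a change on a compute node can be estimated with simple arithmetic, sketched below. The fixed sizes (1 GB swap, 5 GB root, 10 GB /opt) come from the compute node table; the function names are hypothetical, and the figures are nominal sizes that ignore file system and RAID metadata overhead.

```python
def compute_node_layout(disk_gb, swap_gb=1, root_gb=5, opt_gb=10):
    """Nominal per-disk partition sizes in GB; /tmp takes the rest."""
    tmp_gb = disk_gb - swap_gb - root_gb - opt_gb
    if tmp_gb <= 0:
        raise ValueError("disk too small for the requested partitions")
    return {"swap": swap_gb, "/": root_gb, "/opt": opt_gb, "/tmp": tmp_gb}


def raid0_tmp_gb(disk_gb, **sizes):
    """Combined /tmp size when the /tmp partitions of two disks are striped."""
    return 2 * compute_node_layout(disk_gb, **sizes)["/tmp"]


# Default schema on an 80 GB disk.
print(compute_node_layout(80))

# A group wanting 4 GB of swap gives up 3 GB of /tmp per disk.
print(compute_node_layout(80, swap_gb=4))

# Two 80 GB disks with the /tmp partitions combined using RAID-0.
print(raid0_tmp_gb(80))
```

For an 80 GB disk this gives 64 GB for /tmp with the default schema, 61 GB with a 4 GB swap, and 128 GB of combined /tmp in the RAID-0 case.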
Files implementing these partition schemas in the Rocks installation are available in the partitioning_schemas subdirectory.
2004-09-28 Version 1.2. Rocks partitioning schema files added. (AJT)
2004-09-28 Version 1.1. RAID-0 node partitioning changed according
to Mikael Johansson's suggestion. (AJT)
2004-09-27 Version 1.0.