M-grid overview and getting started

Author: Arto Teräs
Status: Final, version 1.2
Date: 2006-02-16

General

The M-grid consists of ten Linux-based PC clusters in seven different cities in Finland. Local unix accounts on each cluster are available only to members of the group that owns that particular system, with the exception of Sepeli at CSC, where any scientist in Finland can apply for an account. However, all systems can be used through the grid interface by anyone who has an account on one of the clusters. The clusters, with links to pages containing more information about each system, are listed in the table below.

Hardware and software

Each cluster consists of one HP ProLiant DL585 frontend server and a number of HP DL145 compute nodes. Each node has two 1.8-2.2 GHz AMD Opteron processors, 2-8 GB of RAM and 80-320 GB of local disk. Each cluster also typically has 1-2 TB of shared disk. The nodes are connected to each other and to the frontend by two gigabit Ethernet networks: one for the internal communication of parallel (MPI) jobs and one for NFS file transfers. Each cluster also has one additional server for administrative purposes.

The exceptions are Sepeli, in which half of the nodes have two dual-core AMD Opteron processors (four cores per node in total), and Brutus, which mainly consists of AMD Athlon MP processors and has only one internal network.

System name   | CPUs | Clock       | Memory / node | Local disks   | Other
--------------|------|-------------|---------------|---------------|---------------------------------
Akaatti       | 30   | 2.2 GHz     | 4-8 GB        | 2x160 GB      | system diagram
Ametisti      | 132  | 1.8-2.2 GHz | 2-4 GB        | 2x80/2x160 GB | system diagram
Brutus        | 94   | 1.6-2.8 GHz | 2-6 GB        | 80-240 GB     | Details at http://brutus.oulu.fi
Jaspis        | 8    | 1.8 GHz     | 2 GB          | 1x160 GB      | system diagram
Kivi          | 10   | 2.2 GHz     | 4 GB          | 1x160 GB      | system diagram
Kvartsi       | 96   | 2.2 GHz     | 4 GB          | 1x160 GB      | system diagram
Opaali        | 24   | 1.8 GHz     | 8 GB          | 1x160 GB      | system diagram
Sepeli        | 768  | 2.2 GHz     | 4-8 GB        | 2x80 GB       | 256 single-core and 256 dual-core CPUs, 768 cores total
Spektroliitti | 26   | 1.8 GHz     | 4 GB          | 1x160 GB      | system diagram
Topaasi       | 24   | 2.2 GHz     | 4 GB          | 2x80 GB       | system diagram

The operating system is Rocks Linux 3.2.0 (slightly modified by CSC), which is based on Red Hat Enterprise Linux 3. Each cluster also runs N1 Grid Engine as the local batch queue system and NorduGrid ARC as the grid middleware. The available end-user applications and libraries vary between clusters, but all of them should have at least the programs and libraries supported by CSC. Oulu is again an exception, because Brutus is not part of the centralized system administration.

You can find more information about the AMD Opteron platform in the Kvartsi user guide at HUT.

Getting a user account

If you belong to one of the groups participating in M-grid, contact your local system administrator to get an account. Others can apply for an account on Sepeli using the normal application procedure at CSC.

Local user accounts on clusters other than your own group's cluster and Sepeli at CSC are not available. However, all resources can be used through the grid interface.

Logging in and submitting jobs

You can log in to the frontend node using any ssh client that supports SSH protocol version 2. Files can be transferred using scp and sftp, or GridFTP when using the grid.
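For example, assuming a hypothetical frontend name kvartsi.hut.fi and account name username (substitute your own cluster's frontend and your own account):

    $ ssh -2 username@kvartsi.hut.fi          # interactive login to the frontend
    $ scp input.dat username@kvartsi.hut.fi:  # copy a file to your home directory
    $ sftp username@kvartsi.hut.fi            # interactive file transfer session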

All jobs should be run from the frontend via the batch queuing system, so there is generally no need to log in interactively to the compute nodes. Running jobs directly on the compute nodes is strictly forbidden. In some situations it may be necessary to rescue data or manually clean up old files from a compute node; for these purposes you can log in to compute nodes from the frontend with ssh. Some clusters (e.g. Sepeli) run scripts which automatically clean up stray processes on the compute nodes.

It is also possible to start an interactive session on the compute nodes through the batch queueing system. Use the command qrsh and specify the relevant parameters if needed (e.g. -l h_rt=1:00:00 to reserve a one-hour session). These sessions are managed by the queueing system, so this is also an acceptable way of running jobs. It is even possible to test parallel jobs this way by requesting a parallel environment and running the start command (e.g. mpirun) manually.
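For instance (the resource limits are illustrative, and the parallel environment name mpi is a hypothetical example, since the available environments vary per cluster):

    $ qrsh -l h_rt=1:00:00            # one-hour interactive shell on a compute node
    $ qrsh -l h_rt=0:30:00 -pe mpi 4  # 30-minute session with a 4-slot parallel environment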

You can find more information about compiling software and submitting jobs in the user guide at HUT, the Sepeli user howto and the preliminary M-grid user's guide.

Guidelines

Contact your local system administrator for more information on usage policy, quotas etc.

Available resources in the Grid

You can check the current status of the systems available in the grid at any time using the Grid Monitor on the NorduGrid web site. Note that the monitor shows all systems connected to NorduGrid as a whole, while you have access to only some of them. Finnish grid users should have access to all the M-grid resources and to some smaller test resources which are not meant for production runs. The resources of M-grid are not currently available to other NorduGrid users.

In M-grid, 20% of each cluster is prioritized for grid jobs; in Sepeli, the grid share is 12%. However, the resource allocation is dynamic, which means that grid jobs can fill all the free nodes available in any cluster. Conversely, if there are no grid jobs, locally submitted jobs can use all resources. Jobs submitted directly to the local batch queue system can use only the local compute nodes, but jobs submitted through the grid interface can go to whichever cluster has free nodes or the shortest queue.

The maximum execution time of grid jobs is currently limited to 24 hours and only serial jobs are supported. However, the grid is already intended for production runs and users are encouraged to try the grid interface. Submitting grid jobs does not consume the CPU hour quota of CSC customers.
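As a sketch, a serial grid job could look like the following minimal xRSL job description (the file name hello.xrsl and the job contents are hypothetical examples), which would be submitted with the ARC client's ngsub command after creating a grid proxy:

```shell
# Write a minimal xRSL job description for a serial grid job.
# The file name and job contents are illustrative, not prescribed.
cat > hello.xrsl <<'EOF'
&(executable="/bin/echo")
 (arguments="hello from M-grid")
 (stdout="hello.out")
 (jobname="hello")
 (cputime="10 minutes")
EOF
# Submission with the NorduGrid ARC client (requires a valid grid proxy):
#   grid-proxy-init
#   ngsub -f hello.xrsl
cat hello.xrsl
```

Note that the cputime request must stay within the 24-hour limit mentioned above.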

In addition to the computing resources, there is a Grid Storage Element (SE), se1.ndgf.csc.fi, available. It is a dedicated disk server with 2.5 TB of disk space connected to the grid.
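For illustration, files could be copied to and listed on the SE with the ARC client's data transfer commands (the file and directory names here are hypothetical, and the exact command set depends on the installed ARC version):

    $ ngcp results.tar.gz gsiftp://se1.ndgf.csc.fi/mydir/results.tar.gz
    $ ngls gsiftp://se1.ndgf.csc.fi/mydir/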

For more information on using the grid resources, see the article "Grid-laskentaa M-gridissä" on page 11 of the issue 4/2005 of the @CSC magazine (in Finnish) and the guide Getting Started with Grid Use in M-grid.

Getting help

User support is provided by the system administrators at each M-grid site. CSC users are instructed to contact the CSC Helpdesk. You can also join the M-grid discuss mailing list and send questions there.

Changelog

2006-02-16 Version 1.2. Updated the information related to Sepeli. (AJT)
2005-12-13 Version 1.1. Added a link to the akaatti cluster home page. (AJT)
2005-11-28 Version 1.0. Added information about qrsh, small fixes. (AJT)
2005-11-14 Version 0.9. Initial public version. (AJT)