Parallel Simulations with GROMACS 4.0β

David van der Spoel 2007
Dept. of Cell and Molecular Biology
Uppsala University
spoel at xray.bmc.uu.se

Short Introduction

In this exercise we will show how to perform some parallel simulations on the new Cray XT4 at CSC. The simulations will be done using the development version of GROMACS, which is not yet being distributed because we do not consider it stable enough for production. Nevertheless, it demonstrates what parallel computing is about much better than GROMACS 3.3, and due to limited time we have only been able to set up the latest development code on this machine.

Computer

You will need an account on the Cray XT4, which is named louhi.csc.fi. In the remainder of the text your user name will be written as USER.

System

Go back to the exercise described in the pre-workshop tutorial. If you don't have the input files on your desktop computer yet, please download them, and then make a tpr file:
grompp -o lys1 -c run.gro -f run.mdp
Now copy the lys1.tpr file to the Cray:
scp lys1.tpr USER@louhi.csc.fi:
(fill in your real user name; you will be prompted for your password). Log in to louhi and add this line to your .cshrc file:
source ~koulu29/wd/software/bin/GMXRC
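If you prefer to do this from the command line, something like the following should work, assuming your login shell is csh/tcsh so that ~/.cshrc is read when you log in:
echo 'source ~koulu29/wd/software/bin/GMXRC' >> ~/.cshrc
source ~/.cshrc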
Then go to your work directory (because of disk quotas), make a test directory, and move the tpr file there:
cd $WRKDIR
mkdir test
cd test
mv ~/lys1.tpr .
We're getting closer... Now all we need is a job script. Try this to start with (create a file and name it lyso.job):
#!/bin/csh -f

#PBS -l size=1
#PBS -j oe

cd $PBS_O_WORKDIR

rm -f lyso.job.o*

module load xt-catamount/1.5.30

setenv DUALCORE 1

yod -VN -sz 2 `which mdrun` -dd 2 1 1 -s lys1 -dlb
Note the number of nodes on the third line of the script (#PBS -l size=1): each node has two processors, so in this case you get two of them. In the yod command you give the actual number of processors with -sz. For mdrun you use the -dd (domain decomposition) flag to specify how the system is partitioned over the processors; here we use two slabs in the X direction and the whole box in the Y and Z directions. Now we're ready to submit the job to the queue.
qsub lyso.job
You can check how your job is faring using:
qstat -n
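Once the job is running you can also keep an eye on its progress by looking at the end of the log file of the first node, for example:
tail md0.log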

Speeding up

You have now done your first parallel simulation. Next, edit the job script to go to 4 or 6 processors. If you go even further you will find that there are some limitations built into mdrun: since we are using the particle mesh Ewald method with a 60x60x60 grid, the total number of processors needs to be an integer divisor of 60. See how far you can go in this manner by specifying different arguments to the -dd flag, for instance as in the sketch below.
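As one possible variation (a sketch only, assuming as above that size counts dual-core nodes), a 4-processor run on two nodes could use these lines in lyso.job; the 2 2 1 decomposition is just one choice, 4 1 1 should work as well:
#PBS -l size=2
yod -VN -sz 4 `which mdrun` -dd 2 2 1 -s lys1 -dlb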


Going further

The new mdrun can also use dedicated nodes for PME; you enable this by adding the -npme flag with a number as its argument. mdrun checks that the total number of nodes equals the number of domain decomposition (DD) nodes plus the number of PME nodes; if the numbers do not add up, the run will not start. The number of PME nodes should again be an integer divisor of 60. Check once more how far you can go, and note down how much time each run takes (it is written at the bottom of the md0.log file). One possible starting point is sketched below.
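As a sketch only (again assuming that size counts dual-core nodes), a 12-processor run with 8 DD nodes and 4 PME nodes could look like this:
#PBS -l size=6
yod -VN -sz 12 `which mdrun` -dd 4 2 1 -npme 4 -s lys1 -dlb
Here 4 x 2 x 1 = 8 DD nodes plus 4 PME nodes add up to the 12 processes requested with -sz, and 4 is an integer divisor of 60.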