gromacs logo
GROMACS exercises


CSC course, Espoo, Finland, 3-5 february 2004


David van der Spoel
nog een keer

  1. Practical details
  2. Algorithmic tests
    1. Accuracy of simulations
    2. Time step
    3. Cut-off
    4. Particle mesh Ewald
  3. Peptide simulations
    1. Creation of topology
    2. Artificial folding
    3. Simulation in solvent

Practical details

For the purpose of this short course, it is assumed that the user is familiar with Unix/Linux command line tools. To start with the exercise get the exercise.tgz file and unpack it somewhere:
% tar xzvf exercise.tgz

This will give you three subdirectories (ala10, argon, water) that we will use for subsequent exercises. Where appropriate, commands to be type will be written like the line above, where the percent sign marks the command-line prompt, which should not be typed. In order to start a molecular dynamics simulation using GROMACS we need three input files:
  1. The atomic coordinates (and, optionally, velocities) are stored in a file that is conventionally called conf.gro
  2. The molecular topology file (conventionally called topol.top) that describes the chemical composition of the system, including information on the force field parameters, bond lengths, etc.
  3. The molecular dynamics parameter file (conventionally called grompp.mdp) which holds parameters like the number of integration steps, treatment of cut-offs and so on.
With the three files in place we can then start the simulation, which takes two steps. The three files are read by the grompp (gromacs preprocessor program) command, which verifies the consistency and the contents of the files and a new  binary file is produced, conventionally called topol.tpr. The command you want to type is:
% grompp -v
Check the error messages and warnings produced (some of the warnings are harmless). The topol.tpr file is then read by the mdrun (molecular dynamics run) program:
% mdrun -v
(the -v flags in both cases stands for verbose, which gives you additional information). This will take some time depending on the computer you are using. The simulation will produce a number of output files:
  1. traj.trr containing atomic coordinates, velocities and/or forces, in full precision, stored at user-defined time intervals. This file is useful for analysis of velocities, and for restarting/continuing simulations.
  2. ener.edr containing energy terms, temperature, pressure, and some more.
  3. confout.gro, containing the coordinates and velocities at the end of the simulation
  4. traj.xtc, an (optional) trajectory file containing reduced precision coordinates (default accuracy is 1 pm) which is about 3 times smaller in size than the corresponding trr file.
Throughout the exercise, you will find questions in a bold red font. Write your answer on a piece of paper, and save it for later discussion. In case you want to know more about one of the GROMACS programs, you can type the name of the program with the -h (help) flag, e.g.:
% grompp -h
You will then see a help text on the screen plus a description of the meaning of all the options.

Please note: this course uses features of GROMACS that were implemented only in version 3.2. The files as provided will not work with older versions of the software.

ace
ace ace ace ace

Algorithmic tests

Algorithmic details are important for scientist that want to employ computer modeling tools. The sad truth is, that the models, available to the computational chemist, are crude, force field parameters are not perfect (far from it), and computer time is limited. In this exercise we will test some of the basic algorithms using argon and water as test systems. The system specifications for the two liquids are as follows:
System
Argon
Water
Number of molecules
216
216
Box size
6.0 (reduced units)
1.86 nm
Temperature
120 K
300 K
Pressure
1 bar
1 bar
Main interactions
Van der Waals
Coulomb + Van der Waals
Features of the liquid
Little structure
Structured through hydrogen bonds
Potential/model used
Van der Waals, reduced units
Simple point charge

Furthermore, it is assumed that the user will change options in the parameter files (mainly grompp.mdp). For an overview of the parameters in this file see XXX. 

Accuracy of simulations

In principle, computer simulations should follow the important conservation laws:
Conservation of energy can be monitored by checking the ratio of the fluctuations in Total energy and the Kinetic Energy. For a well behaved system the ratio R:
    R = Δ Etot/Δ Ekin
should be less than 0.05.

Time step

The integration time step (variable dt in the mdp file) determines how Newton's equation of motion are integrated (see GROMACS manual chap. 3). The time step should be small enough such that the fundamental vibrations in the system are reproduced accurately. If the time step is too large numerical problems may occur, because atoms come to close to each other. The Van der Waals interaction is modeled by a Lennard-Jones potential, which has the form:
    V = 4 ε [ (σ/r)12 -  (σ/r)6  ]
which implies that the short range repulsion is very strong because of the 12th power. In order to estimate the maximum time step we can approximate the potential by a harmonic one:
    V = 1/2 k ( r - r0 )2
at the potential minimum r0. Compute r0. Given that we work with dimensionless parameters (i.e. σ = 1, ε = 1), compute k. (Hint equate the second derivatives with respect to r). From k, compute the frequency ω of the motion in the potential well. (Hint, in dimensionless units the mass m = 1 as well). Now finally, to simulate reliably a harmonic motion there should be at least 20 time steps in a full period of the motion. What is the maximum time step Δt?

Try to simulate the argon box with time steps of  0.001, 0.002, 0.005, 0.01, 0.015, 0.02. Adapt the number of time steps such that all simulations are equally long (roughly 20 time units). Compute again the factor R as a function of time step. Explain the results. Compare to the analytically determined maximum time step.w

Cut-off

The cut-off is the distance beyond which the interaction is truncated. Obviously, this is an approximation, since there always is an interaction. Go to the argon directory and do simulations with a cut-off of 1.0, 1.5, 2.0 and 2.5 (make subdirectories for each of the tests). Note that there are three different variables for cut-offs in the grompp.mdp file, and you have to change them all.  Perform simulations of 50,000 steps. Note that you have to check for equilibration. Use g_energy -w (which will start xmgrace) to plot the energies. Then rerun g_energy with the -b option (to leave out the first 10 ps), e.g.:
% g_energy -b 10
And note the average and fluctuations for the Total Energy and Kinetic Energy. Compute R from this, and plot R as a function of cut-off. Explain the results.

Particle mesh Ewald

Go to the water directory and check the grompp.mdp. Run two simulations of 500 ps each (if necessary, start the simulations over night, and do the analysis the next day, while you proceed with the next assignment), with:
coulombtype = cut-off
and
coulombtype = pme
Compare the ratio R of the two simulations. Which one is better? Then, using the program g_dielectric compute the dielectric constant (for both simulations):
% g_dipoles -enx ener
Remember that you have to check for equilibration by plotting the energy first. After how much time can the system be considered to be in equilibrium? You may have to add the flag -b XXX to the command line for g_dielectric, depending on the previous answer. Compare the results of the two simulations again. Which one gives you a more credible result? Try to explain why.
Both simulation trajectories can be used for additional analysis. For example, the diffusion constant of water is often quoted in simulation studies:
% g_msd -f traj.xtc -b XXX -n index
Another common analysis is that of the liquid structure. The radial distribution function g(r) gives the probability to find an atom at a distance r from another atom. It is defined in a simulation as:
N-1 = 4 π ρ∫ 0 00  r2 g(r) dr
Where N is the number of atoms and ρ is the number density. Compute the g(r) from both simulations (select OW twice):
% g_rdf -f traj.xtc -b XXX -n index
Is there any difference in these properties between the PME and the cut-off simulations? Remember that you have also computed the dielectric constant. What does the combination of these analyses teach you about how to asess the quality of your simulations?

ace
ace ace ace ace

Peptide simulations

Creation of topology

In this part of the exercise we will look at properties of simulated peptides. We will also try to fold an extended chain into an alpha helix. We start with an extended chain with sequence Ala10 (it is very simple to generate such a chain with the PyMOL software). Check in the ala10 directory, try to visualise the pdb file with gOpenMol.  We import the peptide into GROMACS by running:
 % pdb2gmx -f ala10.pdb -ff oplsaa -ignh -ter
The options mean the following:
First you will we have to perform an energy minimisation:
% editconf -f conf.gro -box 6 6 6 -center 3 3 3 -o b4em
% grompp -v -f em -o em -c b4em

% mdrun -v -s em -c after_em

The minimization use the novel algorithm, called l-bfgs (don't ask me why...), which is far more powerful than the old steepest descent method. It converges to very low energies without problems.
It will be convenient if you make a subdirectory for each of the different simulations that you will do. In the first test make a tpr file using the vacuum.mdp file and the after_em.gro coordinates (under the provision that you made a subdirectory called run1):
% grompp -v -f  vacuum -c after_em -o run1/topol
% cd run1
% mdrun -v

Please view the trajectory with gOpenMol. What happens to the peptide? On what time scale do things happen?

Artificial folding

To speed up the folding a bit, cut-and-paste the following section into your protein topology, straight after the #include "posre.itp" (but outside the #ifdef).

#ifdef ALPHA
[ distance_restraints ]
16      48      1       1       1       0.0     0.3     0.8     1
26      58      1       2       1       0.0     0.3     0.8     1
36      68      1       3       1       0.0     0.3     0.8     1
46      78      1       4       1       0.0     0.3     0.8     1
56      88      1       5       1       0.0     0.3     0.8     1
66      98      1       6       1       0.0     0.3     0.8     1
#endif
                                                                               
#ifdef ALPHA3
[ distance_restraints ]
16      38      1       1       1       0.0     0.3     0.8     1
26      48      1       2       1       0.0     0.3     0.8     1
36      58      1       3       1       0.0     0.3     0.8     1
46      68      1       4       1       0.0     0.3     0.8     1
56      78      1       5       1       0.0     0.3     0.8     1
66      88      1       6       1       0.0     0.3     0.8     1
76      98      1       6       1       0.0     0.3     0.8     1
#endif
                                                                                       
#ifdef LALPHA
[ distance_restraints ]
 8       46      1       1       1       0.0     0.3     0.8     1
18      56      1       2       1       0.0     0.3     0.8     1
28      66      1       3       1       0.0     0.3     0.8     1
38      76      1       4       1       0.0     0.3     0.8     1
48      86      1       5       1       0.0     0.3     0.8     1
58      96      1       6       1       0.0     0.3     0.8     1
#endif
                                                                               
Make subdirectories with the names of each of the #ifdefs above, copy vacuum.mdp to each of the directories, and edit the define line in the vacuum.mdp file:
define = -DALPHA
(etc. for the three directories). Now generate tpr files again, starting from after_em.gro, and the modified vacuum.mdp file. Check the trajectories with gOpenMol.
What is the difference between the three new simulations and the first one, when judged by visual inspection?
We can try to make a quantitaive comparison of the final structures using the root mean square deviation:
% echo 1  1 | g_confrms -f1 alpha/confout.gro -f2 alpha3/confout.gro
(and the other two combinations). Note the RMSD values, and save the pdb files with the overlaid structures. Describe the final structures. What does the quantitative estimate using the RMSD tell you about the quality of "visual inspection"? Which of the three simulations looks most like an alpha helix?

Simulation in solvent

Now add some solvent to the system in the ALPHA directory. Since the structure is extended we can not put it in a box and fill it with solvent, the box would be enormous. We will use the -shell option of genbox:
% genbox -cp after_em -cs spc216 -o b4em_sol.pdb -shell 1.2 -p topol

This will put a shell of 1.2 nm water around the peptide,and update the topology. Look at the resulting structure in b4em_sol.pdb with gOpenMol before you proceed. Is it as you expected? Why, or why not? When we now simulate the solvated peptide with pressure coupling, the system will collapse and then (hopefully) start to fold. Now we need to do a short minimisation before the simulation:
% grompp -v -c b4em_sol -f em -o em_sol
% mdrun -v -s em_sol -c after_em_sol.pdb
Do check the structure after minimization as well.

Put the define = -DALPHA in your solv.mdp file, and the corresponding distance restraints in your topology for solvent simulation.
% grompp -v -f solv -c after_em_sol -o fold_it
% mdrun -v -s fold_it
This simulation will take some more time, so you may need to run it overnight. Analyse the trajectory by viewing it. Then use the g_hbond program to analyse hydrogen bonds. First select protein twice, to get the intramolecular hydrogen bonds, then select protein and water to get the protein-water h-bonds. Compare the numbers, and their fluctuations. Is there a trend in the graphs, or a correlation between them? You can also compute the residence time of the water molecules on the different functional groups of the protein., e.g. the Ala sidechain. For this we will have to make an index file first:
% make_ndx -f after_em_sol << EOF
a OW
q
EOF
% g_hbond -contact -n index -ac -r 0.6 -b 200
In the g_hbond command select Sidechain and OW. The -c option computes the autocorrelation function of the contact time between Cβ and OW. View the file hbac.xvg using xmgrace. Use xmgrace to integrate the ACF, this will give you the correlation time. Is it higher or lower than you expected? Why?
Now do the same analysis for interactions between Carbonyl groups and Amide groups and water oxygens. For these analyses you should put the distance to 0.35, the largest hydrogen bonding distance. Compare the residence times with that on the hydrophobic sidechain. Finally we will compute the helix dipole as a function of time in the vacuum as well as the solvent simulation. Theory says that the helix dipole is roughly equal to the dipole created byt two charges of +0.5 and -0.5 with a separation corresponding to the length of the helix. To do this, first make an index file helix.ndx containing:
[ helix ]
1
Then feed the trajectories and the index file to the dipole program:
% g_dipoles -f traj -n helix -s  fold_it
Compare the results for the different simulations. To convert the results from Debye (as output by the dipole program) to electron nm, divide the theoretical result by 48.3. Measure the length of your helix (using gOpenMol). How does the theoretical result of the helix dipole compare to the simulations? Do you have a full α helix in any of your simulations?

ace
ace ace ace ace

That's all folks. Comments to David van der Spoel.