

Keywords for Parallel Runs

On all systems the parallel input preparation is done automatically. Details for the parallel installation are given in Section 3.2.1. The following keywords are necessary for all parallel runs:

$parallel_platform architecture
$numprocs number of CPUs

Currently the following parallel platforms are supported:

SMP
    for systems with very fast communication; all CPUs are used for the linear algebra part. Synonyms for SMP are:
    HP V-Class, SP3-SMP and HP S/X-Class
MPP
    for systems with fast communication like Fast Ethernet; the number of CPUs used for the linear algebra part depends on the size of the matrices. Synonyms for MPP are:
    SP3 and linuxcluster
cluster
    for systems with slow communication; the linear algebra part is done on a single node. Synonyms for cluster are:
    HP Cluster and every platform that is not known by TURBOMOLE
SGI
    similar to SMP, but the server task is treated differently: the MPI implementation on SGI systems would otherwise cause this task to request too much CPU time.

$numprocs is the number of slaves, i.e. the number of nodes doing the parallel work. If you want to run mpgrad, $traloop has to be equal to or a multiple of $numprocs.
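
As an illustration, a minimal parallel setup might look as follows; the platform name MPP and the values 4 and 8 are purely illustrative, and the $traloop line is only relevant for mpgrad runs (8 being a multiple of the 4 slaves):

$parallel_platform MPP
$numprocs 4
$traloop 8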

For very large parallel runs it may be impossible to allocate the scratch files in the working directory. In this case the $scratch files option can be specified; an example for a dscf run is given below. The scratch directory must be accessible from all nodes.

$scratch files
   dscf  dens       /home/dfs/cd00/cd03_dens
   dscf  fock       /home/dfs/cd00/cd03_fock
   dscf  dfock      /home/dfs/cd00/cd03_dfock
   dscf  ddens      /home/dfs/cd00/cd03_ddens
   dscf  xsv        /home/dfs/cd00/cd03_xsv
   dscf  pulay      /home/dfs/cd00/cd03_pulay
   dscf  statistics /home/dfs/cd00/cd03_statistics
   dscf  errvec     /home/dfs/cd00/cd03_errvec
   dscf  oldfock    /home/dfs/cd00/cd03_oldfock
   dscf  oneint     /home/dfs/cd00/cd03_oneint

For all programs employing density functional theory (DFT) (i.e. dscf/grad and ridft/rdgrad) $pardft can be specified:

$pardft
       tasksize=1000
       memdiv=0

The tasksize is the approximate number of points in one DFT task (default: 1000) and memdiv says whether the nodes are dedicated exclusively to your job (memdiv=1) or not (default: memdiv=0).
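
For nodes that are dedicated exclusively to the job one might, for example, set the following; the tasksize value here is only an illustration, not a recommendation:

$pardft
       tasksize=2000
       memdiv=1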

For dscf and grad runs you need a parallel statistics file which has to be generated in advance. The filename is specified with
$2e-ints_shell_statistics file=DSCF-par-stat
or
$2e-ints'_shell_statistics file=GRAD-par-stat
respectively.

The statistics files have to be generated with a single-node dscf or grad run. For a dscf statistics run one uses the keywords:

$statistics  dscf parallel
$2e-ints_shell_statistics    file=DSCF-par-stat
$parallel_parameters
       maxtask=400
       maxdisk=0
       dynamic_fraction=0.300000
and for a grad statistics run:
$statistics  grad parallel
$2e-ints'_shell_statistics    file=GRAD-par-stat
$parallel_parameters
       maxtask=400
maxtask is the maximum number of two-electron integral tasks,
maxdisk defines the maximum task size with respect to mass storage (MBytes) and
dynamic_fraction is the fraction of two-electron integral tasks which will be allocated dynamically.

For parallel grad and rdgrad runs one can also specify:

$grad_send_dens
If this keyword is set, the density matrix is computed by one node and distributed to the other nodes rather than being computed by every slave.

In the parallel version of ridft, the first client reads the keyword $ricore from the control file and uses the given amount of memory for the additional RI matrices and for RI-integral storage. All other clients use the same amount of memory as the first client, although they do not need to store any of those matrices. This leads to better usage of the available memory per node. However, for a large number of auxiliary basis functions the RI matrices may become bigger than the specified $ricore, and all clients will then use as much memory as those matrices require, even if that amount is much larger than the given memory. To avoid this behaviour one can use:

$ricore_slave integer

which specifies the amount of memory (in MB) to be used on each client.
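
A short sketch, with a purely illustrative value of 200 MB per client:

$ricore_slave 200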

For parallel jobex runs one has to specify all the parallel keywords needed for the different parts of the geometry optimization, i.e. those for dscf and grad, or those for ridft and rdgrad, or those for dscf and mpgrad.
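
As a rough sketch, and assuming that the statistics files DSCF-par-stat and GRAD-par-stat have already been generated as described above, the parallel keywords for a jobex run based on dscf and grad could be combined as follows (platform, node count and file names are only illustrative):

$parallel_platform MPP
$numprocs 4
$2e-ints_shell_statistics     file=DSCF-par-stat
$2e-ints'_shell_statistics    file=GRAD-par-stat
$grad_send_dens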

