Running Parallel Jobs -- OpenMP case

The programs dscf and ricc2 are also partially parallelized with OpenMP for use on shared-memory machines, in particular multi-CPU and multi-core systems.

The OpenMP parallelization does not need any special program startup. The binaries can be invoked in exactly the same manner as for sequential (non-parallel) calculations. The only difference is that, before the program is started, the environment variable OMP_NUM_THREADS has to be set to the number of threads the program should use. The number of threads is essentially the maximum number of CPU cores the program will try to utilize. To exploit, e.g., all eight cores of a machine with two quad-core CPUs, set

export OMP_NUM_THREADS=8

(for csh and tcsh use setenv OMP_NUM_THREADS 8).
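When the core count is not known in advance, the thread count can be derived at run time. The following is a minimal sketch for sh-compatible shells, not part of the TURBOMOLE distribution: the helper name pick_threads is hypothetical, and it assumes that getconf _NPROCESSORS_ONLN is available and reports the number of online logical CPUs (which can exceed the number of physical cores on machines with simultaneous multithreading).

```shell
# Hypothetical helper: print the smaller of a requested thread count
# and the number of online logical CPUs reported by the system.
pick_threads() {
    # $1: optional upper limit on the number of threads
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    limit=${1:-$cores}
    if [ "$limit" -lt "$cores" ]; then
        echo "$limit"
    else
        echo "$cores"
    fi
}

# Use at most eight threads, fewer if the machine has fewer CPUs.
OMP_NUM_THREADS=$(pick_threads 8)
export OMP_NUM_THREADS

# The program is then started exactly as in a sequential run, e.g.:
# dscf > dscf.out
```

On a shared machine it is usually better to pass the number of cores actually reserved for the job as the limit rather than to rely on the system-wide count.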

Presently the OpenMP parallelization of ricc2 comprises all functionalities apart from the recently implemented RI-MP2-F12, the LT-SOS-RI-MP2, and the calculation of expectation values for $\hat{S}^{2}$. Note that in an OpenMP-parallel calculation the memory specified with $maxcor is the maximum amount of memory that will be dynamically allocated by all threads together. To use your computational resources efficiently, it is recommended to set this value to about 75% of the physical memory available for your calculations, but to at most 16000 (megabytes). (Due to the use of integer*4 arithmetic the ricc2 program is presently limited to 16 Gbytes.)
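The $maxcor recommendation (about 75% of the available physical memory, capped at 16000 megabytes) amounts to a small calculation. A minimal sketch follows; the function name recommend_maxcor is hypothetical, and the memory available to the job has to be supplied by the user in megabytes.

```shell
# Hypothetical helper: suggest a $maxcor value in megabytes.
recommend_maxcor() {
    # $1: physical memory available to the calculation, in megabytes
    mb=$(( $1 * 75 / 100 ))      # about 75% of the available memory
    if [ "$mb" -gt 16000 ]; then
        mb=16000                 # upper limit quoted for ricc2
    fi
    echo "$mb"
}

recommend_maxcor 8192     # prints 6144
recommend_maxcor 65536    # prints 16000
```

The value applies to all threads together, so it does not need to be divided by OMP_NUM_THREADS.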

In the dscf program the OpenMP parallelization presently covers only the Hartree-Fock Coulomb and exchange contributions to the Fock matrix in fully integral-direct mode and is mainly intended to be used in combination with OpenMP-parallel runs of ricc2. Nevertheless, the OpenMP parallelization can also be used in DFT calculations, but the numerical integration for the DFT contribution to the Fock matrix will use only a single thread (CPU core), and thus the overall speedup will be smaller.

Localized Hartree-Fock calculations (dscf program) are parallelized using OpenMP. In this case an almost ideal speedup is obtained because the most expensive parts of the calculation are the evaluation of the Fock matrix and of the Slater potential, and both of them are well parallelized. The calculation of the correction term of the grid will use a single thread.

Restrictions:

In the ricc2 program the parts related to RI-MP2-F12, LT-SOS-RI-MP2 or the calculation of expectation values for $\hat{S}^{2}$ do not (yet) use OpenMP parallelization. If the OpenMP parallelization is switched on (by setting OMP_NUM_THREADS) these parts will still be executed sequentially.
In the dscf program the DFT part will only be executed sequentially by a single thread, and the $incore option will be ignored if more than one thread is used. Semi-direct dscf calculations (i.e. if a size larger than 0 is given for the two-electron integral scratch file in $scfintunit) cannot be combined with OpenMP-parallel runs. (The program will then stop with an error message in the first Fock matrix construction.)
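The semi-direct restriction can be checked before a job is submitted. The sketch below is hypothetical and not a TURBOMOLE tool: the function name semidirect_requested is invented, and it assumes that the control file contains a $scfintunit data group whose following lines carry the scratch-file size as a size=N entry, with the group ending at the next line that starts with $.

```shell
# Hypothetical pre-flight check. Assumed layout of the data group:
#   $scfintunit
#    unit=30 size=0 file=twoint
# Returns 0 if a size larger than 0 is found in $scfintunit, 1 otherwise.
semidirect_requested() {
    # $1: path to the control file
    awk '
        # Any line starting with "$" opens a new data group.
        /^\$/ { in_block = ($1 == "$scfintunit") }
        # Inside $scfintunit, look for size=N and test N > 0.
        in_block && match($0, /size=[0-9]+/) {
            if (substr($0, RSTART + 5, RLENGTH - 5) + 0 > 0) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example use before starting an OpenMP-parallel dscf run:
# if semidirect_requested control; then
#     echo "semi-direct mode requested: set size to 0 or run sequentially" >&2
# fi
```

If the actual control file uses a different layout, the pattern in the awk script has to be adapted accordingly.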

TURBOMOLE