The programs dscf and ricc2 are also partially parallelized with OpenMP for applications on shared-memory, in particular multi-CPU and multi-core, machines.
The OpenMP parallelization does not need any special program startup. The binaries
can be invoked in exactly the same manner as for sequential (non-parallel)
calculations. The only difference is that, before the program is started, the
environment variable OMP_NUM_THREADS
has to be set to the number of
threads that should be used by the program. The number of threads
is essentially the maximum number of CPU cores the program will try to utilize.
To exploit, for example, all eight cores of a machine with two quad-core CPUs, set
export OMP_NUM_THREADS=8
(for csh and tcsh use
setenv OMP_NUM_THREADS 8
).
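As a minimal sketch of the startup described above, the following sh fragment sets OMP_NUM_THREADS before a run. The helper choose_threads is hypothetical (it is not part of the program package); it merely caps the requested thread count at the number of online cores reported by getconf. The commented-out program calls with output redirection are an assumption about a typical invocation.

```shell
#!/bin/sh
# Hypothetical helper (not part of the program package): return the requested
# thread count, capped at the number of CPU cores that are currently online.
choose_threads() {
    requested=$1
    # getconf _NPROCESSORS_ONLN is available on Linux and macOS; fall back to 1.
    available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Ask for eight threads, as in the two quad-core CPU example.
OMP_NUM_THREADS=$(choose_threads 8)
export OMP_NUM_THREADS

# The binaries are then started exactly as in a sequential run, for example:
# dscf > dscf.out
# ricc2 > ricc2.out
```

Capping the value avoids oversubscribing a machine that has fewer cores than requested; on a dedicated eight-core node the fragment simply exports 8.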
Presently the OpenMP parallelization of ricc2 comprises all
functionalities apart from the recently implemented RI-MP2-F12, the LT-SOS-RI-MP2,
and the calculation of expectation values for .
Note that for OpenMP-parallel calculations the memory specified with $maxcor
is the maximum amount of memory that will be dynamically allocated
by all threads together.
To use your computational resources efficiently, it is recommended to set
this value to about 75% of the physical memory available for your calculations,
but to at most 16000 (megabytes). (Due to the use of integer*4 arithmetic,
the ricc2 program is presently limited to 16 Gbytes.)
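The rule of thumb above (about 75% of the available physical memory, but at most 16000 megabytes) can be sketched as a small sh function. The name suggest_maxcor is hypothetical, and reading the total memory from /proc/meminfo is a Linux-specific assumption; on other systems, pass the memory available to the job in megabytes by hand.

```shell
#!/bin/sh
# Hypothetical helper: suggest a $maxcor value in megabytes from the physical
# memory (in megabytes) available to the calculation.
suggest_maxcor() {
    total_mb=$1
    # About 75% of the available memory (integer arithmetic, rounded down).
    value=$(( total_mb * 75 / 100 ))
    # Never exceed the 16000 MB limit stated for ricc2.
    if [ "$value" -gt 16000 ]; then
        value=16000
    fi
    echo "$value"
}

# Linux-specific assumption: MemTotal in /proc/meminfo is given in kB.
if [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    printf '$maxcor %s\n' "$(suggest_maxcor "$total_mb")"
fi
```

For a node with 8000 MB available the function suggests 6000; for 65536 MB the cap applies and it suggests 16000. If several jobs share a node, pass only the share of memory reserved for this calculation.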
In the dscf program the OpenMP parallelization presently covers only the Hartree-Fock Coulomb and exchange contributions to the Fock matrix in fully integral-direct mode and is mainly intended to be used in combination with OpenMP-parallel runs of ricc2. Nevertheless, the OpenMP parallelization can also be used in DFT calculations, but the numerical integration for the DFT contribution to the Fock matrix will use only a single thread (CPU core), and thus the overall speedup will be smaller.
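To illustrate why a single-threaded part limits the overall speedup, the sketch below evaluates Amdahl's law, speedup = 1 / ((1 - p) + p / n), for a parallelized fraction p of the run time and n threads. The function name amdahl_speedup and the fractions used are hypothetical and chosen only for illustration; they are not measured timings of dscf.

```shell
#!/bin/sh
# Hypothetical helper: estimate the overall speedup from Amdahl's law.
#   $1 = fraction of the sequential run time that is parallelized (0..1)
#   $2 = number of threads
amdahl_speedup() {
    # LC_ALL=C forces a decimal point regardless of the user's locale.
    LC_ALL=C awk -v p="$1" -v n="$2" \
        'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# Illustrative values only: half of the run time parallelized, eight threads.
amdahl_speedup 0.5 8
```

With half of the run time left on a single thread, eight threads give an estimated speedup of only about 1.78, whereas a fully parallelized run (p = 1) would reach 8.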
Localized Hartree-Fock calculations (dscf program) are parallelized using OpenMP. In this case an almost ideal speedup is obtained because the most expensive parts of the calculation are the evaluation of the Fock matrix and of the Slater potential, and both of them are well parallelized. The calculation of the correction term of the grid will use a single thread.
Parts that are not OpenMP-parallelized will still be executed sequentially,
whatever the value of OMP_NUM_THREADS.