When the O() method is employed, it is expected that one can obtain a good parallel efficiency because of the inherent algorithm. A typical MPI execution is as follows:
% mpirun -np 4 openmx DIA512_DC.dat > dia512_dc.std &The input file 'DIA512_DC.dat' found in the directory 'work' is for the SCF calculation (1 MD) of the diamond including 512 carbon atoms using the divide-conquer (DC) method. The speed-up ratio in comparison of the elapsed time per MD step is shown in Fig. 25 (a) as a function of the number of processes on a CRAY-XC30 (2.6 GHz/Xeon processors). We see that the parallel efficiency decreases as the number of processors increase, and the speed-up ratio at 128 CPUs is about 84. The decreasing efficiency is due to the decrease of the number of atoms allocated to one processor. So, the weight of other unparallelized parts such as disk I/O becomes significant. Moreover, it should be noted that the efficiency is significantly reduced in non-uniform systems in terms of atomic species and geometrical structure due to disruption of the road balance, while an algorithm is implemented to avoid the disruption. See also the subsections 'DC-LNO method' and 'Krylov subspace method' for further information on parallelization.
|