O($N$) calculation

When the O($N$) method is employed, good parallel efficiency can be expected because of the nature of the algorithm. A typical MPI execution is as follows:

     % mpirun -np 4 openmx DIA512_DC.dat > dia512_dc.std &

The input file 'DIA512_DC.dat' in the directory 'work' specifies an SCF calculation (one MD step) of diamond with 512 carbon atoms using the divide-conquer (DC) method. The speed-up ratio, measured from the elapsed time per MD step, is shown in Fig. 25 (a) as a function of the number of processes on a CRAY-XC30 (2.6 GHz Xeon processors). The parallel efficiency decreases as the number of processors increases, and the speed-up ratio at 128 CPUs is about 84. The efficiency drops because fewer atoms are allocated to each processor, so the weight of unparallelized parts such as disk I/O becomes significant. Moreover, it should be noted that the efficiency is significantly reduced in systems that are non-uniform in atomic species or geometrical structure, because the load balance is disrupted, although an algorithm is implemented to mitigate this disruption. See also the subsections 'DC-LNO method' and 'Krylov subspace method' for further information on parallelization.
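As a rough illustration of the numbers quoted above, the following minimal sketch computes the parallel efficiency (speed-up divided by the number of processes) for the reported speed-up of about 84 on 128 CPUs, which comes to about 0.66. It also estimates an effective serial fraction under the assumption that Amdahl's law applies. Amdahl's law is used here only as a simple model and is not part of the analysis in this manual. The function names are hypothetical and are not part of OpenMX.

```python
def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency: measured speed-up divided by process count."""
    return speedup / n_procs


def amdahl_serial_fraction(speedup, n_procs):
    """Effective serial fraction f, ASSUMING Amdahl's law holds:
        S = 1 / (f + (1 - f) / p)   =>   f = (p / S - 1) / (p - 1)
    This is a simplifying model, not a statement about OpenMX internals.
    """
    return (n_procs / speedup - 1.0) / (n_procs - 1.0)


if __name__ == "__main__":
    # Values quoted in the text: speed-up of about 84 at 128 CPUs.
    speedup, n_procs = 84.0, 128
    eff = parallel_efficiency(speedup, n_procs)
    f = amdahl_serial_fraction(speedup, n_procs)
    print(f"parallel efficiency      : {eff:.2f}")
    print(f"effective serial fraction: {f:.4f}")
```

Under this assumed model, a serial fraction of well under one percent would already account for the observed loss of efficiency at 128 processes, which is consistent with the remark that small unparallelized parts become significant as the work per processor shrinks.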

**Figure 25:** Speed-up ratio of the elapsed time per MD step in parallel calculations using MPI on a CRAY-XC30 (2.6 GHz Xeon processors) (a) for carbon diamond including 512 atoms in the supercell by the DC method, (b) for a single molecular magnet consisting of 148 atoms by the cluster method, and (c) for carbon diamond including 64 atoms in the supercell by the band method with 3 $\times$ 3 $\times$ 3 k-points. For comparison, the line corresponding to the ideal speed-up ratio is also shown.
$\includegraphics[width=8.0cm]{DIA-MPI.eps}$