Conventional scheme

Using the conventional diagonalization method, OpenMX Ver. 3.8 can perform geometry optimization for systems consisting of about 1000 atoms if several hundred processor cores are available. To demonstrate this capability, run 'runtestL2' as follows:

     % mpirun -np 128 openmx -runtestL2 -nt 4

OpenMX will then run 7 test files and compare the calculated results with the reference results stored in 'work/large2_example'. The following is the result of 'runtestL2' performed with 264 MPI processes and 2 OpenMP threads on a CRAY-XC30.

1 large2_example/C1000.dat Elapsed time(s)= 767.64 diff Utot= 0.000000000560 diff Force= 0.000000006188
2 large2_example/Fe1000.dat Elapsed time(s)= 8427.51 diff Utot= 0.000000006708 diff Force= 0.000000002148
3 large2_example/GRA1024.dat Elapsed time(s)= 1418.26 diff Utot= 0.000000006759 diff Force= 0.000000002854
4 large2_example/Ih-Ice1200.dat Elapsed time(s)= 304.88 diff Utot= 0.000000000209 diff Force= 0.000000000213
5 large2_example/Pt500.dat Elapsed time(s)= 2552.29 diff Utot= 0.000000005013 diff Force= 0.000000000203
6 large2_example/R-TiO2-1050.dat Elapsed time(s)= 1716.23 diff Utot= 0.000000000600 diff Force= 0.000000000200
7 large2_example/Si1000.dat Elapsed time(s)= 814.97 diff Utot= 0.000000001037 diff Force= 0.000000000478
Total elapsed time (s) 16001.80
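The comparison that 'runtestL2' performs can also be mirrored when post-processing such a summary. The sketch below parses result lines in the format shown above, sums the elapsed times, and flags any job whose 'diff Utot' or 'diff Force' exceeds a tolerance. The regular expression is inferred from the lines above, and TOLERANCE = 1.0e-7 is a hypothetical threshold chosen for illustration, not an OpenMX setting.

```python
import re

# Summary lines copied from the runtestL2 result above.
RUNTEST_OUTPUT = """\
1 large2_example/C1000.dat Elapsed time(s)= 767.64 diff Utot= 0.000000000560 diff Force= 0.000000006188
2 large2_example/Fe1000.dat Elapsed time(s)= 8427.51 diff Utot= 0.000000006708 diff Force= 0.000000002148
3 large2_example/GRA1024.dat Elapsed time(s)= 1418.26 diff Utot= 0.000000006759 diff Force= 0.000000002854
4 large2_example/Ih-Ice1200.dat Elapsed time(s)= 304.88 diff Utot= 0.000000000209 diff Force= 0.000000000213
5 large2_example/Pt500.dat Elapsed time(s)= 2552.29 diff Utot= 0.000000005013 diff Force= 0.000000000203
6 large2_example/R-TiO2-1050.dat Elapsed time(s)= 1716.23 diff Utot= 0.000000000600 diff Force= 0.000000000200
7 large2_example/Si1000.dat Elapsed time(s)= 814.97 diff Utot= 0.000000001037 diff Force= 0.000000000478
Total elapsed time (s) 16001.80
"""

# Pattern inferred from the lines above; the "Total" line does not match it.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+Elapsed time\(s\)=\s*([\d.]+)\s+"
    r"diff Utot=\s*([\d.eE+-]+)\s+diff Force=\s*([\d.eE+-]+)"
)

# Hypothetical acceptance threshold (assumption, not an OpenMX parameter).
TOLERANCE = 1.0e-7


def parse_runtest(text):
    """Return one dict per job line: file name, elapsed time, and both diffs."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results.append({
                "file": m.group(2),
                "time": float(m.group(3)),
                "diff_utot": float(m.group(4)),
                "diff_force": float(m.group(5)),
            })
    return results


def summarize(results, tol=TOLERANCE):
    """Sum the elapsed times and list jobs whose diffs exceed tol."""
    total = sum(r["time"] for r in results)
    failed = [r["file"] for r in results
              if r["diff_utot"] > tol or r["diff_force"] > tol]
    return total, failed


if __name__ == "__main__":
    results = parse_runtest(RUNTEST_OUTPUT)
    total, failed = summarize(results)
    print(f"{len(results)} jobs, summed elapsed time {total:.2f} s")
    print("all within tolerance" if not failed else f"check: {failed}")
```

The summed time comes out as 16001.78 s; the 0.02 s gap to the reported total presumably reflects rounding of the individual entries.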

All calculations are of production-run quality: each atom is assigned double-valence plus single-polarization basis functions. The number at the end of each file name gives the number of atoms in the system, so every system except 'Pt500.dat' contains 1000 or more atoms. The elapsed times above show that geometry optimization of systems with about 1000 atoms is feasible when several hundred processor cores are available. The input and output files for these calculations are found in the directory 'work/large2_example', and the following information is compiled from the output files.

No.  Input file                       SCF steps   Elapsed time (s/SCF/spin)   Dimension
1    large2_example/C1000.dat             46                 16                 13000
2    large2_example/Fe1000.dat           343                 16                 13000
3    large2_example/GRA1024.dat           67                 15                 13312
4    large2_example/Ih-Ice1200.dat        32                  6                  9200
5    large2_example/Pt500.dat            190                 16                 12500
6    large2_example/R-TiO2-1050.dat       65                 31                 15750
7    large2_example/Si1000.dat            46                 16                 13000
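The Dimension column can be checked by counting basis functions: a shell with angular momentum l contributes 2l+1 functions, so a basis such as s2p2d1 gives 2×1 + 2×3 + 1×5 = 13 functions per atom. The sketch below assumes s2p2d1 for C, Si, and O and s2p1 for H; these per-species specifications are assumptions consistent with the double-valence plus single-polarization description, not values read from the input files. The Fe, Pt, and TiO2 rows are omitted because reproducing them would require the actual basis specifications in those inputs.

```python
import re

# Functions per shell for angular momentum l: 2l + 1.
FUNCS_PER_SHELL = {"s": 1, "p": 3, "d": 5, "f": 7}

# Hypothetical per-species basis specs (assumed, not read from the input files).
ASSUMED_BASIS = {"C": "s2p2d1", "Si": "s2p2d1", "O": "s2p2d1", "H": "s2p1"}

# Atom counts per species; Ih-Ice1200 is taken as (H2O)400, i.e. 400 O and 800 H.
SYSTEMS = {
    "C1000": {"C": 1000},
    "GRA1024": {"C": 1024},
    "Ih-Ice1200": {"O": 400, "H": 800},
    "Si1000": {"Si": 1000},
}


def basis_size(spec):
    """Functions per atom for a spec like 's2p2d1' (2 s, 2 p and 1 d shells)."""
    return sum(int(n) * FUNCS_PER_SHELL[l]
               for l, n in re.findall(r"([spdf])(\d+)", spec))


def hamiltonian_dimension(composition):
    """Total number of basis functions, i.e. the Kohn-Sham matrix dimension."""
    return sum(count * basis_size(ASSUMED_BASIS[el])
               for el, count in composition.items())


if __name__ == "__main__":
    for name, comp in SYSTEMS.items():
        print(name, hamiltonian_dimension(comp))
```

Under these assumptions it prints 13000, 13312, 9200, and 13000, matching the corresponding rows of the table.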

The dimension of the Kohn-Sham Hamiltonian is of the order of 10000 in every case. The elapsed time per SCF step is about 15 to 16 seconds for five of the seven systems; the exceptions are 'Ih-Ice1200.dat' (6 seconds, dimension 9200) and 'R-TiO2-1050.dat' (31 seconds, dimension 15750). The differences in the total elapsed time therefore come mainly from the number of SCF iterations needed to reach the convergence criterion of 1.0e-10 Hartree for the band energy.
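A rough consistency check of this reasoning multiplies the number of SCF steps by the time per SCF step and compares the product with the measured total elapsed time from the runtestL2 summary. This sketch assumes a single spin component for every run and ignores time spent outside the SCF loop (both are assumptions; the spin setting of each input is not stated here), so individual estimates deviate from the measured times by tens of percent. It only checks whether the estimates rank the systems in the same order as the measurements.

```python
# (system, SCF steps, elapsed s per SCF step per spin, measured total elapsed s),
# copied from the two tables in this section.
RUNS = [
    ("C1000", 46, 16, 767.64),
    ("Fe1000", 343, 16, 8427.51),
    ("GRA1024", 67, 15, 1418.26),
    ("Ih-Ice1200", 32, 6, 304.88),
    ("Pt500", 190, 16, 2552.29),
    ("R-TiO2-1050", 65, 31, 1716.23),
    ("Si1000", 46, 16, 814.97),
]

# Assumption: one spin component per run and no time outside the SCF loop.
SPIN_FACTOR = 1


def estimate(steps, t_scf, spin=SPIN_FACTOR):
    """First-order estimate of the total time: steps x time per step x spins."""
    return steps * t_scf * spin


def rankings(runs):
    """System names sorted by estimated time and by measured time."""
    by_est = [r[0] for r in sorted(runs, key=lambda r: estimate(r[1], r[2]))]
    by_meas = [r[0] for r in sorted(runs, key=lambda r: r[3])]
    return by_est, by_meas


if __name__ == "__main__":
    for name, steps, t_scf, measured in RUNS:
        est = estimate(steps, t_scf)
        print(f"{name:12s} estimate {est:7.0f} s  measured {measured:8.2f} s")
    by_est, by_meas = rankings(RUNS)
    print("same ranking:", by_est == by_meas)
```

Running it prints each estimate next to the measured value and reports `same ranking: True`, which is consistent with the SCF iteration count being the main driver of the differences in total time.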