Using the conventional diagonalization method, OpenMX Ver. 3.9 is capable of performing geometry optimization for systems consisting of 1000 atoms if several hundreds processor cores are available. To demonstrate the capability, one can perform 'runtestL2' as follows:

% mpirun -np 128 openmx -runtestL2 -nt 4Then, OpenMX will run with 7 test files, and compare calculated results with the reference results which are stored in 'work/large2_example'. The following is a result of 'runtestL2' performed using 640 MPI processes and 1 OpenMP threads on a Xeon cluster machine.

1 | large2_example/C1000.dat | Elapsed time(s)= 777.60 | diff Utot= 0.000000007341 | diff Force= 0.000000008795 |

2 | large2_example/Fe1000.dat | Elapsed time(s)= 8181.70 | diff Utot= 0.000000002241 | diff Force= 0.000000011061 |

3 | large2_example/GRA1024.dat | Elapsed time(s)= 927.20 | diff Utot= 0.000000012903 | diff Force= 0.000000004981 |

4 | large2_example/Ih-Ice1200.dat | Elapsed time(s)= 445.88 | diff Utot= 0.000000000216 | diff Force= 0.000000001451 |

5 | large2_example/Pt500.dat | Elapsed time(s)= 2629.20 | diff Utot= 0.000000015832 | diff Force= 0.000000001879 |

6 | large2_example/R-TiO2-1050.dat | Elapsed time(s)= 844.58 | diff Utot= 0.000000002263 | diff Force= 0.000000001108 |

7 | large2_example/Si1000.dat | Elapsed time(s)= 658.53 | diff Utot= 0.000000000404 | diff Force= 0.000000000908 |

Total elapsed time (s) 14464.69

The quality of all the calculations is at a level of production run where double valence plus a single polarization functions are allocated to each atom as basis functions. Except for 'Pt500.dat', all the systems include more than 1000 atoms, where the last number of the file name implies the number of atoms for each system, and the elapsed time implies that geometry optimization for systems consisting of 1000 atoms is possible if several hundreds processor cores are available. The input files used for the calculations and the output files are found in the directory 'work/large2_example'. The following information is compiled from the output files.

No. | Input file | SCF steps | Elapsed time(s/SCF/spin) | Dimension |

1 | large2_example/C1000.dat | 53 | 14.7 | 13000 |

2 | large2_example/Fe1000.dat | 408 | 10.0 | 13000 |

3 | large2_example/GRA1024.dat | 72 | 12.9 | 13312 |

4 | large2_example/Ih-Ice1200.dat | 57 | 7.8 | 9200 |

5 | large2_example/Pt500.dat | 161 | 16.3 | 12500 |

6 | large2_example/R-TiO2-1050.dat | 38 | 22.2 | 15750 |

7 | large2_example/Si1000.dat | 45 | 14.6 | 13000 |

The dimension of the Kohn-Sham Hamiltonian is of the order of 10000, and the elapsed time per SCF step is around 15 seconds for all the systems, implying that the difference in the total elapsed time mainly comes from the difference in the SCF iterations to achieve the SCF convergence of 10e-10 (Hartree) for the band energy.