In some cases, one may want to measure machine performance with more time-consuming calculations. For this purpose, an automatic running test with relatively large-scale systems can be performed as follows:
For the MPI parallel running
% mpirun -np 128 openmx -runtestL
For the OpenMP/MPI parallel running
% mpirun -np 128 openmx -runtestL -nt 2
Then, OpenMX will run 16 test files and compare the calculated results with the reference results stored in 'work/large_example'. The comparison (absolute difference in the total energy and force) is stored in the file 'runtestL.result' in the directory 'work'. The reference results were calculated using 16 MPI processes on a 2.6 GHz Xeon cluster machine. If the differences appear only within the last seven digits, we may consider the installation successful. As an example, 'runtestL.result' generated by the automatic running test is shown below:
1 | large_example/5_5_13COb2.dat | Elapsed time(s)= 39.43 | diff Utot= 0.000000000013 | diff Force= 0.000000000046 |
2 | large_example/B2C62_Band.dat | Elapsed time(s)= 572.22 | diff Utot= 0.000000000025 | diff Force= 0.000000013928 |
3 | large_example/CG15c-Kry.dat | Elapsed time(s)= 40.71 | diff Utot= 0.000000002112 | diff Force= 0.000000001090 |
4 | large_example/DIA512-1.dat | Elapsed time(s)= 37.93 | diff Utot= 0.000000169524 | diff Force= 0.000000033761 |
5 | large_example/FeBCC.dat | Elapsed time(s)= 81.55 | diff Utot= 0.000000000649 | diff Force= 0.000000001349 |
6 | large_example/GEL.dat | Elapsed time(s)= 47.05 | diff Utot= 0.000000000066 | diff Force= 0.000000000002 |
7 | large_example/GFRAG.dat | Elapsed time(s)= 24.05 | diff Utot= 0.000000000122 | diff Force= 0.000000000015 |
8 | large_example/GGFF.dat | Elapsed time(s)= 639.31 | diff Utot= 0.000000000051 | diff Force= 0.000000000243 |
9 | large_example/MCCN.dat | Elapsed time(s)= 53.72 | diff Utot= 0.000000009994 | diff Force= 0.000000016474 |
10 | large_example/Mn12_148_F.dat | Elapsed time(s)= 76.58 | diff Utot= 0.000000000096 | diff Force= 0.000000000090 |
11 | large_example/N1C999.dat | Elapsed time(s)= 97.56 | diff Utot= 0.000000006902 | diff Force= 0.000000007356 |
12 | large_example/Ni63-O64.dat | Elapsed time(s)= 78.00 | diff Utot= 0.000000000782 | diff Force= 0.000000000047 |
13 | large_example/Pt63.dat | Elapsed time(s)= 60.40 | diff Utot= 0.000000002147 | diff Force= 0.000000000059 |
14 | large_example/SialicAcid.dat | Elapsed time(s)= 47.80 | diff Utot= 0.000000000005 | diff Force= 0.000000000003 |
15 | large_example/ZrB2_2x2.dat | Elapsed time(s)= 143.16 | diff Utot= 0.000000000030 | diff Force= 0.000000000003 |
16 | large_example/nsV4Bz5.dat | Elapsed time(s)= 104.20 | diff Utot= 0.000000010770 | diff Force= 0.000000000605 |
The comparison was made using 128 MPI processes and 2 OpenMP threads
(256 cores in total) on a CRAY-XC30.
Since the automatic running test requires a large amount of memory, you may
encounter a segmentation fault if only a small number of cores is used.
The above example also implies that the total elapsed time is about 36 minutes
even when 256 cores are used. See also the Section 'Large-scale calculation' for
another large-scale benchmark calculation.
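To check 'runtestL.result' without reading every line by eye, the listing can be parsed with a short script. The following is only a minimal sketch and is not part of OpenMX: the function names parse_result and summarize are hypothetical, the line layout is assumed to match the listing shown above, and the tolerance of 1e-5 is one possible reading of "within the last seven digits" of the twelve printed decimal places.

```python
import re

# Pattern for one line of 'runtestL.result'. The layout is an assumption
# taken from the sample listing in this section.
LINE_RE = re.compile(
    r"^\s*(\d+)\s*\|\s*(\S+)\s*\|\s*Elapsed time\(s\)=\s*([\d.]+)\s*\|"
    r"\s*diff Utot=\s*([\d.]+)\s*\|\s*diff Force=\s*([\d.]+)\s*\|"
)


def parse_result(text):
    """Return a list of (name, elapsed_s, diff_utot, diff_force) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines that do not match the assumed layout are skipped
            rows.append((m.group(2), float(m.group(3)),
                         float(m.group(4)), float(m.group(5))))
    return rows


def summarize(rows, tol=1e-5):
    """Return (total elapsed seconds, names of cases whose diff exceeds tol).

    tol=1e-5 is an assumed threshold, not a value defined by OpenMX.
    """
    total = sum(r[1] for r in rows)
    failed = [r[0] for r in rows if r[2] > tol or r[3] > tol]
    return total, failed


# Two lines copied from the listing above, used as a small demonstration.
SAMPLE = """
1 | large_example/5_5_13COb2.dat | Elapsed time(s)= 39.43 | diff Utot= 0.000000000013 | diff Force= 0.000000000046 |
4 | large_example/DIA512-1.dat | Elapsed time(s)= 37.93 | diff Utot= 0.000000169524 | diff Force= 0.000000033761 |
"""

rows = parse_result(SAMPLE)
total, failed = summarize(rows)
print(f"{len(rows)} cases, total elapsed time {total / 60:.1f} min")
print("cases above tolerance:", failed if failed else "none")
```

To apply this to a real run, the contents of 'work/runtestL.result' can be read into a string and passed to parse_result in place of SAMPLE. Summing the elapsed times of all 16 cases in the listing above gives about 2144 seconds, which is consistent with the 36 minutes quoted.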