In some cases, one may want to know the machine performance for more time-consuming calculations. For this purpose, an automatic running test with relatively large-scale systems can be performed as follows:
For the MPI parallel running:
% mpirun -np 132 openmx -runtestL
For the MPI/OpenMP parallel running:
% mpirun -np 132 openmx -runtestL -nt 2
Then, OpenMX will run 16 test files and compare the calculated results with the reference results stored in 'work/large_example'. The comparison (absolute differences in the total energy and force) is stored in the file 'runtestL.result' in the directory 'work'. The reference results were calculated using 16 MPI processes on a 2.6 GHz Xeon cluster machine. If the differences are within the last seven digits, we may consider the installation successful. As an example, 'runtestL.result' generated by the automatic running test is shown below:
1 | large_example/5_5_13COb2.dat | Elapsed time(s)= 29.90 | diff Utot= 0.000000000066 | diff Force= 0.000000000045 |
2 | large_example/B2C62_Band.dat | Elapsed time(s)= 337.18 | diff Utot= 0.000000000030 | diff Force= 0.000000016106 |
3 | large_example/CG15c-Kry.dat | Elapsed time(s)= 40.14 | diff Utot= 0.000000011260 | diff Force= 0.000000415862 |
4 | large_example/DIA512-1.dat | Elapsed time(s)= 25.85 | diff Utot= 0.000000000030 | diff Force= 0.000000006092 |
5 | large_example/FeBCC.dat | Elapsed time(s)= 49.46 | diff Utot= 0.000000000094 | diff Force= 0.000000000010 |
6 | large_example/GEL.dat | Elapsed time(s)= 33.36 | diff Utot= 0.000000000028 | diff Force= 0.000000000001 |
7 | large_example/GFRAG.dat | Elapsed time(s)= 17.98 | diff Utot= 0.000000000315 | diff Force= 0.000000000030 |
8 | large_example/GGFF.dat | Elapsed time(s)= 528.97 | diff Utot= 0.000000000068 | diff Force= 0.000000000349 |
9 | large_example/MCCN.dat | Elapsed time(s)= 45.48 | diff Utot= 0.000000000062 | diff Force= 0.000000000001 |
10 | large_example/Mn12_148_F.dat | Elapsed time(s)= 51.59 | diff Utot= 0.000000000093 | diff Force= 0.000000000076 |
11 | large_example/N1C999.dat | Elapsed time(s)= 85.00 | diff Utot= 0.000000000389 | diff Force= 0.000000000096 |
12 | large_example/Ni63-O64.dat | Elapsed time(s)= 42.77 | diff Utot= 0.000000000111 | diff Force= 0.000000000085 |
13 | large_example/Pt63.dat | Elapsed time(s)= 37.97 | diff Utot= 0.000000000246 | diff Force= 0.000000000139 |
14 | large_example/SialicAcid.dat | Elapsed time(s)= 45.34 | diff Utot= 0.000000000004 | diff Force= 0.000000000005 |
15 | large_example/ZrB2_2x2.dat | Elapsed time(s)= 92.80 | diff Utot= 0.000000000086 | diff Force= 0.000000000002 |
16 | large_example/nsV4Bz5.dat | Elapsed time(s)= 82.71 | diff Utot= 0.000000005296 | diff Force= 0.000000000023 |
The comparison was made using 132 MPI processes and 2 OpenMP threads (264 cores in total) on a CRAY-XC30. Since the automatic running test requires a large amount of memory, you may encounter a segmentation fault if only a small number of cores is used. The above example also implies that the total elapsed time is about 26 minutes even when 264 cores are used. See also the Section 'Large-scale calculation' for another large-scale benchmark calculation.
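The check described above can also be done with a short script instead of by eye. The following Python sketch is not part of OpenMX. It assumes that every line of 'runtestL.result' has exactly the layout of the listing above, and it reads "within the last seven digits" of the 12-decimal values as a difference below 1e-5, which is an interpretation rather than a documented threshold. The names parse_results, summarize, and TOLERANCE are made up for illustration.

```python
import re

# Pattern for one line of 'runtestL.result', inferred from the sample listing.
LINE_RE = re.compile(
    r"^\s*(\d+)\s*\|\s*(\S+)\s*\|\s*Elapsed time\(s\)=\s*([\d.]+)\s*\|"
    r"\s*diff Utot=\s*([\d.]+)\s*\|\s*diff Force=\s*([\d.]+)\s*\|"
)

# Assumption: a difference confined to the last seven of twelve decimal
# digits is smaller than 1e-5. Pass a stricter value if you prefer.
TOLERANCE = 1.0e-5


def parse_results(text):
    """Return a list of (name, elapsed_s, diff_utot, diff_force) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines that do not match the layout are skipped
            rows.append((m.group(2), float(m.group(3)),
                         float(m.group(4)), float(m.group(5))))
    return rows


def summarize(rows, tol=TOLERANCE):
    """Return (total elapsed seconds, names of tests with a diff >= tol)."""
    total = sum(r[1] for r in rows)
    failed = [r[0] for r in rows if r[2] >= tol or r[3] >= tol]
    return total, failed


# Two lines copied from the listing above, used as sample input.
SAMPLE = """
2 | large_example/B2C62_Band.dat | Elapsed time(s)= 337.18 | diff Utot= 0.000000000030 | diff Force= 0.000000016106 |
3 | large_example/CG15c-Kry.dat | Elapsed time(s)= 40.14 | diff Utot= 0.000000011260 | diff Force= 0.000000415862 |
"""

rows = parse_results(SAMPLE)
total, failed = summarize(rows)
print(f"{len(rows)} tests, total elapsed time {total / 60.0:.1f} min")
print("all within tolerance" if not failed else f"check these tests: {failed}")
```

To apply it to a real run, replace SAMPLE with the contents of 'work/runtestL.result', for example open("runtestL.result").read(). Summing the 16 elapsed times of the listing above in this way gives about 1546 s, which is the roughly 26 minutes mentioned in the text.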
2016-04-03