Re: Slow DM step with OpenMP parallelization ( No.1 ) |
- Date: 2022/01/14 15:29
- Name: Naoya Yamaguchi
- Hi,
I guess you forgot to set the OpenMP environment variable (e.g. `OMP_NUM_THREADS=2`).
Regards, Naoya Yamaguchi
|
Re: Slow DM step with OpenMP parallelization ( No.2 ) |
- Date: 2022/01/14 16:14
- Name: Naoya Yamaguchi
- Dear Pavel,
I misunderstood how you parallelized the calculation. Comparing the 24 MPI and 12 MPI/2 OMP cases, the 12 MPI/2 OMP case requires a separate diagonalization just for the DM, as explained in http://www.openmx-square.org/openmx_man3.9/node82.html , while the 24 MPI case doesn't, since the number of k-points to be calculated is 14 in your case. You need to take this into account when you run benchmark calculations.
Regards, Naoya Yamaguchi
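
The condition discussed above can be sketched as a small check. This is only an illustration, not OpenMX code: the function name is hypothetical, the comparison uses the "is at least" reading confirmed later in this thread, and the spin multiplicity of 1 is an assumption (it is consistent with the 24 MPI case being the efficient one for 14 k-points). Only MPI processes are counted, not OpenMP threads.

```python
def uses_single_diagonalization(n_mpi: int, n_kpoints: int,
                                spin_multiplicity: int = 1) -> bool:
    """Hypothetical helper, not part of OpenMX.

    Returns True when the number of MPI processes is at least
    (spin multiplicity) x (number of k-points), i.e. when no separate
    diagonalization for the DM should be needed according to the manual.
    """
    return n_mpi >= spin_multiplicity * n_kpoints


# The two setups from this thread: 14 k-points, spin multiplicity assumed 1.
for n_mpi, n_omp in [(24, 1), (12, 2)]:
    single = uses_single_diagonalization(n_mpi, n_kpoints=14)
    label = "single diagonalization" if single else "extra diagonalization for the DM"
    print(f"{n_mpi} MPI x {n_omp} OMP: {label}")
```

Both setups use 24 cores in total, but only the MPI process count enters the comparison, which is why the hybrid run behaves differently in the DM step.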
|
Re: Slow DM step with OpenMP parallelization ( No.3 ) |
- Date: 2022/01/14 17:00
- Name: Pavel Ondracka <pavel.ondracka@email.cz>
- OK, my bad for not reading the manual well enough. I believe I get it now, thanks a lot for the explanation.
BTW I think there is a small mistake in the manual "In addition, when the number of processes used in the parallelization _exceeds_ (spin multiplicity)$\times $(the number of k-points), OpenMX uses an efficient way in which finding the Fermi level", specifically if I now understand it correctly the "exceeds" should in fact be "is at least", right?
I think it would be also nice if the k-point-process distribution would be written somewhere into the output, the same way the atoms-per-process distribution is written right now, so that I can actually see when I'm doing something stupid. In this case the calculation of the number of irreducible k-points as illustrated in the manual is quite simple, since this is P1 and there is just the inversion symmetry, but for something that actually has other symmetries this would be non-trivial IMO.
|
Re: Slow DM step with OpenMP parallelization ( No.4 ) |
- Date: 2022/01/14 18:51
- Name: Naoya Yamaguchi
- Dear Pavel,
>BTW I think there is a small mistake in the manual "In addition, when the number of processes used in the parallelization _exceeds_ (spin multiplicity)$\times $(the number of k-points), OpenMX uses an efficient way in which finding the Fermi level", specifically if I now understand it correctly the "exceeds" should in fact be "is at least", right?
Yes, you are right.
>I think it would be also nice if the k-point-process distribution would be written somewhere into the output,
Although I've not tried this, according to the source code, you can get such information when you set `level.of.stdout` to 3.
Regards, Naoya Yamaguchi
|