Re: Geometry optimization and restart files in parallel (on many nodes) ( No.1 ) |
- Date: 2011/06/16 21:05
- Name: T.Ozaki
- Hi,
The directory where the restart files are stored has to be shared by NFS among all nodes. It can be specified by the keyword:
System.CurrrentDirectory
whose default is ./
I guess that the directory you used is a local work directory that can be accessed only from its own node.
Regards,
TO
|
Re: Geometry optimization and restart files in parallel (on many nodes) ( No.2 ) |
- Date: 2011/06/16 23:10
- Name: Mauro Sgroi <maurosgroi@yahoo.it>
- Dear Prof. Ozaki,
thanks a lot for the reply. I will configure my input to save the files in a directory shared by NFS. My only concern was the speed of this type of filesystem and its effect on the performance of the calculation. Best regards, Mauro.
|
Re: Geometry optimization and restart files in parallel (on many nodes) ( No.3 ) |
- Date: 2011/06/17 10:41
- Name: T.Ozaki
- Hi,
The reduction in performance depends on the file system you use. In our experience, no significant reduction in performance has been observed in parallel calculations using fewer than 100 cores, although this part could become a bottleneck when using many cores, say 1000 cores, as you mentioned.
Regards,
TO
|
Re: Geometry optimization and restart files in parallel (on many nodes) ( No.4 ) |
- Date: 2011/06/17 16:43
- Name: Mauro Sgroi <maurosgroi@yahoo.it>
- Dear T.Ozaki,
thanks a lot for the information. I will install the code on our calculation cluster following your suggestion. Best regards, Mauro Sgroi.
|
Re: Geometry optimization and restart files in parallel (on many nodes) ( No.5 ) |
- Date: 2011/06/17 18:59
- Name: Mauro Sgroi <maurosgroi@yahoo.it>
- Dear T.Ozaki,
I'm running a geometry optimization (Valorphin_DC.dat) using an NFS directory shared among all processors, and I get the same error message:
******************* MD= 2 SCF= 1 *******************
Failed (2) in reading the restart file /tmp/openmx/mount/val_dc_rst/val_dc.rst24
Failed (2) in reading the restart file /tmp/openmx/mount/val_dc_rst/val_dc.rst29
Failed (2) in reading the restart file /tmp/openmx/mount/val_dc_rst/val_dc.rst30
Is this normal behavior for a geometry optimization?
Best regards, Mauro.
|
Re: Geometry optimization and restart files in parallel (on many nodes) ( No.6 ) |
- Date: 2011/06/20 18:05
- Name: T.Ozaki
- Hi,
I cannot reproduce this problem using your input file, modified slightly so that the geometry optimization can be performed.
If you are really sure that the directory is shared by NFS, I guess that NFS synchronization between the nodes is taking too long.
Regards,
TO
|
Re: Geometry optimization and restart files in parallel (on many nodes) ( No.7 ) |
- Date: 2011/06/21 19:25
- Name: Mauro Sgroi <maurosgroi@yahoo.it>
- Hi,
you are right. The problem was with the write permissions on the NFS directory from one of the nodes. Now the code works smoothly. I'm checking the performance because I suspect that our network is too slow to get good scaling. Maybe we need to install a high-performance NFS.
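For anyone hitting the same "Failed (2) in reading the restart file" message, a quick way to rule out a permission problem like this one is to try creating a file in the restart directory from every node. Below is a minimal sketch in Python, not part of OpenMX; the function name check_writable and the file prefix are only illustrative assumptions.

```python
import os
import socket
import tempfile


def check_writable(directory):
    """Try to create a small file in `directory`.

    Returns (hostname, ok, message). Hypothetical helper, not an OpenMX tool.
    """
    host = socket.gethostname()
    try:
        # A uniquely named temporary file avoids collisions when several
        # nodes run the check at the same time; it is deleted on close.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="wtest_") as fh:
            fh.write(b"ok")
            fh.flush()
            os.fsync(fh.fileno())  # push the data to the file system
        return host, True, "write succeeded"
    except OSError as exc:
        # Covers a missing directory, denied permission, read-only mounts, etc.
        return host, False, f"{type(exc).__name__}: {exc}"
```

Calling check_writable on the restart directory once per node (through whatever job launcher the cluster provides) and printing the returned tuple should show which node, if any, cannot write there. It only tests write access; it says nothing about how quickly NFS makes the files visible to the other nodes.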
Best regards, Mauro Sgroi.
|