Taisuke Ozaki (ISSP), Masahiro Fukuda (ISSP), and Gengping Jiang (Wuhan Univ. of Sci. and Tech.)
First-principles electronic structure calculations based on the density functional theories (DFT) [1,2]
have been playing a versatile role in a wide variety of materials sciences to deeply understand physical and chemical properties of
existing materials and even to design novel materials having a desired property before actual experiments [3,4,5,6].
In recent years more complicated materials with secondary structures such as heterointerfaces [7,8] and
dislocations [9] have been becoming
a scope of application by DFT calculations. Since these complicated structures cannot be easily modeled by a small unit cell,
development of efficient DFT methods in accordance with development of massively parallel computers is crucial to realize
such large-scale DFT calculations.
Among efficient DFT methods [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,26,25,27,28,29,30,31,32,33,34],
O() methods whose computational cost scales linearly as a function of the number of atoms
have enabled us to extend the applicability of DFT to large-scale systems
[12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34].
Nevertheless applications of the O(
) methods to metallic systems have been still limited because of the fundamental
difficulty of a truncation scheme in real space, which is an idea commonly adopted in most of the O(
) methods
[12,13], in realizing the O(
) methods [14] as discussed in the Appendix.
A theoretically proper approach to go beyond the truncation scheme is to take account of the contribution
from the external region beyond the truncated region via a self energy in Green function formalisms
[26,27,28,29,30].
Another straightforward approach is to use a relatively large cutoff radius in the truncation scheme
in order to reach a sufficient accuracy. The latter approach might be suited to the divide-conquer (DC) method [31,32]
among the O(
) methods proposed so far in the following twofold aspects:
(i) There is a way to reduce the computational cost in the framework of the DC method by lowering
the dimension of matrices with an introduction of a Krylov subspace [33,34].
(ii) The computational time of the DC method can be reduced by a massive parallelization, since the calculations
in the DC method are performed nearly independently for each atom [35,36].
With the two aspects the improvement of the DC method can be a promising direction to develop an accurate, efficient,
and robust O(
) method applicable to not only insulators and semi-conductors, but also metals.
Along this line, the DC method based on the Krylov subspace has been proven to be an efficient and accurate O(
) method
by a wide variety of applications such as dynamics of Li ions in a lithium ion battery [37,38,39]
and structure optimization of semicoherent heterointerfaces in steel [7,8].
However, there exist drawbacks in the generation of the Krylov subspace [34].
Since the Krylov subspace is generated at the first self-consistent field (SCF) step and kept unchanged
during the subsequent SCF calculation,
calculated quantities such as the electron density and total energy depend on the initial guess of electron density or
Hamiltonian matrix elements. If the Krylov subspace is regenerated every SCF step to avoid the dependency on
the initial guess, the computational efficiency must be largely degraded.
In addition the iterative calculations in the generation of the Krylov subspace tend to suffer from numerical
round-off error, leading to an uncontrollable behavior in the SCF calculation if the Krylov subspace is regenerated
every SCF step [34].
Therefore, a more robust approach needs to be developed to improve the DC method, which overcomes the drawbacks
inherent in the DC method based on the Krylov subspace.
In this paper, we focus on localized single-particle natural orbitals (LNOs) calculated by a low-rank approximation
to perform a coarse graining of basis functions, and apply the LNOs to the DC method, which will be referred to as
the DC-LNO method hereafter, to reduce the dimension of matrices without sacrificing the accuracy.
The DC-LNO method overcomes the drawbacks in the DC method based on the Krylov subspace, while taking account of reduction of
the prefactor in the computational cost and a simple algorithm with less communication leading to a high parallel efficiency.
A series of benchmark calculations clearly demonstrates that the DC-LNO method is an accurate, efficient, and robust O()
method applicable to not only insulators and semi-conductors, but also metals in step with recent development of massively
parallel computers.
The paper is organized as follows: In the section II we propose a method to generate LNOs, and present the DC-LNO method as an extension of the DC method. In the section III the implementations of the method are discussed in detail. In the section IV a series of benchmark calculations is presented. In the section V we summarize the theory of the DC-LNO method and numerical aspects.
We consider an extension of a divide-conquer (DC) approach [31,32] by introducing a coarse graining
of basis functions as shown in Fig. 1. For each atom the Kohn-Sham (KS) Hamiltonian and overlap matrices are truncated
within a given cutoff radius, and the resultant truncated cluster problem is solved atom by atom, leading to the O() scaling
in the computational cost. As expected the error of truncation can be systematically reduced in exchange for
the increase of computational cost as the cutoff radius increases.
The number of atoms in a truncated cluster exceeds 300 atoms in many cases in order to attain a sufficient accuracy
as discussed later on. To reduce the computational cost we introduce a coarse graining of basis functions that
the original basis functions, pseudo-atomic orbitals (PAOs) in our case [40,41], are replaced
by localized single-particle natural orbitals (LNOs)
in the long range (yellow) region to represent the Hamiltonian and overlap matrices, while the PAO functions remain unchanged
in the short range (orange) region.
In the following subsections a method of generating LNOs and a DC method with LNOs are discussed in detail.
![]() |
We present a method of generating LNOs based on a low-rank approximation via a local eigendecomposition
of a projection operator. The method might be applicable to any local basis functions, and not limited
to the application to the DC method we discuss in the paper.
Even in the conventional O() calculations, the LNOs can be easily obtained by a non-iterative calculation
using the density matrix and overlap matrix elements. Since a smaller number of LNOs well reproduce dispersion
of occupied bands, they can be an alternative basis set for a compact representation for the Hamiltonian and
overlap matrices.
Though in this subsection we present a method of calculating LNOs in a general form starting from Bloch functions
to clarify a mathematical basis of LNOs, it should be clearly noticed in this place that the density matrix is
directly calculated in the DC-LNO method without calculating the Bloch functions.
The general formulation presented here provides a theoretical basis of three step algorithm which will be discussed
in the end of the subsection.
Under the Born-von Karman boundary condition
we expand a Kohn-Sham (KS) orbital
,
indexed with a k-vector
in the first Brillouin zone and band index
, using PAOs
[33,34],
being a real function, as
![]() |
(1) |
![]() |
![]() |
![]() |
(2) |
![]() |
![]() |
![]() |
(3) |
![]() |
![]() |
![]() |
(4) |
![]() |
![]() |
![]() |
(5) |
![]() |
![]() |
![]() |
|
![]() |
![]() |
![]() |
![]() |
![]() |
|
![]() |
![]() |
![]() |
(10) |
![]() |
![]() |
![]() |
|
![]() |
![]() |
||
![]() |
![]() |
(11) |
![]() |
![]() |
![]() |
(12) |
![]() |
![]() |
![]() |
(13) |
![]() |
![]() |
![]() |
(16) |
![]() |
![]() |
![]() |
(17) |
The computational procedure to generate LNOs is summarized as follows:
(i) Calculation of .
For each atom
the matrix
is calculated by Eq. (17),
where the summation over
is limited within a finite range because of the locality of PAOs in real space.
(ii) Diagonalization of
.
Since the matrix
is nonsymmetric, the diagonalization in
Eq. (20) is performed by a generalized eigenvalue solver for a nonsymmetric matrix
such as DGEEV in LAPACK [48].
(iii) Selection of
.
Eigenvectors
whose eigenvalues are larger than or equal to the threshold
value
are selected as LNOs.
Only the overlap and density matrices are required to calculate LNOs through the steps (i)-(iii) above.
Therefore, either conventional O(
) methods
or O(
) methods can be employed as eigenvalue solver as long as they generate the density matrix.
As for LNOs other than those in the central cell with
, it is apparent from the derivation that
one can obtain
by parallel translation of
with the lattice vector
.
The method can also be extended to choose another energy window.
In the projection operator
defined by Eq. (7), the Fermi-Dirac function is introduced
to choose the occupied space. However, one can choose a proper energy window in the definition of
the projection operator
depending on what we discuss, which enables us to focus on specific bands
such as localized
-bands near the Fermi level. In this sense, LNOs can be utilized like
Wannier functions (WFs) [49,50],
while WFs are obtained through a unitary transformation of Bloch functions rather than the low-rank approximation,
and they are orthonormal each other unlike LNOs. It is also worth pointing out that our method shares
the basic idea based on the projection with other methods such as the quasiatomic orbitals scheme [51,52,53],
a projection method [54], and a method via selected columns of the density matrix [55,56].
It might be possible for the method we present in the paper to be applied for other localized basis functions such as finite element methods [57,58] and finite difference methods [59,60,61]. In those cases one may introduce spatial partitioning methods such as Voronoi tessellation to decompose basis functions rather than focusing on basis functions on a single grid. A similar procedure can be applied for a set of partitioned basis functions.
Here we consider an extension of the DC method [31,32] using LNOs discussed in the previous subsection.
Our theoretical basis to formulate the O() DC method is that the total energy and atomic forces
in the KS framework can be calculated by using electron density
, density matrix
,
and energy density matrix
defined by
The overall procedure of the DC method with LNOs is summarized as follows:
(i) Calculation of LNOs. LNOs are calculated by Eq. (20) for all atoms in the central
cell with
. At every SCF step LNOs are updated, leading to self-consistent determination of LNOs.
(ii) Construction of
and
.
For each atom
the KS and overlap matrices for a truncated cluster associated with the atom
are constructed by Eq. (26).
(iii) Diagonalization of
by making use of a parallel eigenvalue solver.
(iv) Finding a common chemical potential to conserve the total number of electrons in the system using Eq. (27),
as discussed in Ref. [34] in detail.
(v) Calculation of density matrix
and energy density matrix
using Eqs. (24), (25),
and (27).
(vi) Calculation of electron density
using Eq. (23).
We have implemented the DC-LNO method into the OpenMX DFT software package [66] which is based on norm-conserving
pseudopotentials (PPs) [67,68] and optimized pseudo-atomic orbitals (PAOs) [40,41] as basis set.
All the benchmark calculations were performed with a computational condition of a production level.
The basis functions used are C6.0-s2p2d1, Si7.0-s2p2d1, Ti7.0-s2p2d1, O6.0-s2p2d1, Li8.0-s3p2, Al7.0-s2p2d1, and
Fe5.5-s3p2d2 for carbon, silicon, titanium, oxygen, lithium, aluminum, and iron, respectively,
where in the abbreviation of basis functions such as C6.0-s2p2d1, C stands for the atomic symbol,
6.0 the cutoff radius (Bohr) in the generation by the confinement scheme, and s2p2d1 means the employment
of two, two, and one optimized radial functions for the -,
-, and
-orbitals, respectively.
The radial functions were optimized by a variational optimization method [40].
These basis functions we used can be regarded as double zeta plus polarization
basis sets if we follow the terminology of Gaussian basis functions.
As valence electrons in the PPs we included
and
,
and
,
,
,
, and
,
and
,
,
, and
,
and
, and
,
,
, and
states for carbon, silicon, titanium, oxygen, lithium, aluminum,
and iron, respectively.
All the PPs and PAOs we used in the study were taken from the database (2013) in the OpenMX website [66], which
were benchmarked by the delta gauge method [69]. Real space grid techniques are used for the numerical integrations
and the solution of the Poisson equation using FFT with the energy cutoff of 300 Ryd [70].
We used a generalized gradient approximation (GGA) proposed by Perdew, Burke, and Ernzerhof
to the exchange-correlation functional [65]. An electronic temperature of 300 K is used to count the number
of electrons by the Fermi-Dirac function for all the systems we considered.
The short and long range regions depicted in Fig. 1 are determined as follows:
(i) We first pick up atoms in a sphere with a given cutoff radius .
(ii) Among the atoms selected by the step (i) we distinguish the first neighboring atoms (FNAs)
having non-zero overlap with the central atom in terms of basis functions, and
remaining atoms other than FNAs are called the second neighboring atoms (SNAs), where
the number of FNAs and SNAs are
and
, respectively.
(iii) The short range region is determined by adjusting a cutoff radius
so that the number of atoms in a sphere with a radius of
can be
as close as possible to
,
where the parameter
can vary from 0 to 1, and
we will discuss the choice of
later on.
(iv) The long range region consists of
remaining atoms other than atoms selected by the step (iii).
If we assign FNAs to atoms in the short range region, the total energy does not converge to
the numerically exact one calculated by the conventional diagonalization method even if
the cutoff radius
increases systematically. This is because the error
with the low-rank approximation by Eq. (22) keeps increasing
as the cutoff radius increases. To avoid the situation, we add a buffer region consisting
of about
atoms as described by the step (iii) above, which guarantees
the convergence of the total energy and other quantities as a function of the cutoff radius
.
Throughout the study, we used
of
for all the systems by taking the accuracy
into account more than the efficiency, and did not adjust the parameter, while a smaller value,
which well balances both the accuracy and efficiency, can be employed for some systems.
The way of parallelization for the DC-LNO method on parallel computers will be discussed together with its benchmark calculations later on.
In order to investigate to what extent LNOs can span occupied spaces, we compare band dispersions
of gapped and metallic systems calculated with PAOs and LNOs. Figure 2(a) and (b) show
band dispersions of diamond and silicon calculated by a conventional O() diagonalization method
with PAOs and LNOs. For both the cases the SCF calculations were performed by using PAOs.
For the case of LNOs the band dispersions were calculated with the LNOs after the SCF calculations
with PAOs. The number of LNOs per atom is 4 for both carbon and silicon atoms.
It is found that in both the cases the band dispersions of occupied space are well reproduced
with LNOs compared to those calculated by PAOs, while a large difference can be seen in conduction bands
between PAOs and LNOs as expected. The good agreement between PAOs and LNOs in describing
the occupied bands implies that the low-rank approximation by Eq. (22)
is reasonably valid. As shown in Table I we see that the first four eigenvalues of the matrix
are actually dominant for both diamond and silicon, justifying the low-rank approximation.
The largest and the next three eigenvalues correspond to a
-orbital and
-like orbitals
deformed by contribution of
-orbitals, respectively.
It is also noted that the eigenvalues can be negative, which is related to a negative value
of Mulliken populations for delocalized orbitals [71,72].
As well as the gapped systems, similar calculations were performed for metals, lithium
in the body centered cubic (BCC) structure and aluminum in the face centered cubic (FCC) structure
as shown in Figs. 3 and 4, respectively.
The band dispersions calculated with the minimal LNOs are reasonably compared to those by PAOs,
while the use of the five and nine LNOs for Li and Al atoms fully reproduce the band dispersions including conduction
bands as shown in Figs. 3(b) and 4(b), respectively.
One can confirm again in Table I
that eigenvalues for the minimal LNOs are dominant even for metals, while the magnitude of
the subsequent eigenvalues is relatively large compared to those of the gapped systems.
Thus, we conclude that LNOs can be regarded as a compact basis set spanning well the occupied space
for both gapped and metallic systems.
![]() |
![]() |
![]() |
Diamond | Si | Li | Al | |||||
![]() |
0.514 | 0.639 | 0.999 | 0.551 | ||||
![]() |
0.483 | 0.430 | 0.277 | 0.253 | ||||
![]() |
0.483 | 0.430 | 0.075 | 0.253 | ||||
![]() |
0.483 | 0.430 | 0.075 | 0.253 | ||||
![]() |
0.012 | 0.018 | 0.074 | 0.049 | ||||
![]() |
0.012 | 0.018 | 0.001 | 0.049 | ||||
![]() |
0.012 | 0.018 | 0.001 | 0.049 | ||||
![]() |
0.011 | 0.012 | 0.001 | 0.033 | ||||
![]() |
0.011 | 0.012 | -0.002 | 0.033 | ||||
![]() |
-0.001 | 0.002 | - | -0.002 | ||||
![]() |
-0.001 | 0.002 | - | -0.002 | ||||
![]() |
-0.001 | 0.002 | - | -0.002 | ||||
![]() |
-0.018 | -0.011 | - | -0.015 |
As a first step of validation of the DC-LNO method with respect to computational accuracy and efficiency,
we show in Fig. 5 the absolute error in the total energy of gapped and metallic systems calculated
by the DC and the DC-LNO methods, where the reference energies were calculated by the conventional
O() method with dense k-points for the Brillouin zone sampling as given in the caption of Figs. 2-4.
For all the cases the threshold value
in Eq. (22) was set to be 0.1
which gives us the minimal LNOs corresponding to orbitals of valence electrons.
In the gapped systems, diamond, silicon, and rutile TiO
, the absolute error decreases almost
exponentially as the number of atoms in a truncated cluster increases. The overall behaviors of
the error between the DC and DC-LNO methods are similar to each other, while the error
by the DC-LNO method is accidentally much smaller than that by the DC method in the case of large truncated
clusters of diamond. As well as the gapped systems the absolute error for metals calculated by the DC-LNO
method decreases as increasing the number of atoms in a truncated cluster in a similar way to the DC method.
The relatively large oscillating behavior observed in the metals might be related to long range characteristics
of the off-diagonal Green functions as discussed in the Appendix.
For all the cases including metals it is found that a truncated cluster including about 300 atoms is required
to attain the millihartree accuracy corresponding to the error less than a few millihartree/atom in the total energy.
From the comparison between the DC and DC-LNO methods, we see that the computational accuracy does not degrade largely
even if basis functions for atoms in the long range region are approximated by LNOs, and that thereby the computational
accuracy can be controlled mainly by the size of truncated cluster just like for the DC method.
Since controlling only the single parameter allows us to balance the computational accuracy and efficiency,
it is expected that the feature makes the DC-LNO method easy to use for a wide variety of applications.
![]() |
Since the total number of basis functions to represent the Hamiltonian of the truncated cluster is
reduced by introducing LNOs while keeping the accuracy,
it is expected that the computational time can be substantially reduced.
In the whole procedure of the DC-LNO method the calculation of LNOs and construction of Hamiltonian
and overlap matrices occupy a small fraction of the whole computational time, typically less than ,
and thereby the computational time is mainly governed by solving of the eigenvalue problem
for each atom
.
Noting that the computational time to solve the eigenvalue problem scales as the third power
of the dimension of the matrices, the ratio of computational time between the DC-LNO and DC methods
for elemental systems might be estimated by
![]() |
In Fig. 7 we further show timing results of the DC-LNO method as a function of the number of atoms in the unit cell of Si crystal. It is confirmed from the linear increase of the computational time that the DC-LNO method is a linear scaling approach in practice. As a comparison the computational time is also shown for the conventional diagonalization method [73]. The crossing point between the two methods in the computational time is located around 100 atoms when 1280 CPU cores is used. Over 100 atoms the DC-LNO method is much faster than the conventional diagonalization method. The reason why the crossing point is located at such a small number of atoms is partly due to a better parallel efficiency of the DC-LNO method as discussed in the next subsection.
![]() |
To minimize the computational time on massively parallel computers we introduce a multilevel parallelization
using message passing interface (MPI). In our implementation there are three levels for the parallelization,
i.e., atom level, spin level, and diagonalization level as explained below:
(i) Parallelization in the atom level.
If the number of MPI processes is smaller than that of atoms,
only the parallelization in the atom level is taken into account.
The allocation of atoms to MPI processes is performed by a bisection method which is
based on a projection of atoms onto a principal axis calculated from an inertia tensor and a modified binary
tree of MPI processes to minimize memory usage and amount of MPI communications [36].
(ii) Parallelization in the spin level.
If the number of MPI processes exceeds that of atoms, and the spin polarized calculation is performed,
the parallelization in the spin level is introduced on top of the parallelization in the atom level, where
a loop for the spin index is further parallelized.
(iii) Parallelization in the diagonalization level.
If the number of MPI processes is larger than the product of the number of atoms and the multiplicity of spin index,
corresponding to 1, 2, and 1 for non-spin polarized, spin-polarized, and non-collinear calculations, respectively,
a parallelization in the diagonalization level is further taken into account on top of both the parallelizations
in the atom and spin levels. The parallelization in the diagonalization level is made by employing a parallel eigenvalue
solver ELPA [73]. It is noted that the parallelization in the diagonalization level requires a considerable amount
of MPI communications, while the parallelizations in the atom and spin levels have less MPI communications.
So, one would expect a high parallel efficiency in the atom and spin levels, while the parallelization
in the diagonalization level might be limited up to several tens of MPI processes.
To achieve a better scaling for the parallelization in the diagonalization level, it is important to allocate
CPU cores in the same computer node as MPI processes to avoid the inter-node communication as much as possible.
We have implemented the multilevel parallelization so that amount of the inter-node communication can be minimized
especially for the parallelization in the diagonalization level.
In Fig. 8 the speed-up ratio in the MPI parallelization of the DC-LNO method is shown for
non-spin polarized calculations of a diamond supercell containing 64 atoms.
Since the multiplicity of spin index is 1, we see a nearly ideal behavior up to 64 MPI processes.
Beyond 64 MPI processes the parallelization in the diagonalization level is taken into account on top of
the parallelization in the atom level.
A superlinear speed-up is observed at 128 and 256 MPI processes, which might be due to an effective use
of cache by the reduction of memory usage, and a good scaling is achieved up to 1280 MPI processes at which
the parallel efficiency is calculated to be 70% using the elapsed time at 1 MPI process as reference.
Since each computer node has 20 CPU cores in this case, it would be reasonable to observe the good scaling up to
1280 (=6420) MPI processes.
Thus, we see that the multilevel parallelization is very effective to minimize the computational time
in accordance with a recent development of massively parallel computers.
![]() |
To verify the accuracy of forces on atoms calculated by the DC-LNO method, results of NVE molecular
dynamics (MD) simulations are shown in Fig. 9. We see that the sum of the kinetic energy of atomic nuclei (Kinetic)
and the internal total energy (), being a conserved quantity, is reasonably conserved as a function of time,
and the fluctuation is about one tenth of the kinetic energy or the internal total energy.
It should be noted that the approximate conservation of the sum is achieved for not only Si being a semi-conductor,
but also Al being a metal.
Thus, it can be concluded that the accuracy of forces on atoms calculated by the DC-LNO method is sufficient
for practical purposes, while it was remarked in the Sec. II that the forces on atoms in the DC-LNO
method are not calculated variationally. The sufficient accuracy of the calculated forces is achieved by the use of
large cutoff radii in constructing the truncation clusters, which is realized by both the introduction of LNOs and
the massive parallelization with the multilevel parallelism.
![]() |
To further demonstrate the applicability of the DC-LNO method for MD simulations,
we show radial distribution functions (RDFs) in liquid phases of silicon, aluminum, lithium, and SiO in Fig. 10.
Since the electronic structures exhibit metallic features in the liquid phases of silicon, aluminum, and lithium,
the MD simulations can be considered as a severe benchmark to validate the applicability of the DC-LNO method to metals.
The cutoff radii
we used correspond to truncated clusters consisting of about 300 atoms in the ideal bulk structures.
It turns out that in all the cases the DC-LNO method reproduces well the results by the conventional
diagonalization method,
and that the obtained RDFs are well compared to other computational results [75,76,77,78].
The considerable agreement between the DC-LNO and conventional methods strongly implies that a sufficient accuracy in reproducing
at least RDF for MD simulations can be attainable with a cutoff radius
resulting in truncated clusters consisting
of about 300 atoms for not only insulators but also metals. Thus, adjusting the cutoff radius
so that the number
of atoms in a truncated cluster can be
300 atoms would be a compromise to balance the computational accuracy
and efficiency, while the difference between the DC and DC-LNO methods in terms of the computational efficiency
may not be significant for truncated clusters of this size.
It is crucial to minimize the elapsed time for realization of long time MD simulations.
With the computational condition the elapsed time per SCF step for silicon is 1.5 (sec.) on average
using 1280 MPI processes on the same machine used for the calculations shown in Fig. 8.
![]() |
We have presented an efficient O() method based on the DC approach and a coarse graining of basis functions by
localized single-particle natural orbitals (LNOs) for large-scale DFT calculations. A straightforward way to attain sufficient accuracy
in the DC method is to employ a relatively large cutoff radius for the truncation of a system, which is the most fundamental
parameter in most of O(
) methods to control the computational accuracy and efficiency.
We have adopted the rather brute force approach, and attempted to decrease the computational cost by introducing LNOs
as basis functions in the long range region of the truncated cluster, and to minimize the elapsed time in the computation
with the help of a multilevel parallelization.
The method of generating LNOs is based on a low-rank approximation to the projection operator for the occupied space
by a local eigendecomposition at each atomic site, and the band structure calculations with PAOs and LNOs clearly show
that the resultant LNOs
span well the occupied space of not only gapped systems but also metals. It is also worth mentioning that
the computational cost of generating LNOs is almost negligible thank to the independent calculation at each atomic site.
By replacing PAOs with LNOs in the long range region of the truncated cluster in the DC method, the computational cost of
the DC method can be reduced without largely sacrificing the accuracy.
Nothing that the DC-LNO method holds the simple algorithm of the original DC method suited to the parallel calculation,
we have implemented a multilevel parallelization using MPI by taking account of the atom level, spin level, and
diagonalization level. It was demonstrated that the speed-up of the DC-LNO method by the multilevel parallelization can
be expected up to a specific number of MPI processes which corresponds to the product of the number of atoms, the multiplicity
of spin index, and the number of CPU cores in a single computer node. For example, if a spin-polarized calculation is
performed for a system consisting of 1000 atoms on a parallel computer with 20 CPU cores per node, a high parallel
efficiency might be expected up to 40000 MPI processes. As a validation of the applicability of the DC-LNO method,
we have performed MD simulations for liquid phases of an insulator, semi-conductor, and metals, and confirmed that
the RDFs calculated by the DC-LNOs are in good agreement with those by the conventional
diagonalization method,
which may lead to its various applications to structural determinations of amorphous and liquid structures of complicated
materials [79,80,81,82,83].
Considering the simplicity and robustness of the algorithm, we conclude that the DC-LNO method is an efficient and accurate
approach to large-scale DFT calculations for a wide variety of materials including metals.
This work was supported by Priority Issue (Creation of new functional devices and high-performance materials to support next-generation industries) to be tackled by using Post 'K' Computer, MEXT, Japan. Part of the computation in the study was performed using the computational facility of the Japan Advanced Institute of Science and Technology.
As an example we show asymptotic behaviors of the off-diagonal Green functions for an one-dimensional (1D) tight-binding (TB)
model with a single -orbital on each site, and relate the asymptotic behaviors to electronic structures
in gapped and metallic systems. The analysis interprets evidently the oscillating behavior of the error
in the total energy of the metallic systems as shown in Fig. 5, and the rapid convergence
in a high electronic temperature [33,24].
Let us consider an orthogonal chain model with the nearest neighbor interaction and the on-site energy
as defined by
![]() |
(27) |
![]() |
(28) |
![]() |
We now relate the asymptotic behaviors of Green functions
to the calculation of density matrix which is defined by Eq. (24). Introducing the Matsubara expansion
of the Fermi-Dirac function, and changing the integration path with the Cauchy theorem, one has [88]
Although the matrix elements of the Green function associated with only the central atom
in the calculation of each truncated cluster
are calculated in our implementation, it is
possible to generalize the single central atom to a cluster consisting of several atoms.
An optimum size of the central cluster might be adjusted in order to reduce the computational cost
as discussed in the Appendix of Ref. [34]. However, symmetries in bulks,
as can be confirmed in charge density and Mulliken populations, tend to be violated by introducing
the central cluster instead of the single central atom.
Therefore, we choose the single central atom rather than the central cluster
in order to preserve the symmetrical properties.