Taisuke Ozaki (JAIST)
During the last three decades, continuous efforts [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] have been devoted to extending the applicability of density functional theory (DFT)[1,2] to large-scale systems, leading to more realistic simulations that are closer to experimental conditions. In fact, many large-scale DFT calculations have already contributed to the comprehensive understanding of a vast range of materials,[32,33,34,35,36,37] although widely used functionals such as the local density approximation (LDA)[38] and the generalized gradient approximation (GGA)[39] have limitations in describing the strong correlation in transition metal oxides and the van der Waals interaction in biological systems.
The efficient methods developed so far within conventional DFT can be classified into two categories in terms of computational complexity,[3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28] while other methods that deviate from this classification have also been proposed.[29,30,31] The first category consists of O(N^3) methods,[3,4,5,6,7,8,9,10,11,12] where N is the number of basis functions, as typified by the Householder-QR method,[11,12] the conjugate gradient method,[4,8,9] and the Pulay method,[6,7] which have become standard methods. These methods can be regarded as numerically exact, and their computational cost scales as O(N^3) even if only the valence states are calculated, because of the orthonormalization process. On the other hand, the second category involves approximate O(N) methods such as the density matrix methods,[21,22,23] the orbital minimization methods,[20,25] and the Krylov subspace methods,[16,17,18,19,27] whose computational cost is proportional to the number of basis functions N. The linear scaling of the computational effort in the O(N) methods is achieved by introducing approximations such as the real-space truncation of the density matrix[21] or of the Wannier functions.[20,25] Although the O(N) methods have proven to be very efficient, applications must be performed with careful consideration of the introduced approximations, which may be one of the reasons that the O(N) methods have not been as widely used as the O(N^3) methods. This raises the question of whether a numerically exact but low-order scaling method can be developed by utilizing the sparse structure of the Hamiltonian and overlap matrices expressed in terms of localized basis functions.
Recently, a route towards the development of low-order scaling methods has been suggested by Lin et al., in which the diagonal elements of the density matrix are computed by a contour integration of the Green function, calculated by making full use of the sparse structure of the matrix.[40] Also, efficient schemes have been developed to calculate selected elements of the Green function for electronic transport calculations,[41,42] based on the algorithms by Takahashi et al.[43] and by Erisman and Tinney.[44] However, apart from the methods mentioned above, the development of numerically exact low-order scaling methods, positioned between the O(N^3) and O(N) methods, has rarely been explored for large-scale DFT calculations.[45]
In this paper we present a numerically exact but low-order scaling method for large-scale DFT calculations of insulators and metals using localized basis functions such as pseudo-atomic orbital (PAO),[46] finite element (FE),[47] and wavelet basis functions.[48] The computational effort of the method scales as O(N(log2 N)^2), O(N^2), and O(N^(7/3)) for one-, two-, and three-dimensional (1D, 2D, and 3D) systems, respectively. In spite of the low-order scaling, the method is a numerically exact alternative to the conventional O(N^3) methods. The key idea of the method is to directly compute selected elements of the density matrix by a contour integration of the Green function, evaluated with a set of recurrence formulas. It is shown that a contour integration method based on a continued fraction representation of the Fermi-Dirac function[49] can be successfully employed for this purpose, and that the number of poles used in the contour integration does not depend on the size of the system. We also derive a set of recurrence formulas, based on the nested dissection[51] of the sparse matrix and a block factorization using the Schur complement,[12] to calculate selected elements of the Green function. The computational complexity is governed by the calculation of the Green function. In addition to the low-order scaling, the method is particularly well suited to massively parallel computation because of its well separated data structure.
This paper is organized as follows: In Sec. II the theory of the proposed method is presented together with detailed analysis of the computational complexity. In Sec. III several numerical calculations are shown to illustrate practical aspects of the method within a model Hamiltonian and DFT calculations using the PAO basis functions. In Sec. IV we summarize the theory and applicability of the numerically exact but low-order scaling method.
Let us assume that the Kohn-Sham (KS) orbitals are expressed by a linear
combination of localized basis functions such as PAO,[46] FE,[47]
and wavelet basis functions[48] as:
ψ_ν(r) = Σ_i c_{ν,i} φ_i(r)    (4)

ρ_{ij} = -(1/π) Im ∫ dE f((E - μ)/(k_B T)) G_{ij}(E + i0^+)    (5)

G(Z) = (ZS - H)^-1    (6)
We perform the integration of the Green function in Eq. (5) by a contour integration method using a continued fraction representation of the Fermi-Dirac function.[49] In this representation the Fermi-Dirac function is expressed as
f(x) = 1/2 - Σ_{p=1}^{∞} [ R_p/(x - i z_p) + R_p/(x + i z_p) ]    (7)

where x = (E - μ)/(k_B T), and i z_p and R_p denote the poles of the continued fraction representation on the imaginary axis and the corresponding residues, respectively.
Substituting Eq. (7) into Eq. (5) and performing the contour integration, the density matrix is obtained as a finite sum over the poles:

(8)
Moreover, it should be noted that the number of poles required to achieve
convergence is independent of the size of the system.
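The structure of such a pole expansion can be sketched in a few lines. As a stand-in for the continued fraction representation of Ref. [49], whose poles and residues are not reproduced here, the following sketch uses the classical Matsubara partial-fraction expansion, which has the same pole structure but converges much more slowly:

```python
import numpy as np

def fermi_matsubara(x, n_poles):
    """Truncated partial-fraction expansion of the Fermi-Dirac
    function f(x) = 1/(exp(x) + 1), x = (E - mu)/(k_B T), over the
    Matsubara poles z_n = (2n - 1)*pi on the imaginary axis.  The
    continued fraction representation of Ref. [49] replaces these
    poles and residues by modified values and converges with far
    fewer terms."""
    n = np.arange(1, n_poles + 1)
    z = (2 * n - 1) * np.pi
    return 0.5 - np.sum(2.0 * x / (x**2 + z**2))

x = -5.0                                # a few k_B T below mu
exact = 1.0 / (np.exp(x) + 1.0)
approx = fermi_matsubara(x, 2000)
print(abs(exact - approx))              # small truncation error
```

The slow 1/n^2 decay of the truncation error seen here is precisely what the continued fraction representation is designed to overcome.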
Given the Green function in the Lehmann representation, Eq. (8) can be rewritten as

(9)
The energy density matrix, which is needed to calculate the forces on atoms
when non-orthogonal localized basis functions are used, can also be calculated
by the contour integration method[49] as follows:
(10)-(15)
It is found from the above discussion that the computational effort to compute the density matrix is governed by that of the calculation of the Green function, which consists of the inversion of a sparse matrix; the computational effort of conventional schemes for this, such as Gaussian elimination or LU-factorization based methods, scales as O(N^3). Thus, the development of an efficient method for inverting a sparse matrix is crucial for the efficiency of the proposed method.
Here we present an efficient low-order scaling method, based on a nested dissection approach,[51] for computing only selected elements of the inverse of a sparse matrix. The method consists of two steps. (i) Nested dissection: noting that the matrix is sparse, a structured matrix is constructed by a nested dissection approach; in practice, simply reordering the column and row indices of the matrix yields the structured matrix. (ii) Inversion by recurrence formulas: by recursively applying a block factorization[12] to the structured matrix, a set of recurrence formulas is derived, and only the selected elements of the inverse matrix are directly computed from them. The computational effort to calculate the selected elements of the inverse matrix by steps (i) and (ii) scales as O(N(log2 N)^2), O(N^2), and O(N^(7/3)) for 1D, 2D, and 3D systems, respectively, as shown later. First we discuss the nested dissection of a sparse matrix, and then derive a set of recurrence formulas for calculating the selected elements of the inverse matrix.
As an example, the right panel of Fig. 1(c) shows a structured matrix obtained by the nested dissection approach for a finite chain model consisting of ten atoms, described by an s-valent nearest neighbor tight binding (NNTB) model. When the ten atoms are numbered as shown in the left panel of Fig. 1(a), the matrix is tridiagonal, with constant diagonal and off-diagonal elements, as shown in the right panel of Fig. 1(a). As the first step in generating the structured matrix shown in the right panel of Fig. 1(c), we dissect the system into left and right domains[52] by renumbering the ten atoms, and obtain the dissected matrix shown in the right panel of Fig. 1(b). The left and right domains interact with each other only through a separator consisting of atom 10. As the second step we apply a similar dissection to each domain generated by the first step, and arrive at the nested-dissected matrix given by the right panel of Fig. 1(c). The subdomains in the left domain, which consist of atoms 1 and 2 and of atoms 3 and 4, respectively, interact with each other only through a separator consisting of atom 5. A similar structure is found in the right domain consisting of atoms 6, 7, 9, and 8. It is worth mentioning that the resultant nested structure of the sparse matrix can be mapped onto a binary tree, which represents the hierarchical interactions between (sub)domains, as shown in Fig. 1(d). In general, by recursively applying the above procedure, any sparse matrix can be converted into a nested-dissected matrix.
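The reordering can be sketched directly. The matrix elements (on-site 0, hopping 1) and the explicit permutation below are assumptions chosen to reproduce the structure described in the text:

```python
import numpy as np

# The ten-atom chain of Fig. 1 in the s-valent NNTB model: a
# tridiagonal matrix reordered by nested dissection.
n = 10
H = np.eye(n, k=1) + np.eye(n, k=-1)    # nearest-neighbor hoppings

# New ordering of the original chain atoms (1-based): subdomains
# {1,2} and {4,5} with separator 3 in the left domain, subdomains
# {7,8} and {10} with separator 9 in the right domain, and the
# global separator 6 placed last.
perm = [0, 1, 3, 4, 2, 6, 7, 9, 8, 5]   # 0-based
Hp = H[np.ix_(perm, perm)]

# The two domains now couple only through the global separator
# (row/column 9), and the subdomains only through their separators.
print(np.count_nonzero(Hp[:5, 5:9]),    # 0: left-right decoupled
      np.count_nonzero(Hp[:2, 2:4]))    # 0: subdomains decoupled
```

The zero off-diagonal blocks are exactly what the recurrence formulas exploit later.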
In practice, however, there is no obvious way to perform the nested dissection for general sparse matrices, although many efficient and effective methods have already been developed for this purpose.[53,54] Here we propose a rather simple but effective way of performing the nested dissection, taking into account the fact that the basis functions we consider are localized in real space, and that the sparse structure of the resultant matrix closely reflects the positions of the basis functions in real space. The method bisects a system into two domains interacting only through a separator, and recursively applies the bisection to the resultant subdomains, leading to a binary tree structure for the interaction. Our algorithm for the nested dissection of a general sparse matrix is summarized as follows:
(i) Ordering. We order the basis functions in the domain of interest by the fractional coordinate of the central position of each localized basis function along the q-axis, where q = 1, 2, and 3. As a result of the ordering, each basis function can be specified by an ordering number, which runs from 1 to the number of basis functions in the domain of the central unit cell. The ordering number of a basis function in a periodic image cell is obtained by shifting the corresponding ordering number in the central cell by a multiple of the number of basis functions per cell. For isolated systems, one can use the Cartesian coordinate instead of the fractional coordinate without loss of generality.
(ii) Screening of basis functions with a long tail. Basis functions with a long tail tend to make an efficient dissection difficult: the sparse structure formed by the basis functions with short tails is hidden behind the dense structure caused by those with long tails. Thus, we classify the basis functions with a long tail in the domain as members of the separator before performing the dissection process. This screening exposes the concealed sparse structure when atomic basis functions with a variety of tails are used, while a systematic basis set such as the FE basis functions may not require the screening.
(iii) Finding a starting nucleus. Among the localized basis functions in the domain, we search for the basis function that has the smallest number of non-zero overlaps with the other basis functions. Once found, it is set as the starting nucleus of a subdomain.
(iv) Growth of the nucleus. Starting from the subdomain (nucleus) given by procedure (iii), we grow the subdomain step by step. The growth can easily be managed through the minimum and maximum ordering numbers of the basis functions in the subdomain: the grown subdomain is defined by the basis functions with successive ordering numbers between the minimum and the maximum. At each growth step, we search for the two basis functions that have the minimum and maximum ordering numbers among the basis functions overlapping with the current subdomain. Under periodic boundary conditions the minimum ordering number can be smaller than one, and the maximum can be larger than the number of basis functions in the domain. The numbers of basis functions in the subdomain, in the separator, and in the other subdomain can then be obtained from the minimum and maximum ordering numbers at each growth step. Through the growth process one minimizes a measure of the quality of the dissection, which takes into account both the equality of the subdomain sizes and the minimization of the separator size. If the separator becomes comparable in size to the subdomains, a proper dissection along the axis can be difficult.
(v) Dissection. By applying the above procedures (i)-(iv) to each q-axis, where q = 1, 2, and 3, we find the axis that gives the minimum value of the measure. The dissection along that axis is then performed by renumbering the basis functions in the domain, yielding two subdomains and one separator. Evidently, the same procedures can be applied to each subdomain, and are recursively continued until the size of the domain reaches a threshold. As a result of the recursive dissection, a structured matrix specified by a binary tree is obtained.
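The steps above can be sketched for a 1D chain with nearest-neighbor overlap, where the separator is always a single basis function. The concrete quality measure |n_left - n_right| + n_sep used below is an assumed form standing in for the measure described in step (iv):

```python
def dissect_1d(indices, max_leaf=2):
    """Recursively bisect a 1D chain of basis functions with
    nearest-neighbor overlap into [left, right, separator] order,
    mimicking steps (i)-(v) of the text for the simplest case."""
    n = len(indices)
    if n <= max_leaf:
        return list(indices)
    best = None
    for s in range(1, n - 1):          # candidate separator position
        n_left, n_sep, n_right = s, 1, n - s - 1
        measure = abs(n_left - n_right) + n_sep
        if best is None or measure < best[0]:
            best = (measure, s)
    s = best[1]
    # Subdomains first, separator last, applied recursively.
    return (dissect_1d(indices[:s])
            + dissect_1d(indices[s + 1:])
            + [indices[s]])

order = dissect_1d(list(range(10)))
print(order)   # global separator comes last
```

For general sparse matrices the candidate separators are found by the growth process of step (iv) along each axis rather than by exhaustive search.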
As an illustration we apply the nested dissection method to the finite chain molecule shown in Fig. 1. We first set the whole system as the domain, and apply the series of procedures to it. Procedure (i) is trivial in this case, and we obtain the numbering of atoms and the corresponding matrix shown in Fig. 1(a). Note that the screening of basis functions with a long tail is unnecessary here, and that only the chain direction has to be searched. In procedure (iii), atoms 1 and 10 in Fig. 1(a) satisfy the condition. Choosing atom 1 as the starting nucleus, we gradually increase the size of the subdomain according to procedure (iv). It is then found that the division shown in Fig. 1(b) gives the minimum value of the measure. Renumbering the basis functions based on this analysis yields the dissected matrix shown in the right panel of Fig. 1(b). By applying similar procedures to the left and right subdomains, one immediately finds the result of Fig. 1(c).
We directly compute the selected elements of the inverse matrix using
a set of recurrence formulas which can be derived by recursively applying
a block factorization to the structured matrix obtained by the nested
dissection method as shown below.
To derive the recurrence formulas, we first introduce the block
factorization[12] of a symmetric square matrix:
A = [ A0   B ]  =  [ I           0 ] [ A0  0 ] [ I   A0^-1 B ]    (16)
    [ B^T  C ]     [ B^T A0^-1   I ] [ 0   S ] [ 0   I       ]

S = C - B^T A0^-1 B    (17)

where A0, B, and C denote the blocks of the partitioned matrix, and S is the Schur complement of A0.

(18)-(19)

The inverse of A follows from the factorization as

A^-1 = [ A0^-1 + A0^-1 B S^-1 B^T A0^-1    -A0^-1 B S^-1 ]    (21)
       [ -S^-1 B^T A0^-1                    S^-1         ]

(22)-(24)
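The block factorization introduced above can be checked numerically in a few lines. This is a minimal sketch: a symmetric matrix is partitioned into blocks, the Schur complement is formed, and selected blocks of the inverse are assembled and compared against a direct inversion; the names A0, B, C, and S are local to this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

n0, n1 = 4, 2
M = rng.standard_normal((n0 + n1, n0 + n1))
M = M @ M.T + (n0 + n1) * np.eye(n0 + n1)   # symmetric positive definite
A0, B, C = M[:n0, :n0], M[:n0, n0:], M[n0:, n0:]

A0i = np.linalg.inv(A0)
S = C - B.T @ A0i @ B                        # Schur complement of A0
Si = np.linalg.inv(S)

V = A0i @ B
X00 = A0i + V @ Si @ V.T                     # upper-left block of M^{-1}
X01 = -V @ Si                                # upper-right block
X11 = Si                                     # lower-right block

ref = np.linalg.inv(M)
print(np.allclose(X00, ref[:n0, :n0]),
      np.allclose(X01, ref[:n0, n0:]),
      np.allclose(X11, ref[n0:, n0:]))       # True True True
```

The point exploited later is that only the blocks of the inverse that are actually needed have to be assembled from A0^-1 and S^-1.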
To address the more general case in which the dissection of the sparse matrix
is further nested, we suppose that the diagonal blocks have the same inner
structure as the whole matrix:
(29)-(33)
We are now ready to calculate the selected elements of the Green function using
the inverses of the Schur complements calculated by the recurrence formulas of
Eqs. (28)-(32). By noting that Eq. (21) has a recursive structure and that the
matrix is structured by the nested dissection, one can derive a recurrence
formula for the selected elements of the inverse.
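One level of this recursion can be sketched numerically: two subdomain blocks couple only to a separator block, and only the diagonal blocks of the inverse (the selected elements) are assembled. The block names and sizes are local to this hedged sketch, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nested-dissection structure: subdomains A and B couple only to
# the separator block C.
na, nb, nc = 3, 3, 2
n = na + nb + nc
M = rng.standard_normal((n, n))
M = M @ M.T + 2 * n * np.eye(n)          # symmetric, well conditioned
M[:na, na:na + nb] = 0.0                 # no direct A-B coupling
M[na:na + nb, :na] = 0.0

A  = M[:na, :na]
B  = M[na:na + nb, na:na + nb]
Ua = M[:na, na + nb:]                    # A-separator coupling
Ub = M[na:na + nb, na + nb:]             # B-separator coupling
C  = M[na + nb:, na + nb:]

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
S = C - Ua.T @ Ai @ Ua - Ub.T @ Bi @ Ub  # Schur complement
Si = np.linalg.inv(S)

Xaa = Ai + (Ai @ Ua) @ Si @ (Ua.T @ Ai)  # diagonal blocks of M^{-1}
Xbb = Bi + (Bi @ Ub) @ Si @ (Ub.T @ Bi)
Xcc = Si

ref = np.linalg.inv(M)
print(np.allclose(Xaa, ref[:na, :na]),
      np.allclose(Xbb, ref[na:na + nb, na:na + nb]),
      np.allclose(Xcc, ref[na + nb:, na + nb:]))  # True True True
```

In the full method the subdomain blocks A and B are themselves nested-dissected, so Ai and Bi are obtained by applying the same step recursively down the binary tree rather than by dense inversion.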
A simple but nontrivial example illustrating how the inverse of a matrix is computed by the recurrence formulas is given in Appendix A, and a similar approach for calculating a few eigenstates around a selected energy is presented in Appendix B, although the proposed method can calculate the total energy of a system without calculating the eigenstates.
As in conventional DFT calculations, the chemical potential has to be adjusted in the proposed method so that the number of electrons is conserved. However, there is no simple way of knowing the number of electrons corresponding to a given chemical potential before performing the contour integration of Eq. (8) with that chemical potential. Thus, we search for the chemical potential iteratively, imposing charge conservation. Since the contour integration is the time-consuming step of the method, a smaller number of iterative steps directly leads to a faster calculation. Therefore, we developed a careful combination of several iterative methods to minimize the number of iterations needed for sufficient convergence. In general, the search for the chemical potential proceeds by a sequence (1)-(2) or (5)-(1)-(3)-(1)-(4)-(1)-(4)-(1) of the following procedures. As shown later, this procedure enables us to obtain a chemical potential conserving the number of electrons within the chosen criterion (in electrons/system) in fewer than five iterations on average.
(1) Calculation of the difference in the total number of electrons.
The difference between the number of electrons calculated using Eq. (8)
at a trial chemical potential and the required total number of electrons
is defined by

(36)
(2) Using the retarded Green function.
If the difference is so large that the interpolation schemes (3) and (4)
may fail to provide a good estimate, the next trial chemical potential is
obtained by using the retarded Green function. For a given trial chemical
potential, the correction estimated from the retarded Green function is
given by

(37)-(39)
(3) Linear interpolation/extrapolation.
In searching for the chemical potential, if two previous results are
available, a trial chemical potential is estimated by a linear
interpolation/extrapolation method as

(40)
(4) Muller method.[55,56]
In searching for the chemical potential, if three previous results are
available, they can be fitted to a quadratic equation:

(41)-(42)
(5) Extrapolation of the chemical potential for the second step.
During the self-consistent field (SCF) iteration, the chemical potential
obtained at the last SCF step is used as the initial guess at the current
SCF step. In addition, we estimate the second trial chemical potential by
fitting a set of results from the three previous SCF steps, where the
subscript and superscript denote the iteration step in the search for the
chemical potential and the SCF step, respectively, to the following equation:

(43)-(44)
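The essence of the iterative search can be sketched compactly. This is a simplified stand-in for the combined sequence (1)-(5): a model spectrum, the electron count at a trial chemical potential, and a secant step (linear interpolation/extrapolation, procedure (3)) with a bisection fallback; the retarded-Green-function and Muller steps are omitted.

```python
import numpy as np

def fermi(x):
    """Fermi-Dirac function with a clipped argument for stability."""
    return 1.0 / (1.0 + np.exp(np.clip(x, -40.0, 40.0)))

def electron_count(mu, eps, kT):
    """Total number of electrons for eigenvalues eps at potential mu."""
    return fermi((eps - mu) / kT).sum()

def find_mu(eps, n_target, kT=0.02, tol=1e-10, max_iter=100):
    # Bracket the chemical potential by the spectrum edges.
    lo, hi = eps.min() - 1.0, eps.max() + 1.0
    mu_prev, mu = lo, hi
    d_prev = electron_count(mu_prev, eps, kT) - n_target
    for it in range(max_iter):
        d = electron_count(mu, eps, kT) - n_target
        if abs(d) < tol:
            return mu, it
        if d > 0.0:
            hi = mu            # too many electrons: mu is too high
        else:
            lo = mu            # too few electrons: mu is too low
        if d != d_prev:
            mu_next = mu - d * (mu - mu_prev) / (d - d_prev)  # secant step
        else:
            mu_next = 0.5 * (lo + hi)
        if not (lo < mu_next < hi):
            mu_next = 0.5 * (lo + hi)     # bisection fallback
        mu_prev, d_prev, mu = mu, d, mu_next
    return mu, max_iter

eps = np.linspace(-1.0, 1.0, 20)          # model spectrum
mu, iters = find_mu(eps, n_target=10.0)
print(mu, iters)
```

In the actual method each evaluation of the electron count requires a full contour integration, which is why minimizing the number of such iterations matters.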
Table I. The sizes of the matrix blocks appearing in the recurrence formulas at each level p of the binary tree (columns P down to P-10) for the 1D, 2D, and 3D model systems; for the 1D system all entries are equal to 1.
We now analyze the computational complexity of the proposed method.
As discussed in the subsection on the contour integration of the Green function,
the number of poles for the contour integration is independent of the size of the system.
Thus, we focus on the computational complexity of the calculation of the Green function.
For simplicity of the analysis we consider a finite chain, a finite square lattice,
and a finite cubic lattice as representatives of 1D, 2D, and 3D systems, respectively,
described by the s-valent NNTB models used in the explanation of the nested dissection.
Note that the results of the analysis should remain valid for more general cases with
periodic boundary conditions. Since the computational cost is governed by Eq. (29),
let us first analyze its cost; those of the other equations will be discussed later.
Considering that the recurrence formula of Eq. (29) develops as shown in Fig. 3,
and that the calculation of Eq. (29) corresponds to the open circles in the figure,
the computational cost can be estimated by

(45)
For the finite 1D chain system, the relevant block sizes are constant, as
listed in Table I. Thus, the computational cost for the 1D system is estimated as

(46)
For the finite 2D square lattice system, the block sizes depend on the level of
the hierarchy, as shown in Table I. To estimate the order of the computational
cost, we replace the block sizes by upper bounds that are equal to or larger
than the corresponding exact numbers. The computational cost for the 2D system
can then be estimated analogously to Eq. (46).
Table II. The order of the computational cost of Eqs. (28), (29), (30), (32), (38), and (39) for the 1D, 2D, and 3D systems.
For the finite 3D cubic lattice system, the block sizes again grow with the
level of the hierarchy, as in the 1D and 2D systems. As in the analysis of the
2D system, by replacing the block sizes by upper bounds that are equal to or
larger than the corresponding exact numbers, we can estimate the computational
cost for the 3D system as follows:

(47)
We further analyze the computational cost of the other primary equations for the calculation of the Green function, Eqs. (28), (30), (32), (38), and (39). Although the detailed derivations are not shown here, they proceed in the same way as for Eq. (29). Table II shows the order of the computational cost of each equation. It is found that the total computational cost is governed by Eq. (29), while the computational cost of Eq. (39) is comparable to that of Eq. (29).
In addition to the analysis of the computational cost of the inverse
calculation by Eqs. (28)-(32), (38), and (39), we examine that of the
nested dissection. Step (i) of the nested dissection involves sorting by
the quicksort algorithm, whose computational cost scales as O(n log n)
on average for a sequence of length n, and thereby the computational cost
for the d-dimensional system can be estimated by

(48)-(49)
In this section several numerical calculations for the s-valent NNTB model and DFT are presented to illustrate practical aspects of the low-order scaling method. All the DFT calculations in this study were performed with the DFT code OpenMX.[59] The PAO basis functions[46] used in the DFT calculations are specified by H4.5-, C5.0-, N4.5-, O4.5-, and P6.0- for deoxyribonucleic acid (DNA), C4.0- for a single C60 molecule, and Pt7.0- for a single Pt cluster, where, in an abbreviation such as C5.0-, C stands for the atomic symbol, 5.0 for the cutoff radius (bohr) used in the generation by the confinement scheme, and the trailing part for the number of primitive orbitals employed for each angular momentum channel.[46] Since the PAO basis functions are pseudo-atomic orbitals with different cutoff radii depending on the atomic species, the resultant Hamiltonian and overlap matrices have a disordered sparse structure reflecting the geometrical structure of the system. Norm-conserving pseudopotentials in a separable form with multiple projectors are used to replace the deep core potential by a shallow potential.[60] The local density approximation (LDA) to the exchange-correlation potential is employed.[38]
As shown in the previous section, it is possible to reduce the computational cost from O(N^3) to the low-order scaling without losing numerical accuracy. Here we validate the theoretical scaling of the computational effort by numerical calculations. Figure 4 shows the elapsed time required for the calculation of the inverse for a 1D linear chain, a 2D square lattice, and a 3D cubic lattice, described by the s-valent NNTB models, as a function of the number of atoms in the unit cell under periodic boundary conditions. The last three points for each system are fitted to a power-law function by a least-squares method, and the obtained scalings of the elapsed time are close to the theoretical estimates of O(N(log2 N)^2), O(N^2), and O(N^(7/3)) for the 1D, 2D, and 3D systems, respectively. Thus, it is confirmed that the scaling of the computational cost is nearly the same as the theoretical estimation.
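The fitting procedure itself is a one-liner on a log-log scale. The timings below are synthetic, generated purely to illustrate the least-squares extraction of a scaling exponent; they are not the measured data of Fig. 4:

```python
import numpy as np

# Least-squares fit of a scaling exponent:
#   t = c * N**p  =>  log t = log c + p * log N
N = np.array([1000.0, 2000.0, 4000.0, 8000.0])
t = 1.0e-9 * N**2.0                 # synthetic 2D-like timings
p, log_c = np.polyfit(np.log(N), np.log(t), 1)
print(round(p, 3))                  # 2.0
```

Fitting only the largest systems, as done in the text, reduces the influence of low-order terms on the extracted exponent.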
To demonstrate that the proposed method is numerically exact even if the summation in Eq. (8) is terminated at a modest number of poles, we show in Fig. 5 the convergence of the SCF calculations obtained by the conventional diagonalization method and by the proposed method for deoxyribonucleic acid (DNA), where 80 poles are used in the summation and the electronic temperature is 700 K. It is clearly seen that with only 80 poles the convergence behavior and the total energy are almost equivalent to those of the conventional method. Table III also shows the rapid convergence of the total energy and forces in the SCF calculation as a function of the number of poles. In this case, the use of only 40 poles is enough to achieve numerically exact results, within numerical precision, not only for the total energy but also for the forces on atoms. Since the density matrix, the total energy, and the forces on atoms can be evaluated very accurately within numerical precision as shown above, the low-order scaling method can be regarded as a variational method in practice.
Table III. Convergence of the total energy and forces with respect to the number of poles.

Poles | Error in the total energy | MAE of forces
10 | 80.698617103348 | 0.040703500227
20 | 0.122135859603 | 0.000111580338
40 | 0.000000000162 | 0.000000000148
60 | 0.000000000264 | 0.000000000148
80 | 0.000000000255 | 0.000000000148
100 | 0.000000000251 | 0.000000000148
Although the computational cost of the proposed method is reduced from cubic to low-order scaling, the prefactor depends directly on the number of iterations in the iterative search for the chemical potential. To examine how the combination of interpolation and extrapolation methods discussed above performs, we show in Fig. 6 the number of iterations required to find a chemical potential conserving the total number of electrons within the chosen criterion (in electrons/system), as a function of the SCF step, for a C60 molecule, DNA, and a Pt cluster. Only a few iterations are needed to achieve sufficient convergence of the chemical potential as the SCF calculation converges, while a larger number of iterations is required in the initial stage of the SCF procedure. It turns out that the proper chemical potential is found with mean numbers of iterations of 2.1, 2.4, and 4.0 for the C60 molecule, DNA, and the Pt cluster, respectively. The behavior of the iterative search is closely related to the energy gap of the system. The energy gaps between the highest occupied and lowest unoccupied states of the C60 molecule, DNA, and the Pt cluster are 1.95, 0.67, and 0.02 eV, respectively. For the C60 molecule and DNA, which have wide gaps, the number of iterations for finding the chemical potential tends to be large during the first 10 SCF iterations, since the interpolation or extrapolation schemes may not work well in the presence of a wide gap.
We now demonstrate that the proposed method is suitable for parallel computation because of its well separated data structure. It is apparent that the calculation of the Green function at each pole in Eq. (8) can be performed independently, without data communication among processors. Thus, we parallelize the summation in Eq. (8) using the message passing interface (MPI), distributing a nearly equal number of poles to each process. The summation in Eq. (8) is performed partly in each process, and the global summation is completed after all the calculations allocated to each process have finished. In most cases the global summation accounts for a very small fraction of the computational time, even including the MPI communication, since the amount of data to be communicated is O(N) owing to the use of localized basis functions. In addition to the parallelization of the summation in Eq. (8), the calculation of the Green function can be parallelized in two respects. In the recursive calculations of Eqs. (28)-(32), the calculations for different subdomains at the same level of the hierarchy are independent of each other, and the matrix operations in Eqs. (28)-(32) can be parallelized with respect to columns without communication until the recurrence calculations reach the top level of the binary tree, where, for each pole, the MPI communication has to be performed. In our implementation, the latter part of the calculation of the Green function is parallelized by a hybrid scheme using MPI and OpenMP for internode and intranode parallelization, respectively. As a whole, we parallelize the summation in Eq. (8) using MPI and the matrix operations in Eqs. (28)-(32) using the hybrid scheme.
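The pole distribution can be sketched without any MPI machinery: each process is assigned a contiguous range of nearly equal size, evaluates its partial sum of Eq. (8), and a single global reduction completes the sum. Plain Python stands in for MPI here:

```python
def partition(n_poles, n_procs):
    """Contiguous, nearly equal ranges of pole indices, one per
    process; the first (n_poles % n_procs) processes get one extra."""
    base, extra = divmod(n_poles, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

ranges = partition(80, 6)
print([len(r) for r in ranges])   # [14, 14, 13, 13, 13, 13]
```

Because each pole's Green function is computed independently, this outer level of parallelism requires no communication until the final reduction.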
Figure 7 shows the speed-up ratio of the parallel calculation in terms of the elapsed time of one SCF iteration. The speed-up ratio reaches about 350, and the elapsed time is 3.76 s using 162 processes and 4 threads, demonstrating the good scalability of the proposed method. On the other hand, the conventional diagonalization using the Householder and QR methods scales up to only 21 processes, leading to a speed-up ratio of about 10 and an elapsed time of 7.09 s. Thus, the proposed method has a great advantage for parallel computation, unlike the conventional method, while the comparison of elapsed times suggests that the prefactor of the computational effort of the proposed method is larger than that of the conventional method.
An efficient low-order scaling method has been developed for large-scale DFT calculations using localized basis functions such as the PAO, FE, and wavelet basis functions, which can be applied not only to insulating but also to metallic systems. The computational effort of the method scales as O(N(log2 N)^2), O(N^2), and O(N^(7/3)) for 1D, 2D, and 3D systems, respectively. Based on two ideas, the method directly evaluates only the selected elements of the density matrix that are required for the total energy calculation. The first idea is to introduce a contour integration method for the integration of the Green function, in which the Fermi-Dirac function is expressed by a continued fraction. The contour integration yields a numerically exact result for the integration, within double precision, at a modest number of poles, which allows us to regard the method as a numerically exact alternative to the conventional O(N^3) diagonalization methods. It is also shown that the number of poles needed for convergence depends not on the size of the system but on its spectral radius, which implies that the number of poles in the contour integration does not affect the scaling of the computational cost. The second idea is to employ a set of recurrence formulas for the calculation of the Green function, derived from the recursive application of a block factorization using the Schur complement to the structured matrix obtained by nested dissection of the sparse matrix. The primary calculation in the recurrence formulas consists of matrix multiplications, and the scaling of the computational cost is derived by a detailed analysis of these calculations. The chemical potential, conserving the total number of electrons, is determined by an iterative search which combines several interpolation and extrapolation methods.
The iterative search makes it possible to find the chemical potential in fewer than five iterations on average for systems with a wide variety of energy gaps. The good scalability in parallel computation implies that the method is suitable for massively parallel computation and, together with the low-order scaling, could extend the applicability of DFT calculations to large-scale systems. It is also anticipated that the numerically exact method will provide a platform for developing novel, efficient approximate methods.
The author was partly supported by the Fujitsu lab., the Nissan Motor Co., Ltd., Nippon Sheet Glass Co., Ltd., and the Next Generation Super Computing Project, Nanoscience Program, MEXT, Japan. The author would like to thank Dr. M. Toyoda for critical reading of the manuscript.
Since the proposed method for calculating the inverse of a matrix differs
considerably from conventional methods, we show a simple but nontrivial
example illustrating the calculation of the inverse by the set of
recurrence formulas, Eqs. (28)-(32), (38), and (39), which may be useful
for understanding how the calculation proceeds. We consider a finite chain
molecule consisting of seven atoms, described by the same s-valent NNTB
model as in the subsection on the nested dissection, where all the on-site
energies and hopping integrals are assumed to be 1. After performing the
nested dissection, we obtain the following structured matrix:
(50)-(66)
In this appendix it is shown that a few eigenstates around a selected energy can be obtained in a similar way, with the same computational complexity as the calculation of the density matrix, although the proposed method directly computes the density matrix without explicitly calculating the eigenvectors.
We compute a few eigenstates around a selected energy σ using a block
shift-invert iterative method, in which the generalized eigenvalue problem
of Eq. (2) is transformed as

(H - σS)^-1 S c_ν = (ε_ν - σ)^-1 c_ν    (67)

(68)-(69)
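The shift-invert idea can be sketched with a simple power iteration. Here a dense solve stands in for the recurrence-formula application of (H - σS)^-1, and H and S model an s-valent chain with nearest-neighbor overlap; the sizes and the overlap value 0.1 are assumptions local to this sketch:

```python
import numpy as np

# Shift-invert power iteration for the eigenpair of H c = e S c
# closest to a selected energy sigma.
n, sigma = 8, 0.3
off = np.eye(n, k=1) + np.eye(n, k=-1)
H = off.copy()                          # on-site 0, hopping 1
S = np.eye(n) + 0.1 * off               # overlap matrix

A = np.linalg.inv(H - sigma * S) @ S    # eigenvalues 1/(e_k - sigma)
rng = np.random.default_rng(0)
c = rng.standard_normal(n)
for _ in range(100):                    # power iteration on A
    c = A @ c
    c /= np.linalg.norm(c)
e = (c @ H @ c) / (c @ S @ c)           # generalized Rayleigh quotient
print(round(e, 5))                      # the eigenvalue nearest sigma
```

Because the eigenvalue of A nearest to sigma is mapped to the dominant eigenvalue 1/(e - σ), the iteration converges rapidly to the wanted eigenpair; a block version iterates several vectors at once.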
Here we show that the matrix multiplication required in the shift-invert
iteration can be performed in a similar way, with the same computational
complexity, as the calculation of the density matrix. As an example,
let us consider the matrix given by Eq. (22). After the recurrence
calculation of Eqs. (28)-(32), it turns out that the matrix is factorized as
(70)-(73)
By considering Eq. (B7) and the simple forms of the factorized matrices,
the matrix multiplication can be performed in the following three steps:
(i) in the first step the vector is multiplied by the inverse of the lower
triangular factor (a forward substitution); (ii) in the second step it is
multiplied by the inverses of the block-diagonal factors, i.e., the Schur
complements; (iii) in the third step it is multiplied by the inverse of the
upper triangular factor (a backward substitution), which is performed by the
corresponding recurrence formulas.
The computational effort of the three steps can be estimated in the same way as for the calculation of the inverse matrix, and is summarized in Table IV. It is found that the computational complexity of the three steps is lower than that of the calculation of the inverse matrix. Thus, if the number of selected eigenstates and the number of iterations for convergence are independent of the size of the system, the computational effort for the calculation of the selected eigenstates is also governed by the recurrence calculation of Eqs. (28)-(32). The scheme may be useful for calculating eigenstates near the Fermi level.
Table IV. The order of the computational cost of Eqs. (80), (82), (83), and (84)+(85) for the 1D, 2D, and 3D systems.