Taisuke Ozaki, RCIS, JAIST
The development of O() or linear scaling method is a promising direction of extending applicability of ab initio electronic structure calculations based on density functional theory (DFT) to large scale realistic systems.[1,2,3] In fact, over the last decade, considerable efforts have been devoted to establish efficient and robust O() methods [2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,20], and successful applications within non-self consistent field (SCF) tight binding (TB) scheme [24,25,26] and a number of benchmark calculations within the fully SCF DFT have been reported.[9,10,11,27,28,29] Nevertheless, within the fully SCF DFT much improvement is still needed to reduce the error below chemical accuracy (a few milli-Hartree/atom at least), referred to as milli-Hartree accuracy hereafter, for a wide variety of materials with modest computational cost. In addition, the application to ab initio calculations tends to be practically hampered by an intractable feature that an approximate solution of eigenstates by the O() methods often induces instabilities in the SCF calculation. In this paper to overcome these difficulties we present an efficient and robust O() method for a wide variety of materials including metals in which ideas behind two O() methods, divide-conquer (DC)  and recursion methods,[14,15,23] are unified in a single framework.
The principles behind the O() methods which have been proposed so far can be roughly classified into two categories: variational principles [6,7,8,9] and perturbative principles,[4,5,12,13,14,16,17,18,19], while the other O() or O() methods that cannot be simply classified into these categories exist.[20,21,22] Three O() methods we discuss in this paper belong to the latter. In the former the local quantity of wave functions or density matrix in real space is explicitly affected by all the elements of the Hamiltonian for the whole system. On the other hand, in the latter it is determined by a finite range interaction with an implicit coupling with the whole system via the Poisson equation. This feature may be related to numerical robustness of both the classified methods, which has been not well understood yet. Meanwhile, it is known that the DC method provides rapid convergence for covalent systems such as biological molecules with numerical stability during the SCF calculation. However, the application of the DC method to metals is significantly restricted by requirement of the large size of truncated cluster. On the other hand, the recursion method based on Lanczos algorithms and the Green functions is one of suitable methods for metals,[5,14,15,23] although the SCF calculation with the recursion method becomes unstable. The main idea behind the recursion method is to employ a Krylov subspace [16,17,30] generated by the Lanczos algorithm in evaluating the Green functions. The dimension of the Krylov subspace, required for the convergence with respect to the total energy in metals by the recursion method, is generally much smaller than that of the original vector space defined by the truncated cluster, which is the reason why the recursion method can be efficient for metals. Thus, we propose a novel method which possesses the advantages in two methods and overcomes the drawbacks. This paper is organized as follows: In Sec. II the theory of the proposed method is presented with comparison among three O() methods and with technical details for the practical implementation. In Sec. III the convergence properties of the total energy and selected physical quantities are presented for insulators, semiconductors, metals, and molecular systems. In Sec. IV three applications of the method are discussed to illustrate aspects in practical applications. In Sec. V we summarize the theory, applicability, and limitation of the O() method.
Let us assume that the basis set consists of nonorthogonal
localized functions such as pseudo-atomic orbitals (PAO) 
and finite elements basis.
Throughout this paper we use the PAO as basis
function to expand one-particle wave functions.
The charge density
with spin component is evaluated via density matrix
The proposed method can be regarded as a unified scheme which bridges the DC and recursion methods, therefore, before discussing the proposed method, let us summarize two linear scaling methods: the DC and recursion methods. Both the methods provide different ways of evaluating the Green functions from a local Hamiltonian and overlap matrices constructed from the local environment for each site .[4,5,14] For each site a truncated cluster is constructed by taking account of the local arrangement of basis functions as depicted in Fig. 1(a), indicating that the truncated cluster can overlap largely each other and can contain the same sites as the neighboring sites. In this study we construct the truncated clusters by using both the physical and logical truncation schemes  which will be explained later. Basis functions belonging to the truncated cluster span a subspace in the total vector space spanned by a set of basis functions in the whole system, i.e., .
For each truncated cluster , a local Hamiltonian and overlap
matrices are constructed for the DC and recursion methods in
the completely same way. The simplest way of finding an approximate
Green function from the and is to diagonalize
directly the corresponding eigenvalue problem:
On the other hand, the recursion method [5,14,15,23]
adopts a different way of evaluating
the Green function in which the original vector space
by basis functions in the truncated cluster is mapped to a Krylov subspace
with a smaller dimension rather than solving directly the eigenvalue
problem like the DC method. In the recursion method  generalized
to non-orthogonal basis, the Krylov subspace
by a two-sided Lanczos algorithm is given by
As discussed above, although the DC method is a simple and
robust scheme, the direct diagonalization in this scheme is often
impractical for metallic systems, where the size of truncated cluster could
be large to achieve the milli-Hartree accuracy.
In contrast, the computational effort in the recursion method
can be reduced by mapping the original vector space
defined by the truncated cluster to a smaller subspace, while the numerical
instability during the SCF calculation is a serious problem. Thus, to overcome
the drawbacks we propose a unified method which bridges the DC and recursion
methods. It is worth mentioning that recently a similar idea has been independently
proposed by Takayama et al. within a non-SCF orthogonal TB model.[16,17]
To see why the recursion method can provide convergent results
by a smaller Krylov subspace
, taking account of
, the Green function given by Eq. (4) can be
Therefore, by taking account of the rapid convergence in the recursion method
and the robustness of the DC method with respect to numerical stability,
we now develop a DC method defined in a Krylov subspace. Considering that
the Krylov subspace [30,16] can be generally defined
by Eq. (5) for the non-orthogonal basis set, first let us introduce
a Krylov subspace
for each site given by
In this algorithm Eqs. (10)-(16), might be the inverse of a local overlap
matrix constructed from the same truncated cluster as in the construction
of the local Hamiltonian , where the cluster is constructed by the logical
truncation method as discussed above.
In case the large size of truncated cluster is required to achieve
the milli-Hartree accuracy, it can be a hard task to calculate
the inverse of . Thus, can be substituted by an approximate inverse.
Although a lot of efficient methods of inverting the overlap matrix have
been proposed so far, we now propose a method of calculating an approximate
inverse based on a Krylov subspace of which idea is very similar
to that considered in Ref. .
In the method an approximate inverse can be evaluated by
We have not discussed yet how we should choose the initial sets of states and used in these algorithms for generation of Krylov subspaces. In this study the initial sets of states and consist of block I-orthonormalized vectors and its -orthonormalized vectors. However, it should be noted that the optimum choice of depends on the system under consideration. A possible way of choosing is to use a set of basis functions on the central site , which is similar to that in the block bond order potential method.[14,15] Another choice is to use a set of basis functions on the central site plus the neighboring sites. In most of calculations in this study we use the latter choice for the numerical robustness, since the former tends not to conserve the symmetry of charge distribution to be expected in symmetric systems due to round-off error. The round-off error in the generation of Krylov subspace is accumulated as the iteration step in the procedure by Eqs. (10)-(16) increases, and the accumulation tends not to be negligible by the increase of the number of iteration steps in case of a smaller dimension of . In this study for the latter case the initial state is constructed by basis functions belonging to atoms within a sphere with a radius of , where is the distance between the central atom and the nearest atom.
For numerical stability, it is crucial to construct the Krylov subspace at the first SCF step, and to fix it during subsequent steps. If the Krylov subspace is regenerated at every SCF step, the SCF convergence becomes significantly worse because of fluctuation of the spanned space, which is the reason for the instability inherent in the recursion method coupled with the SCF calculation. If the dimension of is smaller than that of , the generated Krylov subspace depends on the Hamiltonian through its multiplication given by Eq. (11). The fluctuation of the spanned space may couple with the self consistent process of charge density in an indeterminate way as shown in the Sec. III.
We are now ready to make full use of the Krylov subspace generated by
above procedures. By using the Krylov subspace vectors generated by the
algorithm Eqs. (10)-(16), one can transform the representation of Hamiltonian
from the original basis set into the Krylov vectors.
In the transformation of the representation, we furthermore consider
a spatial division of the truncated cluster into a core and the remaining
buffer regions as shown in Fig. 1(b). The core region in this study is defined
by a cluster which is constructed by sites having non zero overlap
to the central site .
Then, the original generalized eigenvalue problem
for the truncated cluster can
be transformed to a standard eigenvalue problem
The Green functions are constructed from the eigenvalue and its corresponding eigenvectors as discussed above, therefore, it is apparent that the method cannot give a finite DOS at the Fermi energy properly for metallic systems, since the resultant density of states (DOS) is discrete. However, in the evaluation of the density matrix, and can be regarded as abscissa and weight in a quadrature, while to my knowledge the statement has not been proven rigorously within the non-orthogonal basis set. Thus, it is expected that the density matrix being an integrated quantity can be accurately evaluated unless we focus attention on details of the density of states. For a proper calculation of the DOS an alternative scheme will be discussed in the Sec. IV.
In this subsection technical details on the implementation of the proposed method are given in an algorithm fashion in order to comprehend the computational flow. The SCF calculation can be performed by a sequence (1)-(2)-(3)-(4)-(5)-(2)-(4)-(5)-(2)-(4)-(5)-... in terms of the following procedures:
(1) Truncation of system:
A large system is divided into small truncated clusters by the physical and logical truncation schemes. After picking neighboring sites up within a sphere with a radius of , which is the physical truncation, we furthermore reconstruct a truncated cluster defined by all neighbors that can be reached by hops in the cluster consisting of the chosen sites, where the hopping is made when the distance between sites is smaller than the sum of cutoff radii of two localized basis functions which are placed in each site, respectively. The second scheme is called the logical truncation, and effective to reduce the energy jump in MD simulations, since isolated sites having no overlap with the other sites can be excluded in the truncated cluster. As a result, the truncated clusters can overlap largely each other as shown in Fig. 1(a).
(2) Construction of cluster Hamiltonian:
For each truncated cluster made by the procedure (1) the Hamiltonian and overlap matrices are constructed by the same way as in the conventional DFT calculation. It is worth mentioning that the Hartree potential is calculated by considering not only the charge density from the truncated cluster, but also all the contributions of truncated clusters in the whole system, which means that each truncated cluster is not isolated in vacuum, but embedded in the system without neglecting the long range Coulomb interaction. This treatment by the embedded cluster would be crucial for computational accuracy. The matrix elements associated with site can be calculated in a parallel computation without any communication, and communications among processors are performed to construct the cluster Hamiltonian and overlap matrices for each truncated cluster.
(3) Generation of Krylov subspace:
At only the first SCF step, the initial state is given by a set of basis functions on the central site of each truncated cluster or a set of basis functions on the central site plus the neighboring sites, the initial state is the -orthogonalized vectors of . After generating the Krylov subspace using Eqs. (18)-(24), is generated using Eqs. (10)-(16), where it is recommended for numerical efficiency to perform the multiplication associated with and in a matrix by vector fashion. Finally, the orthogonal transformation Eq. (9) yields the Krylov subspace with -orthogonal vectors. Then, we only have to store for subsequent steps. It is quite important to keep the subspace generated at the first SCF step for numerical robustness. The Krylov subspace for each truncated cluster is separately calculated on each processor without communication.
(4) Construction of effective Hamiltonian:
The short and long range contributions to the effective Hamiltonian are calculated by Eq. (28) on each processor without communication. When a large size of the buffer region shown in Fig. 1(b) is taken into account, the long range contribution calculated at the first SCF step can be used at subsequent steps for numerical efficiency.
(5) Self consistency
The standard eigenvalue problem with the effective Hamiltonian is diagonalized,
and the back transformation
yields a set of components
of eigenvectors required for the calculation of charge density.
Then, a common chemical potential for all the truncated clusters in the total
system is found by a bisection method so that the total number of electrons
in the total system can be conserved. To find efficiently the common chemical
potential let us define a Green function in terms of Mulliken population by
All the calculations in this study were performed by a DFT code, OpenMX,[33,39] in which one-particle wave functions are expressed by the linear combination of pseudo-atomic basis functions (LCPAO) centered on atomic sites, norm-conserving pseudopotentials are used in a separable form with multiple projectors to replace the deep core potential into a shallow potential,[40,41] and a generalized gradient approximation (GGA)  to the exchange-correlation potential is used. The basis functions used are pseudo-atomic orbitals generated by a confinement scheme, which are specified by H4.5-s2, B4.5-s2p1, C4.5-s2p1, N4.5-s2p1, O4.5-s2p1, Li8.0-s2, Al6.0-s2p2, Si6.0-s2p1d1, P6.0-s2p1d1, Fe4.5-s2p2d1, and Mn5.5-s2p2d1, where the abbreviation of basis functions, such as H5.0-s1p1, represents that H stands for the atomic symbol, 5.0 the cutoff radius (Bohr) in the generation by the confinement scheme, and s1p1 means the employment of one primitive orbital for each of and orbitals. The pseudopotentials used are found in Ref. . The real space grid techniques are used with the energy cutoff of 180 Ry as a required cutoff energy in numerical integrations and the solution of Poisson equation using FFT, while the cutoff energy is adjusted depending on the size of the unit cell so that the difference in the cutoff energies used for the a-, b-, and c-axes can be minimized. In addition, the projector expansion method is employed in the calculation of three-center integrals for the deep neutral atom potentials. Electronic temperature of 300 K is used to count the number of electrons by the Fermi-Dirac distribution function for all the system we considered. In most cases for comparison the k-space conventional diagonalization is also performed with a large number of k-points so that the accuracy of Hartree/atom at least can be assured unless the number of k-points is explicitly specified.
Accuracy and efficiency of the proposed method can be controlled by two parameters: the size of the truncated cluster and the dimension of the Krylov subspace , and an optimum choice of these parameters to achieve the milli-Hartree accuracy depends on systems under consideration. Thus, for a wide variety of materials convergence properties of the total energy are shown as a function of the two parameters in Figs. 2-4 for bulks with a finite gap, metallic systems, and low-dimensional molecular like systems, respectively. For geometrical structures we studied, the experimental coordinates are used for carbon and silicon in the diamond structure, manganese mono-oxide in the rock salt structure, BCC lithium, FCC aluminum, aluminum lithium alloy in the B32 structure, and BCC iron. For hexagonal ice the coordinates were generated by a geometry optimization with a CHARMM force field, for deoxyribonucleic acid (DNA) and dynorphin A  the coordinates were generated using a software TINKER, and for (10,0) carbon nanotube (CNT) the coordinates were generated with a bond length of 1.42 Å. The geometrical structures of the other systems will be given when they are discussed.
For bulks with a finite gap, we see that a truncated cluster consisting of 100-150 atoms reproduces the total energy calculated by the conventional k-space scheme within the milli-Hartree accuracy. For carbon and silicon in the diamond structure, the total energy converges rapidly as a function of the dimension of the Krylov subspace , and it suggests that about 40% of the total number of basis functions in the truncated cluster is enough for the convergence, where the largest dimension of the Krylov subspace for each size of truncated cluster in figures corresponds to the number of basis functions in the truncated cluster. In all the calculations by the proposed method, the dimension of Krylov subspace for the overlap matrix is taken as four times that of , since it turned out that an excess of approximation of leads a slow convergence of the total energy as a function of the dimension of the Krylov subspace . To see the error introduced by the fixed long range contribution a calculation using the updated is also shown, indicating that for carbon and silicon the absolute error by the fixed is less than Hartree/atom. Even for ionic bulk systems such as MnO bulk and ice, the absolute error in the total energy is less than Hartree/atom at about 40% of the total number of basis functions in the truncated cluster. On the other hand, the absolute error by the fixed is larger than that in the case of carbon and silicon. This is due to large variation in the charge distribution during the SCF process, which implies that for ionic systems it is better to update at every SCF step.
For simple metals such as BCC lithium and FCC aluminum, the total energy converges very rapidly as a function of the dimension of the Krylov subspace. However, a relatively large size of truncated cluster is required to achieve the milli-Hartree accuracy. Especially for BCC lithium, even the largest size of the truncated cluster consisting of 537 atoms does not satisfy the milli-Hartree accuracy, which suggests that the DC method can be very time-consuming as shown later. The total energy for B32 aluminum lithium binary alloy converges at a relatively smaller dimension of the Krylov subspace, while the size of the truncated cluster required for the milli-Hartree accuracy exceeds 250 atoms which is larger than that for the gap systems. For the BCC iron total energy shows a little fluctuation as a function of the dimension of the Krylov subspace, while the milli-Hartree accuracy is reachable using a truncated cluster consisting of 254 atoms. The fluctuation of total energy originates from the density of states in BCC iron, since the density of states for the majority spin possesses a rather sharp peak assigned to the localized d-orbitals near the Fermi energy and this position relative to the Fermi energy can affect largely to the total energy and the magnetic moment. Although no information on whether the system is insulating or metallic is required for the construction of the Green functions Eq. (4), lots of eigenvalues will appear near the Fermi energy in metallic systems. This feature makes the convergence difficult, since the density matrix calculated by Eq. (2) is sensitive to the change in the eigenvalues near the Fermi energy, and the BCC iron is the case. The comparison between the fixed and updated shows that the absolute error by the fixed would be negligible for metallic systems. An important result in the calculations for metallic systems is that a large size of truncated cluster is required to achieve the milli-Hartree accuracy, which suggests that mapping to a smaller Krylov subspace is crucial for the efficient calculation rather than performing a direct diagonalization by the DC method.
For covalent and molecular like systems such as DNA, a small peptide, and a carbon nanotube, the absolute error in the total energy drops to below the milli-Hartree accuracy at a smaller size of truncated cluster compared to those for bulks with a finite gap and metallic systems. However, we see that for DNA and the small peptide being ionic and nonperiodic the total energy converges slowly as a function of the dimension of the Krylov subspace compared to that in homogeneous systems such as carbon, silicon in the diamond structure, BCC lithium, and FCC aluminum, which indicates that higher order moments in Eq. (6) contributes substantially to precise description of the complicated density of states. If a truncated cluster consisting of about 180 atoms is used for molecular systems, the absolute error in the total energy can be below Hartree/atom. In this case the long range contribution should be updated at every SCF step, since the absolute error by the fixed is comparable to the accuracy of Hartree/atom.
Figure 5 shows convergence properties of selected physical quantities. For MnO bulk the proposed method well reproduces the local magnetic moment calculated by the conventional k-space scheme. On the other hand, for BCC iron even the largest size of truncated cluster does not fully converge to that by the k-space scheme, which is attributed to a rather sharp peak near the Fermi energy in the density of states for the majority spin, suggesting that a larger size of truncated cluster is needed for the fully convergent result. The x-component of the largest force on atom in DNA and the dipole moment of the small peptide calculated by the k-space scheme are well reproduced by even a smaller size of the truncated cluster, while the complete convergent results are obtained by updating the long range contribution .
From these convergence properties of the total energy and physical quantities, several trends can be summarized as follows: (1) The size of truncated cluster required for the milli-Hartree accuracy increases in order of molecular systems of which chemical bonds are nearly one-dimensional, bulks with a finite gap, and metallic systems, and rough estimates for the size of the truncated cluster are about 50, 150, and 300 atoms, respectively. (2) In homogeneous systems consisting of a single element the rapid convergence can be obtained at a smaller dimension of the Krylov subspace , while the dimension required for the milli-Hartree accuracy increases as structural randomness and ionicity increase. (3) The absolute error in the total energy by the fixed is about Hartree/atom in most cases, and the magnitude seems to increase slightly as ionicity increases.
In Fig. 6 the absolute error in the total energy and the computational time calculated by two O() methods, the proposed and DC methods, are shown as a function of the number of atoms in each truncated cluster for BCC lithium bulk. It is found that two methods are equivalent as regards the accuracy. However, we see that the computational time of the proposed method is remarkably reduced compared to that of the DC method. In the proposed method the dimension of the Krylov subspace and that of the subspace for the approximate inverse of the overlap matrix are 10 % and 40 % of the total number of basis functions in the truncated cluster, respectively. In spite of the considerable reduction of the spanned space, the method gives the same result as that of the DC method, which clearly shows rapid convergence of the proposed method based on the Krylov subspace. Considering the general trends in the convergence properties, in comparison with the DC method, we see that the proposed method is more efficient especially for metallic systems.
To compare the numerical stability, the convergent history of the residual norm in the SCF calculation is shown in Fig. 7 for the conventional k-space diagonalization and four linear scaling methods for FCC aluminum. The residual norm of charge density by the conventional k-space diagonalization, proposed, and DC methods quickly decreases, while the convergent result is hardly obtained in the proposed method with the regeneration of the Krylov subspace and the recursion method. The comparison between the proposed method and its variant with the regeneration of the Krylov subspace suggests a reason why the recursion method tends to suffer from the numerical instability. The regeneration of the Krylov subspace makes the spanned subspace fluctuate, which means that an eigenvalue problem defined by a different subspace is solved at every SCF step. This fluctuation of the spanned space causes the difficulty in obtaining the SCF convergence for the recursion method. On the other hand, in the proposed method without the regeneration of the Krylov subspace the difficulty is avoided since the spanned space is fixed during the SCF calculation.
For diamond carbon the crossing point between the proposed and conventional k-space diagonalization methods with respect to the computational time is about 500 and 250 atoms using 8 and 32 processors, respectively, as shown in Fig. 8. The method can be easily parallelized on a distributed memory parallel machine, since the calculation of each truncated cluster is performed almost independently with a small communication for the calculation of the total number of electrons, suggesting that the parallel efficiency of this method is superior to that of the k-space diagonalization. Therefore, the crossing point is dependent on the number of processors used, and decreases as the number of processors increases as shown in Fig. 8. Since 50 % of the total number of basis functions in the truncated cluster is used for the dimension of Krylov subspace, the computational speed should be times faster than the DC method as far as the computational part for diagonalization, while the actual speed up ratio is 5.8 times in case the super cell containing 1000 atoms is calculated using 32 processors.
To illustrate practical aspects in applications of the proposed O() method, we present three applications of the O() methods: (A) calculation of full wave functions of DNA, (B) interaction between a finite carbon nanotube and aluminum surface, and (C) geometry optimization of boron doped diamond.
Since the proposed method is based on the evaluation of a local Green function rather than wave functions, one cannot obtain full one-particle wave functions of large scale systems from the O() calculation directly. However, the full wave functions can be often useful for gaining an insight into the system and getting information on the phase factor of wave functions. An alternative scheme of obtaining the wave functions is to diagonalize once the whole system by a conventional diagonalization scheme after getting self-consistent charge distribution using the O() method. For the one-shot diagonalization it is a simple but key idea to make use of the self-consistent charge distribution obtained by the O() method. Figure 9 shows the full wave functions calculated by the alternative scheme which correspond to the highest occupied (HO) state and the lowest unoccupied (LU) state, at the point, of the same DNA as discussed in the previous section. For this calculation the cutoff radius to construct a physical truncated cluster is 10.0 Å, and the number of hopping is three for the construction of a logically truncated cluster, giving that the average number of atoms in each truncated cluster is 183. The dimension of the Krylov subspace is 400, and the inverse of overlap matrix is exactly calculated by the Cholesky decomposition instead of the Krylov subspace method. The long range contribution is updated at every SCF step. Under the calculation condition the absolute error, with respect to a conventional diagonalization scheme, in the total energy is 0.000064 Hartree/atom. We see that the HO state consists of guanines and that the LU wave function extends on only cytosines in which two parallel guanines (cytosines) form a bonding state in the HO (LU) state. No significant contribution from the backbone consisting of pentoses connected with phosphodiester linkage is found for both the states. The feature originates from the HO state, located at a relatively higher energy, of a guanine molecule and the low lying LU state of a cytosine molecules.[47,48] A comparison on the DOS between the alternative scheme and a conventional diagonalization method in which the calculation was performed self-consistently using the conventional diagonalization method from the beginning is shown in Fig. 10. The alternative scheme well reproduces the DOS calculated by the conventional scheme for a wide range of energy, supporting that the proposed O() method is accurate enough for assuring the validity of the alternative scheme of calculating the full wave functions.
A significant aspect of the proposed method is that metallic systems can be treated on the same footing as for systems with a finite gap. To illustrate the capability, we show a calculation on interaction between a finite sized CNT and aluminum surface. A (10,0) zigzag carbon nanotube we considered consisting of 339 carbon atoms with a finite length of 35 Å possesses dangling bonds introduced by the chopping at both the edges which are expected to exhibit local magnetic moments. No hydrogen passivation is made at both the edges. In addition, a defect, generated by removing a carbon atom, is introduced at the central region with respect to the long axis and the counter side of that of CNT facing the aluminum surface. Carbon atoms around this defect may also exhibit a local magnetic moment due to the dangling bonds. The structure of CNT is constructed by assuming that all the bond lengths between two carbon atoms are 1.42 Å. The aluminum surface we considered consists of cubic cells with a lattice constant 4.05 Å, each of which contains four aluminum atoms, thus totally it contains 768 aluminum atoms. The CNT is placed along  axis on the (001) surface so that some of carbon atoms facing the surface can be on top of the aluminum atoms consisting of the (001) surface with an interatomic distance of 2.0 Å between the carbon and aluminum atoms. For both the structures of the CNT and the aluminum surface no optimization is performed. The total composite system is treated as a slab model of which unit cell volume is . For the proposed method the cutoff radius to construct a physical truncated cluster is 11.0 Å, and the number of hopping is three for the construction of a logically truncated cluster. The dimension of the Krylov subspace is 800, and the inverse of overlap matrix is exactly calculated by the Cholesky decomposition instead of the Krylov subspace method. The long range contribution is updated at every SCF step. Under the calculation condition the absolute error, with respect to a conventional diagonalization scheme, in the total energy is 0.00443 and 0.00014 Hartree/atom for FCC aluminum and a periodic (10,0) carbon nanotube without the edges, respectively. For a comparison between the proposed and the conventional methods we also performed a conventional k-space calculation with only the point for the same composite system. The elapsed time for the diagonalization part is about 19 and 185 minutes per SCF step for the proposed method and the conventional diagonalization, respectively, using 45 Opteron processors.
Figure 11(a) shows an isosurface map of spin charge density and a profile of charge density along a line denoted by an arrow in the composite system. A comparison between the proposed and the conventional k-space methods in the profile of charge density shows that the proposed method reproduces very accurately the spatial charge distribution in the vicinity of not only the CNT, but also the metallic slab. Large magnetic moments can be found around the edges of CNT and the neighboring carbon atoms to the defect. Although the magnetic moment around the edges is mainly attributed to the dangling bonds corresponding to lone pair electrons in the most outside atoms, the relatively inner atoms without the dangling bond also exhibit considerable magnetic moments in which the local spins are ferro magnetically ordered along the circumference and anti ferro magnetically ordered between two circumferences. This feature is consistent with a result by the other DFT calculation, showing the spin polarization of electrons at near the zigzag edge in a finite sized zigzag CNT with hydrogen passivation. The magnetic moments of the most outside atoms labeled by C-C and the next inner atoms labeled by C-C are listed in Table I, where C and C are the most distant carbon atom in the circumferences from the aluminum surface, C and C are the closest ones, and the other ones are in between the former and the latter. The magnetic moments of the most outside atoms and the next inner atoms are about 1.2 and 0.13 (), respectively, while they vary depending on the relative position to the aluminum surface. The magnetic moment of neighboring atoms to the defect is much smaller than that of the most outside atoms in spite of the existence of lone pair electrons. In addition, the same trend as in the edge regions is found in the magnetic ordering between two circumferences, i.e., two local spins of C and C siting in the same circumference are ferro magnetically ordered and anti ferro magnetic ordering is observed between atoms C and C or C and C in two circumferences. The decrease in the magnetic moment of the neighboring atoms to the defect may be attributed to the decrease in the number of neighboring atoms with a dangling bond in the same circumference, which may contribute to the enhancement of the ferro magnetic ordering. However, a further study will be needed for a conclusive understanding on the magnitude of magnetic moments. A comparison between the proposed and the conventional k-space methods in the Mulliken charge and spin moment listed in Table I clearly shows the accuracy of the proposed method. The Mulliken charge is highly accurately reproduced with an absolute error of 0.005 at most, while the error with respect to the conventional k-space result in the spin moment tends to be larger than that for the Mulliken charge.
Isosurface maps of the difference charge density and difference spin density are shown in Figs. 11(b) and (c), respectively, where the difference charge (spin) density is defined as the difference between the composite system and the sum of the isolated CNT and the slab with respect to the charge (spin) density. We see that electrons in the region just above the aluminum surface transfer to the region around the bottom of the CNT corresponding to the side facing to the aluminum surface, the most outside atoms, and the peripheral atoms to the defect. The charge transfer between the region just above the aluminum surface and the region around the bottom of the CNT can be regarded as a typical Ohmic contact. On the other hand, the charge transfer into atoms with the dangling bond can be considered as a charge doping to impurity states lying between the valence and conduction states, while the magnitude of charge transfer decreases rapidly as the distance between the carbon and the surface increases. Another interesting finding is a Friedel oscillation in the difference charge density, i.e., the increase and decrease of charge density appear alternatively along the depth direction in the aluminum slab, which suggests that the composite system cannot be modeled by a thin slab. Also we see that the charge doping to dangling bonds depresses the magnetic moment as shown in Fig.11(c), indicating that the states for the minority spin locate just above those of the majority spin.
Table II shows the total energies of the isolated CNT, the aluminum slab, and the composite system calculated by the proposed O() method and the conventional diagonalization scheme. For the isolated CNT the absolute error with respect to the conventional diagonalization in the total energy is about 0.02 Hartree (0.00006 Hartree/atom), while that for the aluminum slab is 0.00086 Hartree/atom which is fourteen times larger than the former. The interaction energy between the CNT and the aluminum slab is calculated as 0.301 and 0.203 Hartree by the proposed method and the conventional diagonalization, respectively. Although there is a cancellation of the error in the calculation of the interaction energy, the absolute error in the interaction energy is 0.098 Hartree (2.7 eV), and it is not negligible, which suggests that a further severe calculation condition is needed for a more accurate calculation of this composite system. Also the positive interaction energy implies importance of the geometry optimization to treat the interaction of the composite system properly. The further study and the details will be discussed elsewhere.
The force on atom calculated by the proposed O() method is not derived from a variational principle, but evaluated by employing an expression based on the Hellman-Feynman theorem with a Pulay correction which can be variationally derived only if the eigenvalue problem is exactly solved. Thus, the force is not consistent with the total energy within the proposed O() method unless a infinitely large truncated cluster is used, which can be a disadvantage in the method. However, if the calculated forces are accurate enough so that the geometry optimization can be converged within a criterion of Hartree/Bohr, such a geometry optimization may be applicable to a wide variety of problems in a practical sense. Along this line a geometry optimization by the proposed O() method is demonstrated for boron doped diamond of which super cell contains 62 carbon atoms and two boron atoms being nearest neighbors. The lattice constant of 3.567 Å is used, and the unit cell is fixed during the relaxation. Figure 12 shows the absolute maximum force on atom in the geometry optimization as a function of the optimization steps, calculated by a conventional k-space diagonalization and the proposed O() method with four calculation conditions in which the number of atoms in each truncated cluster is 159, 275, 381, and 525, and the dimension of the Krylov subspace is 425, 800, 1100, and 1600 for Krylov1, Krylov2, Krylov3, and Krylov4, respectively, and for all the cases the long range contribution is updated at every SCF step, and the inverse of overlap matrix is exactly calculated by the Cholesky decomposition. It is found that the absolute maximum force by the proposed O() method fluctuates after converging at a certain level ranging from to Hartree/atom which implies the error in the force calculated by the proposed O() method, while in the conventional scheme it converges rapidly. The error in the force decreases as the number of atoms in the truncated cluster and the dimension of the Krylov subspace increase, and the criterion of Hartree/Bohr is satisfied in the most severe condition we studied. As shown in Table III selected bond lengths, bond angles, and the relaxation energy obtained by the proposed method converge to those by the conventional k-space scheme as the calculation condition becomes severe, and the most severe condition Krylov4 almost reproduces the k-space results. Thus, for practical purposes it is possible to perform the geometry optimization using the proposed O() method with a careful attention to the accuracy.
To conclude, we have developed an efficient and robust O() method based on a Krylov subspace for fully self-consistent large scale ab initio electronic structure calculations of a wide variety of materials including metals. Based on the Krylov subspace an embedded cluster problem is solved with an effective Hamiltonian consisting of the detailed short range and the effective long range contributions. The charge flow between embedded clusters is taken into account by introducing a common chemical potential, and also the long rang Coulomb interaction is properly included by solving the Poisson equation for the whole system. The method can be regarded as a unified approach connecting the DC and recursion methods, and enables us to obtain convergent results with the milli-Hartree accuracy for a wide variety of materials. The almost independent calculation for each truncated cluster assures easiness of the parallel implementation and the high parallel efficiency.
The accuracy and efficiency of the method are mainly controlled by two parameters, the size of the truncated cluster and the dimension of the Krylov subspace, and the optimum choice depends on the system under consideration. A systematic study for the convergence properties of the total energy and physical quantities provides a practical guidelines for adjusting the accuracy and efficiency: (1) the milli-Hartree accuracy can be achieved by the truncated cluster including about 50, 150, and 300 atoms for molecular systems, bulks with a finite gap, and metallic systems, respectively, which can be a minimum size of truncated cluster. (2) In most cases the convergent result for a given truncated cluster can be obtained by less than 50% of the total number of basis functions in the truncated cluster for the dimension of the Krylov subspace, which means that the proposed O() method is faster than the DC method, while the dimension required for the milli-Hartree accuracy increases as structural randomness and ionicity increase.
The applications of the O() method to three examples, (A) calculation of full wave functions of DNA, (B) interaction between a carbon nanotube and metal surface, and (C) geometry optimization of boron doped diamond, show that the O() method can be applicable even for a composite system including metallic systems with considerable accuracy. However, it should be noted that the required accuracy depends on the physical quantity under consideration, and that a careful consideration to the convergence property of the physical quantity is indispensable before performing large scale calculations. With the careful consideration it can be concluded that the proposed O() method is a quite promising approach for realization of large scale ab initio electronic structure calculations for a wide variety of materials.
The author would like to thank Prof. K. Terakura for his continuous encouragement. The author was partly supported by CREST-JST and NAREGI Nanoscience Project, the Ministry of Education, Science, Sports, and Culture, Japan. Part of the computation in this work was done using the computational facilities of the Japan Advanced Institute of Science and Technology (JAIST).
In the Appendix we show that it is possible to minimize the computational
time in the DC method by optimizing the size of the central region that the
associated Green functions are evaluated. The argument with a little
modification holds even for the Krylov subspace method proposed in this paper.
In the original DC method  only the Green functions associated with the single
central site are evaluated using Eq. (4). However, instead of the single site
one can adopt the clustered sites including multiple sites as the central
region that the associated Green functions are evaluated. In this generalization
to the central region, there is an optimum size of the central region to
minimize the computational time as shown by the following simple
consideration: In the conventional diagonalization the computational time
is given by
|Carbon site||Charge||Spin moment|
|CNT + Al slab||-39077.59289||-39078.27711||0.68421|
|CNT on Al slab||-39077.29141||-39078.07384||0.78243|
|(CNT on Al slab)-(CNT + Al slab)|