

Geometry Optimization: Ver. 1.0

Taisuke Ozaki, RCIS, JAIST

Newton method

The total energy $E$ of a system can be expanded in a Taylor series with respect to the atomic coordinates $\{x_i\}$ around the reference coordinates $\{x_i^{(0)}\}$, with corresponding energy $E_0$, as follows:
    $\displaystyle E = E_{0}
+ \sum_{i}^{3N}
\left( \frac{\partial E}{\partial x_i} \right)_0 (x_i - x_i^{(0)})
+ \frac{1}{2} \sum_{i,j}^{3N}
\left( \frac{\partial^2 E}{\partial x_i \partial x_j} \right)_0 (x_i - x_i^{(0)})(x_j - x_j^{(0)})
+ \cdots,\quad\quad$ (1)

where the notation $(\cdots)_0$ means that the partial derivatives are evaluated at $\{x_i^{(0)}\}$, and $N$ is the number of atoms. Differentiating Eq. (1) with respect to $x_k$ and keeping terms up to second order, we have
    $\displaystyle \frac{\partial E}{\partial x_k}
= \left( \frac{\partial E}{\partial x_k} \right)_0
+ \sum_{i}^{3N}
\left( \frac{\partial^2 E}{\partial x_k \partial x_i} \right)_0 (x_i - x_i^{(0)}).$ (2)

If the coordinates $\{x_i\}$ correspond to a local minimum, so that $\frac{\partial E}{\partial x_k}=0$, we have the following matrix equation:
    $\displaystyle \left(
\begin{array}{ccc}
\left(\frac{\partial^2 E}{\partial x_1 \partial x_1}\right)_0 &
\left(\frac{\partial^2 E}{\partial x_1 \partial x_2}\right)_0 & \cdots\\
\left(\frac{\partial^2 E}{\partial x_2 \partial x_1}\right)_0 &
\left(\frac{\partial^2 E}{\partial x_2 \partial x_2}\right)_0 & \cdots\\
\cdots & \cdots & \cdots
\end{array}
\right)
\left(
\begin{array}{c}
x_1 - x_1^{(0)}\\
x_2 - x_2^{(0)}\\
\cdots
\end{array}
\right)
= -
\left(
\begin{array}{c}
\left(\frac{\partial E}{\partial x_1}\right)_0\\
\left(\frac{\partial E}{\partial x_2}\right)_0\\
\cdots
\end{array}
\right).$ (3)

The short notation is
    $\displaystyle H \Delta {\bf x} = -{\bf g},$ (4)

where the matrix of second derivatives on the left-hand side is called the Hessian $H$, ${\bf g}$ is the gradient vector whose components are $\left(\frac{\partial E}{\partial x_i}\right)_0$, and $\Delta {\bf x}$ is the displacement vector. Using Eq. (4), $\{x_i\}$ can be updated by
    $\displaystyle {\bf x}^{(n+1)} = {\bf x}^{(n)} - (H^{(n)})^{-1}{\bf g}^{(n)}.$ (5)

This is the well-known Newton method.
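
As an illustration, a minimal sketch of one Newton step of Eq. (5) in Python/NumPy follows; the quadratic test energy and all names are hypothetical and not taken from OpenMX.

    import numpy as np

    def newton_step(x, grad, hessian):
        """One Newton update x^(n+1) = x^(n) - H^{-1} g, cf. Eq. (5)."""
        g = grad(x)
        H = hessian(x)
        # Solve H dx = -g (Eq. (4)) instead of forming the explicit inverse.
        dx = np.linalg.solve(H, -g)
        return x + dx

    # Hypothetical quadratic test energy E = 1/2 x^T A x - b^T x,
    # for which a single Newton step reaches the exact minimum.
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    grad = lambda x: A @ x - b
    hessian = lambda x: A

    x = newton_step(np.zeros(2), grad, hessian)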

RMM-DIIS

In OpenMX, ${\bf x}^{(n)}$ and ${\bf g}^{(n)}$ in Eq. (5) are replaced by ${\bf\bar{x}}^{(n)}$ and ${\bf\bar{g}}^{(n)}$, which are given by the residual minimization method in the direct inversion of iterative subspace (RMM-DIIS) [1,2] as follows:
    $\displaystyle {\bf x}^{(n+1)} = {\bf\bar{x}}^{(n)} - \alpha~(H^{(n)})^{-1}{\bf\bar{g}}^{(n)},$ (6)

where $\alpha$ is a tuning parameter for accelerating the convergence, which should be small (large) for a large (small) ${\bf\bar{g}}^{(n)}$. In the RMM-DIIS, ${\bf\bar{g}}^{(n)}$ is constructed as a linear combination of the previous $p$ gradients ${\bf g}$ as
    $\displaystyle {\bf\bar{g}}^{(n)} = \sum_{m=n-(p-1)}^{n} a_{m}{\bf g}^{(m)},$ (7)

where the coefficients $a_{m}$ are found by minimizing $\langle {\bf\bar{g}}^{(n)} \vert {\bf\bar{g}}^{(n)} \rangle$ under the constraint $\sum_{m=n-(p-1)}^{n} a_{m}=1$. Following the method of Lagrange multipliers, $F$ is defined by
    $\displaystyle F = \langle {\bf\bar{g}}^{(n)} \vert {\bf\bar{g}}^{(n)} \rangle
- \lambda \left( 1 - \sum_{m=n-(p-1)}^{n} a_{m} \right)
= \sum_{m,m'} a_{m} a_{m'} \langle {\bf g}^{(m)} \vert {\bf g}^{(m')} \rangle
- \lambda \left( 1 - \sum_{m=n-(p-1)}^{n} a_{m} \right).$ (8)

Imposing $\frac{\partial F}{\partial a_k}=0$ and $\frac{\partial F}{\partial \lambda}=0$, the optimum set of coefficients $\{a\}$ is found by solving the following linear equation:
    $\displaystyle \left(
\begin{array}{cccc}
\langle {\bf g}^{(n-(p-1))} \vert {\bf g}^{(n-(p-1))} \rangle &
\langle {\bf g}^{(n-(p-1))} \vert {\bf g}^{(n-(p-2))} \rangle & \cdots & 1\\
\langle {\bf g}^{(n-(p-2))} \vert {\bf g}^{(n-(p-1))} \rangle &
\langle {\bf g}^{(n-(p-2))} \vert {\bf g}^{(n-(p-2))} \rangle & \cdots & 1\\
\cdots & \cdots & \cdots & \cdots\\
1 & 1 & \cdots & 0
\end{array}
\right)
\left(
\begin{array}{c}
a_{n-(p-1)}\\
a_{n-(p-2)}\\
\cdots\\
\frac{\lambda}{2}
\end{array}
\right)
=
\left(
\begin{array}{c}
0\\
0\\
\cdots\\
1
\end{array}
\right).\quad$ (9)

An optimum choice of ${\bf\bar{x}}^{(n)}$ is obtained from the same set of coefficients $\{a\}$ as
    $\displaystyle {\bf\bar{x}}^{(n)} = \sum_{m=n-(p-1)}^{n} a_{m}{\bf x}^{(m)}.$ (10)

If the Hessian $H$ is approximated by the unit matrix $I$, Eq. (6) becomes
    $\displaystyle {\bf x}^{(n+1)} = {\bf\bar{x}}^{(n)} - \alpha~{\bf\bar{g}}^{(n)}.$ (11)

This scheme in Cartesian coordinates has been implemented as 'DIIS' in OpenMX.
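
As an illustration, the following is a minimal sketch of Eqs. (7)-(11) in Python/NumPy, assuming that the previous p coordinates xs and gradients gs are kept as lists of vectors (most recent last); it is not the OpenMX implementation.

    import numpy as np

    def diis_step(xs, gs, alpha=1.0):
        """RMM-DIIS step with the Hessian approximated by I, cf. Eq. (11)."""
        p = len(gs)
        # Linear system of Eq. (9): Gram matrix of gradients bordered by ones.
        A = np.zeros((p + 1, p + 1))
        for i in range(p):
            for j in range(p):
                A[i, j] = np.dot(gs[i], gs[j])
        A[:p, p] = 1.0
        A[p, :p] = 1.0
        rhs = np.zeros(p + 1)
        rhs[p] = 1.0
        a = np.linalg.solve(A, rhs)[:p]                 # coefficients a_m
        x_bar = sum(am * x for am, x in zip(a, xs))     # Eq. (10)
        g_bar = sum(am * g for am, g in zip(a, gs))     # Eq. (7)
        return x_bar - alpha * g_bar                    # Eq. (11)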

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method

Define
$\displaystyle \Delta {\bf g}^{(n)} = {\bf g}^{(n)} - {\bf g}^{(n-1)},$ (12)
$\displaystyle \Delta {\bf x}^{(n)} = {\bf x}^{(n)} - {\bf x}^{(n-1)}.$ (13)

Then, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [3] gives the following rank-2 update formula for $(H^{(n)})^{-1}$:
$\displaystyle (H^{(n)})^{-1} = (H^{(n-1)})^{-1}
+ \frac{\langle \Delta {\bf x}^{(n)} \vert \Delta {\bf g}^{(n)} \rangle
+ \langle \Delta {\bf g}^{(n)} \vert (H^{(n-1)})^{-1} \vert \Delta {\bf g}^{(n)} \rangle}
{\left( \langle \Delta {\bf x}^{(n)} \vert \Delta {\bf g}^{(n)} \rangle \right)^2}
\vert \Delta {\bf x}^{(n)} \rangle \langle \Delta {\bf x}^{(n)} \vert
- \frac{(H^{(n-1)})^{-1} \vert \Delta {\bf g}^{(n)} \rangle \langle \Delta {\bf x}^{(n)} \vert
+ \vert \Delta {\bf x}^{(n)} \rangle \langle \Delta {\bf g}^{(n)} \vert (H^{(n-1)})^{-1}}
{\langle \Delta {\bf x}^{(n)} \vert \Delta {\bf g}^{(n)} \rangle},$ (14)

where $(H^{(0)})^{-1}=I$. An optimization scheme using Eq. (6) and the BFGS update formula for the inverse of an approximate Hessian matrix in Cartesian coordinates has been implemented as 'BFGS' in OpenMX.
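
As an illustration, a minimal sketch of the inverse-Hessian update of Eq. (14) in Python/NumPy follows; the variable names are illustrative and not those of OpenMX.

    import numpy as np

    def bfgs_inverse_update(Hinv, dx, dg):
        """BFGS rank-2 update of the inverse Hessian, cf. Eq. (14)."""
        s = np.dot(dx, dg)            # <dx | dg>
        Hdg = Hinv @ dg               # (H^(n-1))^{-1} |dg>
        term1 = (s + np.dot(dg, Hdg)) / s**2 * np.outer(dx, dx)
        term2 = (np.outer(Hdg, dx) + np.outer(dx, Hdg)) / s
        return Hinv + term1 - term2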

Rational function (RF) method

Without further care, the BFGS update of Eq. (14) often yields an ill-conditioned approximate inverse Hessian with negative eigenvalues. This drives the optimization toward saddle points rather than a minimum. The rational function (RF) method [4] can avoid this situation in principle. Instead of Eq. (1), we may consider the following expression:
    $\displaystyle E = E_{0}
+ \sum_{i}^{3N}
\left( \frac{\partial E}{\partial x_i} \right)_0 (x_i - x_i^{(0)})
+ \frac{1}{2} \sum_{i,j}^{3N}
\left( \frac{\partial^2 E}{\partial x_i \partial x_j} \right)_0 (x_i - x_i^{(0)})(x_j - x_j^{(0)})
+ \frac{1}{2}\lambda \sum_{i}^{3N} (x_i - x_i^{(0)})^2.
\quad\quad$ (15)

Then, the equation corresponding to Eq. (4) becomes
    $\displaystyle (H^{(n)}+\lambda I) \Delta {\bf x}^{(n)} = -{\bf g}^{(n)}.$ (16)

Therefore, a sufficiently large $\lambda$ assures that $(H^{(n)}+\lambda I)$ is positive definite. If $\lambda^{(n)}(=-\lambda)$ is chosen as
    $\displaystyle \lambda^{(n)} =
\langle {\bf g}^{(n)} \vert \Delta {\bf x}^{(n)} \rangle,$ (17)

then Eq. (16) is equivalent to the following eigenvalue problem:
    $\displaystyle \left(
\begin{array}{cc}
H^{(n)} & {\bf g}^{(n)}\\
({\bf g}^{(n)})^{T} & 0
\end{array}
\right)
\left(
\begin{array}{c}
\Delta {\bf x}^{(n)}\\
1
\end{array}
\right)
=
\lambda^{(n)}
\left(
\begin{array}{c}
\Delta {\bf x}^{(n)}\\
1
\end{array}
\right),$ (18)

where the matrix on the left-hand side has size $(3N+1)\times (3N+1)$ and is called the augmented Hessian. The lowest eigenvalue of the eigenvalue problem defined by Eq. (18) gives an optimum choice for $\lambda$, and the corresponding eigenvector, scaled so that its last component is 1, gives the optimization step $\Delta {\bf x}^{(n)}$. In Eq. (18), the approximate Hessian can be estimated by the following BFGS formula:
$\displaystyle H^{(n)} = H^{(n-1)}
+ \frac{\vert \Delta {\bf g}^{(n)} \rangle \langle \Delta {\bf g}^{(n)} \vert}
{\langle \Delta {\bf x}^{(n)} \vert \Delta {\bf g}^{(n)} \rangle}
- \frac{H^{(n-1)} \vert \Delta {\bf x}^{(n)} \rangle \langle \Delta {\bf x}^{(n)} \vert H^{(n-1)}}
{\langle \Delta {\bf x}^{(n)} \vert H^{(n-1)} \vert \Delta {\bf x}^{(n)} \rangle},$ (19)

where $H^{(0)}=I$. An optimization scheme using Eq. (18) and the BFGS update formula Eq. (19) in Cartesian coordinates has been implemented as 'RF' in OpenMX.
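
As an illustration, a minimal sketch of one RF step based on Eq. (18) in Python/NumPy follows, assuming the approximate Hessian H and the gradient g are given as NumPy arrays; it is not the OpenMX code.

    import numpy as np

    def rf_step(H, g):
        """RF step from the lowest eigenpair of the augmented Hessian, Eq. (18)."""
        n = g.size
        AH = np.zeros((n + 1, n + 1))     # (3N+1) x (3N+1) augmented Hessian
        AH[:n, :n] = H
        AH[:n, n] = g
        AH[n, :n] = g
        evals, evecs = np.linalg.eigh(AH)
        v = evecs[:, 0]                   # eigenvector of the lowest eigenvalue
        return v[:n] / v[n]               # scale the last component to 1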

Eigenvector following (EF) method

By diagonalizing the approximate Hessian given by Eq. (19), the ill-conditioning can be largely avoided [5]. The approximate Hessian is diagonalized as
    $\displaystyle E^{(n)} = (V^{(n)})^{T} H^{(n)} V^{(n)},$ (20)

where $E^{(n)}$ is a diagonal matrix whose diagonal elements are the eigenvalues of $H^{(n)}$. If an eigenvalue of the approximate Hessian is smaller than a threshold (0.02 a.u. in OpenMX 3.3), it is set to the threshold. This modification of the eigenvalues gives a corrected matrix $E'^{(n)}$ in place of $E^{(n)}$. The inverse of the corrected Hessian matrix $H'^{(n)}$, which is positive definite, is then given by
    $\displaystyle (H'^{(n)})^{-1} = V^{(n)} (E'^{(n)})^{-1} (V^{(n)})^{T}.$ (21)

An optimization scheme using the inverse of Eq. (21) in Eq. (6) in Cartesian coordinates has been implemented as 'EF' in OpenMX. In addition, there are two important prescriptions for stable optimization: (1) If $\langle \Delta {\bf x}^{(n)}\vert \Delta {\bf g}^{(n)} \rangle$ is positive in the Hessian update of Eq. (19), the updated Hessian is guaranteed to remain positive definite; therefore, if $\langle \Delta {\bf x}^{(n)}\vert \Delta {\bf g}^{(n)} \rangle$ is negative, the update should not be performed. (2) The maximum step length should always be monitored so that erratic movements of atomic positions are avoided.
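
As an illustration, a minimal sketch of the eigenvalue correction of Eqs. (20) and (21) in Python/NumPy follows; the 0.02 a.u. threshold is the one quoted above, while everything else is illustrative.

    import numpy as np

    def corrected_inverse_hessian(H, threshold=0.02):
        """Raise small eigenvalues of H to the threshold and invert, Eqs. (20)-(21)."""
        evals, V = np.linalg.eigh(H)              # Eq. (20)
        evals = np.maximum(evals, threshold)      # eigenvalue correction
        return V @ np.diag(1.0 / evals) @ V.T     # Eq. (21)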

Bibliography

1
P. Csaszar and P. Pulay, J. Mol. Struct. 114, 31 (1984).

2
F. Eckert, P. Pulay, and H.-J. Werner, J. Comput. Chem. 18, 1473 (1997).

3
C. G. Broyden, J. Inst. Math. Appl. 6, 76 (1970); R. Fletcher, Comput. J. 13, 317 (1970); D. Goldfarb, Math. Comp. 24, 23 (1970); D. F. Shanno, Math. Comp. 24, 647 (1970).

4
A. Banerjee et al., J. Phys. Chem. 89, 52 (1985).

5
J. Baker, J. Comput. Chem. 7, 385 (1986).
