1. Introduction
The classical Newton method is one of the most popular gradient-based iterative methods and is widely used for its quadratic convergence property. In recent years, much research has been devoted to developing higher order iterative algorithms, based on the logic of Newton's method, in different areas of numerical computation. Several important higher order iterative methods for finding the root of a nonlinear equation appear in the literature. Homeier proposed a modification of the Newton method for finding the zero of univariate functions that converges cubically [6,7]. Kou et al. proposed a cubic order convergent algorithm for solving nonlinear equations [8] and also some variants of Ostrowski's method with seventh-order convergence [9]. Chun contributed schemes with fourth order convergence and their families [2,3]. Liu et al. proposed an eighth order method with high efficiency index [10], and Cordero et al. proposed sixth and seventh order schemes [4] for finding the root of a univariate nonlinear equation. Employing any of these iterative methods, one can optimize a univariate, nonlinear, differentiable function more efficiently. In this paper, however, an attempt is made to develop a higher order iterative process for optimizing a multivariate function. For developing this scheme, the trapezoidal approximation of a definite integral is used and the classical Newton method is considered in an implicit form. The theory of Taylor expansion of matrix-valued functions is used to establish the convergence of the algorithm. It is proved that the proposed algorithm has the cubic order convergence property.
The calculus of matrix-valued functions has been widely used in various fields of mathematics. This theory has been developed in several directions by many researchers, such as Turnbull [12,13], Dwyer et al. [5], and Vetter [14,15]. The theory of matrix calculus by Vetter [14,15], which uses Kronecker algebra, is the most popular one for its consistency and completeness. It has later been adopted by researchers from various fields such as system theory [1], sensitivity analysis [18], stochastic perturbation [17], statistical models [20,16], econometric forecasting [11], and neural networks [19]. In this paper we use the Taylor expansion of a matrix-valued function as developed by Vetter [15] to prove the convergence of our algorithm.
The content of this paper is organized as follows. In Section 2, the new scheme is proposed. In Section 3, a detailed convergence analysis of the proposed scheme is given. A comparative study between the classical Newton method and the proposed method is discussed in Section 4. Finally, a table with several test functions and a graphical illustration are given in the Appendix.
2. Proposing a new multivariate optimization algorithm
Consider an optimization problem
(P)  min_{x ∈ ℝs} f(x),  where f : ℝs → ℝ is a sufficiently differentiable function.
For a current iterate xn, define φ(θ) = ∇f(xn + θ(x − xn)), θ ∈ [0, 1]. Then φ(0) = ∇f(xn), φ(1) = ∇f(x), and φ′(θ) = [∇2f(xn + θ(x − xn))](x − xn). From the fundamental theorem of calculus,

∇f(x) − ∇f(xn) = φ(1) − φ(0) = ∫₀¹ φ′(θ) dθ.

So

∇f(x) = ∇f(xn) + ∫₀¹ [∇2f(xn + θ(x − xn))](x − xn) dθ.

Approximating the integral by the trapezoidal rule, ∫₀¹ φ′(θ) dθ ≈ ½ [φ′(0) + φ′(1)], we get

∇f(x) ≈ ∇f(xn) + ½ [∇2f(xn) + ∇2f(x)](x − xn).

Let xn+1 be the root of the equation

∇f(xn) + ½ [∇2f(xn) + ∇2f(xn+1)](xn+1 − xn) = 0.

Thus,

xn+1 = xn − 2 [∇2f(xn) + ∇2f(xn+1)]⁻¹ ∇f(xn).

This is an implicit functional relation in xn+1 at xn. We replace ∇2f(xn+1) by ∇2f(zn), where zn is the next iteration point derived by the classical Newton method at xn. Then the new iteration scheme becomes

zn = xn − [∇2f(xn)]⁻¹ ∇f(xn),
xn+1 = xn − 2 [∇2f(xn) + ∇2f(zn)]⁻¹ ∇f(xn).        (1)
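For illustration, here is a minimal NumPy sketch of one classical Newton step and one step of scheme (1); the function handles grad and hess, the helper names, and the stopping rule are our own choices, not taken from the paper.

```python
import numpy as np

def newton_step(grad, hess, x):
    """One classical Newton step: z = x - [Hess f(x)]^(-1) grad f(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

def proposed_step(grad, hess, x):
    """One step of scheme (1): the Hessian at the implicit point x_{n+1}
    is replaced by the Hessian at the Newton iterate z_n, and the
    trapezoidal average (1/2)[Hess f(x_n) + Hess f(z_n)] is inverted."""
    g, hx = grad(x), hess(x)
    z = x - np.linalg.solve(hx, g)                   # classical Newton iterate z_n
    return x - np.linalg.solve(0.5 * (hx + hess(z)), g)

def minimize(grad, hess, x0, step, tol=1e-10, max_iter=100):
    """Iterate a given step rule until the gradient norm falls below tol;
    returns the final point and the number of iterations used."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            return x, k
        x = step(grad, hess, x)
    return x, max_iter
```

In this sketch, one step of scheme (1) costs one extra Hessian evaluation and one extra linear solve per iteration compared with the classical Newton step.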
3. Convergence analysis of the new scheme
To study the convergence of the new scheme (1), the following notations and definitions are given as prerequisites in Subsection 3.1. In Subsection 3.2, some new definitions and lemmas are introduced which will be used to prove the convergence theorem in Subsection 3.3.
3.1. Prerequisites.
Is = s × s dimensional identity matrix.
ρ(A)= Spectral radius of the matrix A.
A ⊗ B = Kronecker product of two matrices A and B.
For matrices A = (aij)m×n and B = (bij)s×t, A ⊗ B = (aijB)ms×nt.
A×k = A ⊗ A ⊗ . . . ⊗ A (The kth Kronecker power of A).
AB = Matrix product of two matrices A and B.
For matrices A, B, C and D the following properties hold; a small numerical check is sketched after the list.
(P1) A ⊗ (B + C) = A ⊗ B + A ⊗ C.
(P2) (A + B) ⊗ C = A ⊗ C + B ⊗ C.
(P3) (kA) ⊗ B = A ⊗ (kB) = k(A ⊗ B), k is a scalar.
(P4) (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
(P5) (A ⊗ B)(C ⊗ D) = AC ⊗ BD, provided the matrix dimensions agree so that the products AC and BD are defined.
(P6) (AB) ⊗ Is = (A ⊗ Is)(B ⊗ Is) (this follows from (P5)).
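The Kronecker product and properties (P5)-(P6) can be checked numerically; a minimal sketch assuming NumPy (np.kron), with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.random((2, 3)), rng.random((4, 5))
C, D = rng.random((3, 2)), rng.random((5, 4))

# (P5): (A ⊗ B)(C ⊗ D) = AC ⊗ BD; dimensions chosen so that AC and BD exist
assert np.allclose(np.kron(A, B) @ np.kron(C, D),
                   np.kron(A @ C, B @ D))

# (P6): (AB) ⊗ I_s = (A ⊗ I_s)(B ⊗ I_s), here with square A, B and s = 2
A2, B2, Is = rng.random((3, 3)), rng.random((3, 3)), np.eye(2)
assert np.allclose(np.kron(A2 @ B2, Is),
                   np.kron(A2, Is) @ np.kron(B2, Is))
```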
Definition 3.1 (Matrix function). A matrix function maps a matrix of dimension s × t to a matrix of dimension p × q.
Definition 3.2 (Matrix derivative [15]). The derivative of a matrix-valued function A(B) of dimension p × q with respect to a scalar entry bkl, and with respect to the matrix B of dimension s × t, together with the higher order derivatives, are taken in the sense of Vetter [15]; the structure is sketched below.
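Schematically, the layout we recall from Vetter [15] and Brewer [1] is the following; the exact convention written here is our reconstruction from those references, not quoted from this paper.

```latex
% Derivative layout following Vetter [15] (and Brewer [1]); E_{kl} is the
% s x t elementary matrix with 1 in position (k,l) and 0 elsewhere.
\[
\frac{\partial A}{\partial b_{kl}}
   = \left(\frac{\partial a_{ij}}{\partial b_{kl}}\right)_{p\times q},
\qquad
\frac{\partial A}{\partial B}
   = \sum_{k=1}^{s}\sum_{l=1}^{t} E_{kl}\otimes\frac{\partial A}{\partial b_{kl}}
   \in \mathbb{R}^{ps\times qt},
\qquad
\frac{\partial^{k}A}{\partial B^{k}}
   = \frac{\partial}{\partial B}\!\left(\frac{\partial^{\,k-1}A}{\partial B^{\,k-1}}\right).
\]
```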
Matrix Taylor Expansion: The Taylor expansion of a matrix-valued function of a column vector u ∈ ℝs about a column vector u0 is described in [15]; its structure is sketched below.
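Schematically, for a p × q matrix-valued function A(u), the expansion has the following form; the coefficient notation A_{u^k} and the remainder symbol R_M are our shorthand for the structures in [15], not the paper's own notation.

```latex
% Structure of the Taylor expansion of a p x q matrix valued function A(u),
% u in R^s, about u_0; A_{u^k}(u_0) is the k-th matrix derivative of A with
% respect to u evaluated at u_0, and (u - u_0)^{\times k} is the k-th
% Kronecker power of (u - u_0).
\[
A(u) \;=\; \sum_{k=0}^{M-1} \frac{1}{k!}\,
      A_{u^{k}}(u_0)\,\bigl[(u-u_0)^{\times k}\otimes I_q\bigr]
      \;+\; R_M(u,u_0).
\]
```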
3.2. New Definition and Lemmas.
Definition 3.3. Let f : ℝs → ℝ be a sufficiently differentiable function and let ∇f denote its gradient. A function is defined as
where is as defined in (2). xT is the row vector (x1, x2, . . . , xs).
Lemma 3.4.
Proof. This result follows from Definition 3.3 directly.
Lemma 3.5.
Proof.
Lemma 3.6. If a ∈ ℝm and b ∈ ℝn, then || a ⊗ b || = || a || || b ||, where || · || denotes the Euclidean norm.
Proof. For a = (a1, a2, . . . , am) and b = (b1, b2, . . . , bn), the entries of a ⊗ b are exactly the products aibj, i = 1, . . . , m, j = 1, . . . , n. So
|| a ⊗ b ||² = Σi Σj ai²bj² = (Σi ai²)(Σj bj²) = || a ||² || b ||²,
and taking square roots gives the result.
Lemma 3.7. If u ∈ ℝs, then || u×n || = || u ||n for n ∈ ℕ.
Proof. For n = 1, || u×1 || = || u || = || u ||1.
Suppose || u×k || = || u ||k for some k. Then for n = k + 1, by Lemma 3.6 and the induction hypothesis,
|| u×(k+1) || = || u×k ⊗ u || = || u×k || || u || = || u ||k || u || = || u ||k+1.
Hence the result holds for all n ∈ ℕ by induction.
Lemma 3.8. If u ∈ ℝs, then (u×n ⊗ Is)(u×1 ⊗ I1) = u×(n+1) for n ∈ ℕ.
Proof. For n = 1, (u×1 ⊗ Is)(u×1 ⊗ I1) = (u×1 ⊗ Is)u = u×2
Suppose (u×k ⊗ Is)(u×1 ⊗ I1) = u×(k+1) for some k. Then for n = k + 1, since u×1 ⊗ I1 = 1 ⊗ u, property (P5) gives
(u×(k+1) ⊗ Is)(u×1 ⊗ I1) = (u×(k+1) ⊗ Is)(1 ⊗ u) = u×(k+1) ⊗ (Is u) = u×(k+1) ⊗ u = u×(k+2).
Hence the result holds for all n ∈ ℕ by induction.
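Lemmas 3.6-3.8 can also be spot-checked numerically; a small sketch assuming NumPy, with the vectors chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.random((3, 1)), rng.random((4, 1))
u = rng.random((3, 1))
s = u.shape[0]

# Lemma 3.6: ||a ⊗ b|| = ||a|| ||b||
assert np.isclose(np.linalg.norm(np.kron(a, b)),
                  np.linalg.norm(a) * np.linalg.norm(b))

# Lemma 3.7 with n = 3: ||u^{x3}|| = ||u||^3
u2 = np.kron(u, u)
u3 = np.kron(u2, u)
assert np.isclose(np.linalg.norm(u3), np.linalg.norm(u) ** 3)

# Lemma 3.8 with n = 2: (u^{x2} ⊗ I_s)(u^{x1} ⊗ I_1) = u^{x3}
lhs = np.kron(u2, np.eye(s)) @ np.kron(u, np.eye(1))
assert np.allclose(lhs, u3)
```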
3.3. Third order convergence of the algorithm.
Let α ∈ ℝs be the solution of ∇f = 0. Using the matrix Taylor expansion (3) about α, ∇f(x) and ∇2f(x) can be expressed as
where
where
Using Definition 3.3 and replacing x by xn, (4) can be rewritten as
Using Lemma 3.5 and replacing x by xn, (5) can be rewritten as
Denote
Neglecting remainder terms for large M and using (8) we rewrite (6) and (7) as
respectively. Now, (10) can be written as
For large n, xn is in a sufficiently close neighborhood of α such that ρ(B) < 1. So from (11),
From (9) and (12), we have
Expanding each term in the right hand side of the above expression, we have
Rearranging the terms in the right side of the above expression according to Kronecker power,
Using Lemma 3.8, putting C1 = Is and rearranging the terms according to Kronecker power in the above expression, we get
zn is the classical Newton iterate at xn (See (1)). Replacing xn by zn in (10), we get
Substituting the expression of zn from (1) in the above expression, we have
Denote
and
One may observe that in the expression of D in (15) the lowest Kronecker power of (xn − α) is 2. Since we write only the terms that produce at most the third Kronecker power of (xn − α), there is no need to write D×2 and D×3 explicitly. After simplification, the expression for P becomes
Hence (14) can be expressed as
For large n, xn lies in a sufficiently small neighborhood of α, so that ρ(P) < 1. Hence (Is + P)−1 = Is − P + P2 − · · · . Substituting the value of P,
From the iteration scheme (see (1)):
Denote en = xn − α. Then,
Using Lemma 3.7,
Since || en ||→ 0, for some large n onwards,
where ε is a small positive real number and r is a positive real constant.
This implies that the new scheme has third order convergence. Hence the following result holds.
Theorem 3.9. Let f : ℝs → ℝ be a sufficiently differentiable function and locally convex at α ∈ ℝs such that ∇f(α) = 0. Then the algorithm (1), with an initial point x0 sufficiently close to α, converges cubically to the local minimizer α of the problem minx∈ℝs f(x).
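The cubic order can also be observed empirically. The snippet below is a small check, assuming the NumPy sketch from Section 2 (proposed_step) and an illustrative separable test function of our own choosing; if the order is three, the printed ratios should stay roughly bounded until rounding error takes over.

```python
import numpy as np

# Illustrative test function: f(x) = cosh(x1) + cosh(x2),
# minimizer alpha = (0, 0), Hessian positive definite everywhere.
def grad(x):
    return np.sinh(x)

def hess(x):
    return np.diag(np.cosh(x))

x = np.array([1.0, -0.8])
alpha = np.zeros(2)
errors = [np.linalg.norm(x - alpha)]
for _ in range(6):
    x = proposed_step(grad, hess, x)        # from the Section 2 sketch
    errors.append(np.linalg.norm(x - alpha))

# Ratios ||e_{n+1}|| / ||e_n||^3 should stay roughly constant for order 3.
for e_n, e_next in zip(errors, errors[1:]):
    if e_next > 1e-14:                      # ignore steps dominated by rounding
        print(e_next / e_n ** 3)
```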
4. Numerical Results
The new algorithm is implemented in MATLAB (version R2013b) and the numerical computations are summarized in Table 1. One may observe that the total number of iterations in the proposed method is less than in the classical Newton method. All the steps for one of these test functions are illustrated graphically in Fig. 1, where it is seen that the proposed process reaches the minimizer more rapidly than the existing process. CNM denotes the classical Newton method and PM denotes the proposed method. Table 1 and Fig. 1 are provided in the Appendix.
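For readers who want to reproduce a comparison of this kind outside MATLAB, a minimal driver is sketched below. It assumes the NumPy sketch from Section 2 (minimize, newton_step, proposed_step); the coupled test function is an illustrative choice of ours, not one of the functions in Table 1, and no particular iteration counts are claimed.

```python
import numpy as np

# Illustrative locally convex test function with a coupling term:
# f(x) = cosh(x1) + cosh(x2) + (1/2)(x1 - x2)^2, minimizer at the origin.
def grad(x):
    return np.array([np.sinh(x[0]) + (x[0] - x[1]),
                     np.sinh(x[1]) - (x[0] - x[1])])

def hess(x):
    return np.array([[np.cosh(x[0]) + 1.0, -1.0],
                     [-1.0, np.cosh(x[1]) + 1.0]])

x0 = np.array([2.0, -1.5])
x_cnm, k_cnm = minimize(grad, hess, x0, newton_step)    # classical Newton (CNM)
x_pm,  k_pm  = minimize(grad, hess, x0, proposed_step)  # proposed method (PM)
print("CNM iterations:", k_cnm, "PM iterations:", k_pm)
```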
5. Conclusion
Several higher order optimization algorithms exist for one-dimensional optimization problems in the literature of numerical optimization. The Newton, quasi-Newton and conjugate gradient algorithms, which are used for multidimensional optimization problems, have quadratic and superlinear rates of convergence. This paper has developed a cubic order iterative algorithm for unconstrained optimization problems in higher dimensions. The Taylor expansion of matrix-valued functions is the key concept used to prove the convergence of the algorithm. Following the same logic, the reader may extend the present work to develop similar algorithms with order of convergence higher than three. In developing the recurrence relation, the trapezoidal approximation is used; however, one may also try other types of approximation.
References
- John Brewer, Kronecker products and matrix calculus in system theory, IEEE Transactions on Circuits and Systems 25 (1978), no. 9, 772-781. https://doi.org/10.1109/TCS.1978.1084534
- Changbum Chun, A family of composite fourth-order iterative methods for solving nonlinear equations, Applied Mathematics and Computation 187 (2007), no. 2, 951-956. https://doi.org/10.1016/j.amc.2006.09.009
- Changbum Chun, Some fourth-order iterative methods for solving nonlinear equations, Applied Mathematics and Computation 195 (2008), no. 2, 454-459. https://doi.org/10.1016/j.amc.2007.04.105
- Alicia Cordero, Jose L Hueso, Eulalia Martinez, and Juan R Torregrosa, A family of iterative methods with sixth and seventh order convergence for nonlinear equations, Mathematical and Computer Modelling 52 (2010), no. 9, 1490-1496. https://doi.org/10.1016/j.mcm.2010.05.033
- Paul S. Dwyer and M.S. MacPhail, Symbolic matrix derivatives, The Annals of Mathematical Statistics 19 (1948), no. 4, 517-534. https://doi.org/10.1214/aoms/1177730148
- H.H.H. Homeier, A modified Newton method for rootfinding with cubic convergence, Journal of Computational and Applied Mathematics 157 (2003), no. 1, 227-230. https://doi.org/10.1016/S0377-0427(03)00391-1
- H.H.H. Homeier, On Newton-type methods with cubic convergence, Journal of Computational and Applied Mathematics 176 (2005), no. 2, 425-432. https://doi.org/10.1016/j.cam.2004.07.027
- Jisheng Kou, Yitian Li, and Xiuhua Wang, A modification of Newton method with third-order convergence, Applied Mathematics and Computation 181 (2006), no. 2, 1106-1111. https://doi.org/10.1016/j.amc.2006.01.076
- Jisheng Kou, Yitian Li, and Xiuhua Wang, Some variants of Ostrowski's method with seventh-order convergence, Journal of Computational and Applied Mathematics 209 (2007), no. 2, 153-159. https://doi.org/10.1016/j.cam.2006.10.073
- Liping Liu and Xia Wang, Eighth-order methods with high efficiency index for solving nonlinear equations, Applied Mathematics and Computation 215 (2010), no. 9, 3449-3454. https://doi.org/10.1016/j.amc.2009.10.040
- Alfredo Martinez Estrada, A treatise on econometric forecasting, Ph.D. thesis, California Institute of Technology, 2007.
- H.W. Turnbull, On differentiating a matrix, Proceedings of the Edinburgh Mathematical Society (Series 2) 1 (1928), no. 2, 111-128.
- H.W. Turnbull, A matrix form of Taylor's theorem, Proceedings of the Edinburgh Mathematical Society (Series 2) 2 (1930), no. 1, 33-54.
- William J. Vetter, Derivative operations on matrices, IEEE Transactions on Automatic Control 15 (1970), no. 2, 241-244. https://doi.org/10.1109/TAC.1970.1099409
- William J. Vetter, Matrix calculus operations and Taylor expansions, SIAM Review 15 (1973), no. 2, 352-369. https://doi.org/10.1137/1015034
- Eric Walter and Luc Pronzato, Identification of parametric models, Communications and Control Engineering (1997).
- Zhang Yimin, Suhuan Chen, Qiaoling Liu, and Tieqiang Liu, Stochastic perturbation finite elements, Computers & Structures 59 (1996), no. 3, 425-429. https://doi.org/10.1016/0045-7949(95)00267-7
- Yimin Zhang, Bangchun Wen, and Qiaoling Liu, Sensitivity analysis of rotor-stator systems with rubbing, Mechanics of Structures and Machines 30 (2002), no. 2, 203-211. https://doi.org/10.1081/SME-120003016
- Y.M. Zhang, L. Zhang, J.X. Zheng, and B.C. Wen, Neural network for structural stress concentration factors in reliability-based optimization, Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering 220 (2006), no. 3, 217-224. https://doi.org/10.1243/09544100g00505
- Qi Zhao and Christian Bohn, A linearization free recursive prediction error method for combined state and parameter estimation for nonlinear systems, American Control Conference (ACC), 2013, IEEE, 2013, pp. 899-904.