A generalized multivariable Newton method

It is well known that the Newton method may not converge when the initial guess does not belong to a specific quadratic convergence region. We propose a family of new variants of the Newton method with the potential advantage of having a larger convergence region as well as more desirable properties near a solution. We prove quadratic convergence of the new family, and provide specific bounds for the asymptotic error constant. We illustrate the advantages of the new methods by means of test problems, including two and six variable polynomial systems, as well as a challenging signal processing example. We present a numerical experimental methodology which uses a large number of randomized initial guesses for a number of methods from the new family, in turn providing advice as to which of the methods employed is preferable to use in a particular search domain.


Introduction
Newton's method and its variants are a fundamental tool for solving nonlinear equations.Namely, given a C 1 -function f : R n → R n , Newton's method is designed to converge iteratively to a solution of the problem Problem (1) arises in practically every pure and applied discipline, including mathematical programming, engineering, physics, health sciences and economics.As a result, studies of Newton's method form an extremely active area of research, with new variants being constantly developed and tested.Basic results on Newton's method and comprehensive lists of references can be found, e.g., in the books by Dennis and Schnabel [3], Ostrowski [12], Ortega and Rheinboldt [11] and Deuflhard [4].The interested reader will find an excellent survey of Newton's method in [13].
When started at an initial guess close to a solution, Newton's method is well-defined and converges quadratically to a solution of (1), unless the Jacobian of f is singular and the second partial derivatives of f are not bounded.Hence, if the user has an idea of where a solution might be lying, Newton's method is well-known to be the fastest and most effective method for solving (1).
To ensure global convergence (i.e., to ensure convergence to a solution from any initial point), suitable modifications of the Newton method are needed.An example of a globally convergent variant is the so-called Levenberg-Marquardt method [7,8].This involves a modification of Newton's search direction at each step of the method.Without this modification, however, quadratic convergence can only be ensured when the initial guess belongs to a quadratic convergence region, namely a region from which every starting point generates a quadratically convergent Newton sequence.
For a given solution x * of (1), a quadratic convergence region is, in general, not known a priori.Kantorovich [6] and Smale [15] establish quadratic convergence to a solution of (1) under suitable assumptions on the initial guess, and they do so without modifying Newton's original search direction.Our aim, on the other hand, is to devise a suitable modification of the Newton iteration, so that the quadratic convergence region is (ideally) larger than the one resulting from the classical Newton iterations.Our approach can, in some sense, be seen as a preconditioning of the iterations, so that a larger convergence region is obtained.This preconditioning might be helpful when very little is known about the location of the solutions of (1).
In [1], the authors presented a generalized version of the classical univariate Newton iteration in which the original problem (1) is replaced by a "modified" system (for n = 1) where s : R → R is a C 1 -invertible function in a neighbourhood of the solution.The classical Newton method then corresponds to the choice s(x) = x.By judiciously choosing s in a way that relates to the nature of Problem (2), the authors illustrate in [1] via numerical experiments that the region of quadratic convergence can be enlarged, and hence a wider choice of initial guesses are likely to result in quadratic convergence.
In the present paper, we propose a multivariate version of the generalized Newton method proposed in [1].We establish the quadratic convergence under suitable assumptions, and test this new method in our numerical experiments.For suitable choices of s, we illustrate via extensive numerical experiments that the region of convergence corresponding to the new method may be larger than the one observed for the classical Newton iteration.
Recall that, if a sequence (x k ) converges to x * (with x k = x * , ∀k), it is said to converge quadratically to x * whenever we have that where λ denotes the so-called asymptotic error constant (see Definition 2.5).Moreover, the smaller λ is, the faster the convergence will be.
With different choices of s, the value of λ will also be different, in general.Our generalized Newton methods provide a tool for enlarging the region where λ = λ(s) is finite.Moreover, a suitable choice of s might produce a smaller value λ(s), thus resulting in an improvement of the convergence speed.We illustrate this phenomenon in Section 4.
The choice of a suitable function s is, however, not clear in general, and more studies are needed to develop a systematic way of designing such choices.An appropriate choice of s may result in a more robust behaviour of the generalized Newton method over a larger search domain, as can be observed in the numerical experiments we carry out in Section 5.
The present paper is organized as follows.In Section 2, we state the basic definitions, useful remarks, and properties.In Section 3, we establish the quadratic convergence results for the generalized Newton's method.In Section 4, we establish bounds on the asymptotic error constant.In Section 5, we test and compare the classical and a number of generalized methods for example problems with two and six variables, one of the problems being a challenging signal processing example.In the last section, the conclusion and a discussion are presented.

Preliminaries
We present first the main definitions and assumptions.Definition 2.1.Let x ∈ R n and let g : R n → R n be twice continuously differentiable.We write . . .
   ∈ R n and, for each i = 1, . . ., n, we have ∇g i (x) := where the vector ∇g i (x) is called the gradient of g i at x for every i = 1, . . ., n.The Jacobian of g at x, denoted by J g (x), is the n × n matrix which has for row i the (transpose of the) gradient of each g i , for i = 1, . . ., n.More precisely, J g (x) is defined as For each i = 1, . . ., n, the Hessian of g i at x is denoted by ∇ 2 g i (x) and defined as In what follows, we denote by • the Euclidean norm, i.e., the ℓ 2 -norm in R n .Denote by B(x, r) := {z ∈ R n : z − x < r} the open ball centered at x with radius r.Similarly, denote by B[x, r] := {z ∈ R n : z − x ≤ r} the closed ball centered at x with radius r.Given a matrix A ∈ R n×n , recall that a norm of A, denoted as A , can be given by A := max{ Ax : x ≤ 1}, often referred to as the ℓ 2 -norm of A.
The following simple lemma will be used in the proof of Proposition 4.1.
where A i is the ℓ 2 -norm of the matrix A i for i = 1, . . ., n.Then v ≤ u 2 w .
Proof.By Cauchy-Schwartz and the definition of the norm, u T A i u ≤ u 2 A i .Hence, which yields the statement.
We recall next some standard definitions and notation we will use in our analysis.
(c) Assume that m = n and h : A → B is bijective.So there exists h −1 : for every x, x ′ ∈ D.
Remark 2.1.Let D ⊂ R n be an open set and let s : R n → R n be a C 1 -homeomorphism from D to s(D).Then for every x ∈ D, we have that J s (x) ∈ R n×n is invertible and Indeed, given x ∈ D, there is a unique y ∈ s(D) such that y = s(x).Differentiate both sides of the equality s • s −1 (y) = y to derive where we used the Chain Rule.Now (3) directly yields the claim.
In particular, if J f (x) is invertible, we obtain for every x ∈ D.
Let g : R n → R n , so g(x) := (g 1 (x), . . ., g n (x)) T .For each i = 1, . . ., n, the gradient and Hessian of g i collect the first-and second-order information, respectively, of g i .The Jacobian, on the other hand, collects in a single operator all the first-order information for all coordinates of g.Similarly, we will need an operator that encapsulates all second order information for all coordinates of g.We formally introduce these operators next.Definition 2.3.Let g : R n → R n be twice continuously differentiable.With the notation of Definition 2.1, define the function . . .
where z j ∈ R n for j = 1, . . ., n.Given n vectors z 1 , . . ., z n ∈ R n , define the map where u j ∈ R n for j = 1, . . ., n.Finally, define the following norm-like concept for the map T g (z 1 , . . ., z n ): where the norms in the right hand-side are the ℓ 2 -norms of the Hessians of the g i 's.When z i = z j = z for every i, j ∈ {1, . . ., n}, we use the short-hand notation Definition 2.4.Given a symmetric matrix A ∈ R n×n , denote its set of eigenvalues by Σ(A).
Recall that, when the matrix norm is induced by the ℓ 2 -norm we have that A = max{|λ| : λ ∈ Σ(A)} =: SR(A), the spectral radius of A. Denote by λ min (A) the minimum eigenvalue of A, and by λ max (A) the maximum eigenvalue of A.
Remark 2.3.Definition 2.4 and the fact that SR(∇ For future use, we now give an elementary fact.( The concepts of rate of convergence and the asymptotic error constant will have an important role in our analysis, so we recall their definitions next.Definition 2.5.Consider a method that generates a sequence (x k ) ⊆ R n such that the sequence converges to x * , where ) is said to converge to x * with order α and asymptotic error constant λ.When α = 2, we say that the method converges quadratically.

Main Assumptions
The following are our main assumptions for establishing quadratic convergence.We follow the analysis from [3].
(H 1 ) J f (x) is nonsingular for all x ∈ D 1 and there exists γ 1 > 0 such that (H 2 ) The function s is a C 1 -homeomorphism from D 1 to s(D 1 ) and there exists γ 2 > 0 such that J s (x) ≤ γ 2 , for all x ∈ D 1 .
(H 3 ) Denote D 2 := s(D 1 ).There exist Given a set A ⊂ R n , we denote by cl(A) the closure of the set A.
Remark 2.4.Assumption (H 3 ) implies the existence of M 1 , M 2 > 0 such that Indeed, this follows from the fact that the mappings are continuous over the compact sets cl(D 1 ) and cl(D 2 ), respectively.
Remark 2.5.Assumption (H 3 ) allows us to apply Lemma 4.1.16in [3] to deduce that there exists ǫ > 0, L 0 , L 1 > 0 such that The authors of [1] proposed a generalized Newton method for solving (1) for n = 1.This method can be described by the following iterative formula: where s : R → R is a C 1 -invertible function in a neighbourhood of the solution.As mentioned in the Introduction, this modification can be seen as a preconditioning of the problem (1), in which the choice of a suitable function s can improve/enlarge the region of convergence of the method.Next we extend the above approach in a natural way to higher dimensions.Namely, for a C 1 -function f : R n → R n , consider the problem of solving In order to solve problem (8), we replace the derivatives in (7) by the corresponding Jacobians.More precisely, consider the function g : R n → R n defined by where s : R n → R n has an inverse s −1 , and J s (x), J f (x) are the (assumed nonsingular) Jacobians of s and f at x, respectively.It can be directly checked that a fixed point x * of g is a solution of (8), as long as both Jacobians are invertible at x * .The Generalized Newton iteration is obtained by the rule g(x k ) = x k+1 , where g is the function defined in (9).
Definition 2.6.Assume that (H 0 ) − (H 2 ) hold.Given x k ∈ D 1 , define In the next section we extend the standard quadratic convergence results to the sequence defined by (10).

Convergence of the Multivariate Generalized Newton Method
The following is Lemma 4.1 from [1] rewritten for the multivariate case.This lemma states that the iteration (10) coincides with the classical Newton iteration for the composite function Lemma 3.1.With the notation and hypothesis of Definition 2.6, let y k = s(x k ) and F := f • s −1 .The iteration (10) can be written as Proof.Using the definitions of y k and y k+1 , and Remark 2.2, we can write the iteration in (11) as Now applying s −1 to this equality yields the iteration (10).
We will use [3, Theorem 5.2.1] which we quote next.
Assume that there exists x ∈ R n such that F (x) = 0, and δ, β, γ > 0 such that the following hold.
(i) B(x, δ) ⊆ D, with J F (x) invertible and Then, there exists ǫ > 0 such that for all x0 ∈ B(x, ǫ), the sequence (x k ) generated by the rule for all k ≥ 0, has the following properties: (a) (x k ) is well defined (i.e.[J F (x k )] −1 exists for all k ≥ 0).
(b) The convergence to x is quadratic, namely, for all k ≥ 0.
Next we show that we can apply this theorem to our setting for a suitable choices of F, x and D. Lemma 3.2.Assumptions (H 0 )-(H 3 ) imply that Conditions (i)-(ii) in Theorem 3.1 hold for F := f • s −1 , x := s(x * ) and D := D 1 .Consequently, there exists ε > 0 such that, for y 0 ∈ B(s(x * ), ε), the sequence for all k ≥ 0, has the following properties: (a) (y k ) is well defined (i.e.[J F (y k )] −1 exists for all k ≥ 0).
(b) The convergence to s(x * ) is quadratic, namely, Proof.Note that the statements (a) and (b) involving the sequence (y k ) will follow directly from Theorem 3.1 once we establish Conditions (i)-(ii) for suitable constants.Therefore, we proceed to prove (i) and (ii).By (H 0 ) and the definitions of F and x, we can write By (H 2 ) and (H 3 ) we have that ) is an open neighbourhood of x = s(x * ).Hence we can take δ > 0 such that B(x, δ) ⊂ D 2 = s(B(x * , r)).Using Remark 2.2 we can write By (H 1 )-(H 2 ) and Remark 2.1, we deduce that the matrix on the right hand side of ( 13) is nonsingular and hence J F (x) is nonsingular.Using Remark 2.1 again gives where we used (H 1 )-(H 2 ) in the last inequality.The expressions ( 13)-( 14) imply that condition (i) in Theorem 3.1 holds for x ∈ B(x, δ) with Next, we check Condition (ii) in Theorem 3.1 for Namely, we show now that there exists γ > 0 such that Adding and subtracting a suitable term and using Remark 2.2 we obtain By (H 3 ) we know that J s −1 ∈ Lip β 2 (D 2 ) and hence for every x, x ′ ∈ D 1 we have where we have used Remark 2.5 in the last inequality.By (H 3 ) we also have that for every x, x ′ ∈ D 1 .Using ( 16)-( 17) in ( 15), together with Remark 2.4 gives This completes the proof of conditions (i) and (ii).Using now Theorem 3.1 and the definitions of β and γ, we obtain (a) and (b) for the sequence (y k ) with the stated value of η.
Theorem 3.2.With the notation of Definition 2.6, assume that (H 0 )-(H 3 ) hold.The sequence (x k ) given by the rule (10) is well defined and converges quadratically to x * .
Proof.By Lemma 3.2, the sequence (y k ) with y k := s(x k ) is well defined and converges quadratically to s(x * ).By Lemma 3.1, the iteration on (y k ) can be equivalently written as We will show now that (x k ) converges quadratically to x * .Indeed, by part (b) of Lemma 3.2, we have that for η as in Lemma 3.2(b).We can write Applying Remark 2.5 to bound the right-most expression, we obtain where we have used the definition of y k+1 in the last inequality.We can now use (18) in the above expression to derive Applying again Remark 2.5 to bound the right-most expression, we obtain So (x k ) converges quadratically to x * , as desired.

Bounds on the Asymptotic Error Constant λ
While the results of the previous section hold for the specific iteration (10), the following results are true for any fixed-point iteration of the form g(x k ) = x k+1 .In this section we will always assume that g is twice continuously differentiable.
Proposition 4.1.Assume that g(x * ) = x * and that J g (x * ) = 0. Consider the sequence (x k ) defined by the fixed point iteration x k+1 = g(x k ).Then there exist sequences (ξ k 1 ), . . ., (ξ k n ) converging to x * such that the asymptotic error constant λ given by Definition 2.5 verifies where T g is as in Definition 2.3.
Proof.Since J g (x * ) = 0, it is well known that (x k ) converges quadratically to x * .We begin by writing g(x) into a Taylor polynomial around x * , coordinate by coordinate, where ξ j , j = 1, . . ., n, are between x and x * .Using now Definition 2.3 as well as the equalities g(x * ) = x * and J g (x * ) = 0, we obtain By taking x := x k and using the definition of the fixed-point iteration, we obtain where ξ k j , j = 1, . . ., n, are between x k and x * .Re-arranging, taking norms, and then dividing by We know that the sequences (x k ) and (ξ k j ) converge to x * as k → ∞ for j = 1, . . ., n.Using these facts and taking limits yield where we have also invoked Definition 2.5.This proves the proposition.
Remark 4.1.With the notation of Definition 2.4, it is well-known that the spectral radius SR(A) is a continuous function of the matrix A. Hence, if g is twice continuously differentiable in a neighbourhood around x, the function ρ will be a continuous function of x.Therefore, whenever (ξ k 1 ), . . ., (ξ k n−1 ) and (ξ k n ) are sequences converging to the same point z, we will have ρ(ξ k 1 , . . ., ξ k n ) converging to ρ(z, . . ., z) = ρ(z).A similar fact can be established for the function µ.Remark 4.2.Clearly, the function [µ(•)] j can be equivalently defined as for j = 1, . . ., n and all nonzero u ∈ R n .
Proof.Recall that by Rayleigh quotient properties, for all j = 1, . . ., n.So for each fixed j we can apply Fact 2.1 with Assume that a = λ min (∇ 2 g j (x j )) ≥ 0.
Theorem 4.1.Suppose g(x * ) = x * and J g (x * ) = 0, with g twice continuously differentiable in a neighbourhood of x * .Consider the asymptotic error constant λ ≥ 0 for the multivariate iterative method x k+1 = g(x k ).Then, it holds that where µ(x * ) is as in Definition 4.1, and T g as in Definition 2.3.
Proof.Recall that, by (21), T g (x * ) = ρ(x * ) .Hence, it is enough to establish the upper bound with ρ(x * ) instead of T g (x * ) .By Proposition 4.1 we have that where (ξ k 1 ), . . ., (ξ k n ) are n sequences converging to x * .Using now Proposition 4.2 for x j = ξ k j and u = x k − x * = 0 we deduce that By definition of T g and Lemma 2.1 we have Combine this fact with (22) and Definition 2.3 to derive By Remarks 4.1, 4.2 and the fact that g is twice continuously differentiable, the functions T g and µ are continuous, so where we also used (21) in rightmost equality.These facts, (23) and the definitions of λ, ρ and µ yield as required.

Numerical Experiments
In this section we compare the classical and generalized Newton methods on five example systems of equations of the form f (x) = 0 as in (1), with two variables (where visualization is possible) as well as six variables.The equations in these test problems involve cubic, quartic and exponential functions.Using various choices of the generalizing function s, we look at both local and global behaviour of the classical and generalized methods, and we do this in the following sense.
• Local behaviour : For each test problem we check that, with either method, convergence to a solution is quadratic, verifying Theorem 3.2.We do this by obtaining numerical estimates of the asymptotic error constant λ.Comparisons of the estimates of λ for each method give us an idea as to which method is faster locally.Recall that for any method that we consider the convergence rate is quadratic.So by comparing λ's, we compare the local "speeds" of quadratically convergent methods.In other words, the ratio of the λ's of two methods will tell us how many times a method is "faster" or "slower" than the other, locally.We also provide theoretical bounds on λ, i.e., intervals in which λ lies, for each example by using Theorem 4.1.
• Global behaviour : It is well known that the main drawback of the classical Newton method is its dependence on the quality of the initial guess (or the starting point) used in the Newton iterations.For the systems with two equations in two unknowns, we visualize graphically the colour-coded number of iterations that either method requires to converge, if at all, over domains of various sizes, with the set tolerance of 10 −8 .These graphs serve to demonstrate the value of the generalized method: The domain in which the generalized method converges in a reasonable number of iterations can be made larger, by choosing the generalizing function s carefully.We also present statistical information about the global convergence properties by means of a large number of randomly generated starting points for each method, in order to verify the information conveyed by the graphs.This information ultimately leads to a decision as to which of the methods considered is best to use, on the average, in a given search domain.
For computations, we use Matlab Release 2019b, update 5.In getting the estimates of λ, variable precision arithmetic (vpa of Matlab) making use of a large number of digits is utilized to be able to obtain these estimates with relatively reliable number of significant figures.
The CPU times are reported by running Matlab on a 13-inch 2018 model MacBook Pro, with the operating system macOS Mojave (version 10.14.6), the processor 2.7 GHz Intel Core i7 and the memory 16 GB 2133 MHz LPDDR3.
The Matlab code we have written to generate the colour-coded portraits of number of iterations is based on Cleve Moler's code for viewing fractals generated by univariate Newton iterations [9] in complex plane.
The methodology used in obtaining the statistical information, such as the average number of iterations, the rate of success of a method, and the CPU times for each iteration and each successful run, on the average, are explained in detail only once, in the first example in Section 5.1.For brevity, we avoid repetitions of this information in the subsequent four examples.

Quartic equations
To compare the classical and generalized methods we will first consider the following example system involving simple quartic functions.
where f : R 2 → R 2 and 0 ∈ R 2 .Clearly, the system (24) has two real solutions; namely x * = (1, 1) and x * = (−1, −1).The expression for the fixed-point map g associated with this system can be derived by using ( 9) with a chosen s.In Table 1, we do this first with s(x) = (x 1 , x 2 ) for the system in (24) and get g for the classical Newton method.The appearance of x 3 1 and x 3 2 in the first and second equations, respectively, prompts us to choose 2 ) for the generalized Newton method.Then we use this s to get the g for the generalized method as displayed in Table 1.We refer to the method obtained in this way the cube-generalized Newton method.
  Table 1: System (24): Fixed-point map g with different choices of s.
By Theorem 3.2, the fixed-point methods (classical and generalized Newton) using the choices of g listed in Table 1 are quadratically convergent and this can numerically be verified.What can one say about the asymptotic error constant?Table 2 encapsulates the ensuing answer.The values listed for λ are the numerical estimates of the asymptotic error constant lim k→∞ x k+1 − x * / x k − x * 2 from Definition 2.5.These estimates are the same for both of the solutions x * = (1, 1) and x * = (−1, −1) (referred to as Solutions 1 & 2) because of the symmetry of the equations in System (24).We note that the estimates of λ consistently fall into the intervals defined by the theoretical bounds established for λ in Theorem 4.1, which are also shown in the table.The (estimated) ratio λ N /λ GN of the asymptotic error constants of the classical and generalized Newton methods, respectively, implies that the generalized method is about three times faster near a solution, for this example.
Table 2: System (24): Asymptotic error constants of the classical (s i (x) = x i , i = 1, 2) and generalized We pointed out in the Introduction that, for a given solution, a quadratic convergence region is in general not known a priori.Next we graphically illustrate in Figure 1 that the (quadratic) convergence regions about the solutions for the generalized Newton method we have devised for this example are larger than those resulting from the classical Newton method.We also look at the regions of convergence over larger domains than just the neighbourhoods of the solutions.
In the graphs in Figure 1, the number of iterations needed to converge with tolerance 10 −8 to a solution from a given point (i.e., an initial guess) is colour-coded as indicated by the colour bar next to each graph: while 2-4 iteration runs are represented by dark blue, 14 or more iteration runs, which are regarded as "unsuccessful ," are represented by yellow.The initial guesses are generated over a 1000 × 1000 grid in the search domains [−3, 3] 2 , [−10, 10] 2 and [−100, 100] 2 .The following immediate observations point to some desirable properties of the cube-generalized method for this example: • Overall, the graphs associated with the cube-generalized method have far smaller yellow regions.
• We note by looking at the [−3, 3] 2 -domain that the regions in which convergence is achieved in 2-6 iterations are much larger than that for the classical method.
• In particular, if the search domain is chosen to be much larger, for example [−100, 100] 2 , then the classical method is unlikely to converge, while the cube-generalized method has a much better chance to converge.
Next we carry out further numerical experiments to support some of our visual observations in Figure 1.In each of the domains [−3, 3] 2 , [−10, 10] 2 and [−100, 100] 2 , we randomly generate one million starting points and record the number of iterations needed to converge from each point.We re-iterate that if the number of iterations is 14 or greater, then we deem that particular run unsuccessful.In Table 3, we list, for several typical choices of s, the average number of iterations over each of the search domains for the successful runs.We also list the percentage of the runs that were successful, namely the success rate.
The CPU time taken by a single iteration of the successful runs on the average cannot be found reliably by simply measuring and recording each successful run time and then averaging them, since the very short CPU time of a single run (to the order of 10 −6 ) cannot be measured reliably.Therefore, with 100 random starting points, we repeat each successful run 10 5 times and take the average.This provides an accurate averaged measure of the CPU time per iteration, which is listed for each method in the last column of Table 3  Table 3 tells us that over the search domain [−3, 3] 2 the cube-generalized method is successful 77% of the time it is run, while the success rate of the classical method is 56%.When we generate initial points randomly over a much larger domain, i.e., over [−100, 100] 2 (this might as well be the situation when we have no knowledge of the location of a solution), the difference in the success rates of the two methods is striking: while the cube-generalized method is successful 36% of the time, the classical method is successful a mere 2% of the time it is run.Although the latter case tells clearly what method to use in the domain [−100, 100] 2 , in the other cases, the success rates alone are not sufficient to tell which method will be (globally) "better" to use.To be able to have a clear idea about which method is more desirable than the others, we need to find the time a method needs before it obtains a solution.Suppose that, for a given method, the CPU time for a successful run is 3.1 × 10 −5 sec and the success rate is 50%.Then, statistically speaking, on average one will need to run that method twice to get a single solution and the time required for this effort will be 6.2 × 10 −5 sec.So we can find the time required to obtain a solution by a given method as: the CPU time per successful iteration, times the average number of iterations, divided by the success rate written as a decimal.The CPU times obtained in this way for each method are tabulated in Table 4.   3.
From the global convergence point of view, the method with the smallest CPU time over a domain in Table 4 should be selected, which are framed for each of the three domains of concern.For the domain [−3, 3] 2 , the time required by the classical Newton method is about 12% worse than the generalized method with s i (x) = sinh(x i ), i = 1, 2, which we refer to as the sinh-generalized Newton method.For the domain [−10, 10] 2 , the classical method seems to be the best to use, although its closest contender, the cube-generalized method, takes only 6% longer time to find a solution.Over the domain [−100, 100] 2 , the cube-generalized method is clearly the best method to use, as the classical method needs about 10 times more time in obtaining a solution.To rephrase the latter statement: the cube-generalized method is expected to obtain 10 solutions by the time the classical method finds one.

Equations involving exponentials
The following system is a special instance of the Jennrich and Sampson test problem presented in [5,10].
System (25) has two solutions, namely x * = (a, b) and x * = (b, a), referred to here as Solutions 1 and 2, respectively, where a = ln((3 + √ 3)/2) ≈ 0.861211502516490 and b = ln((3− √ 3)/2) ≈ −0.455746394408326, with the approximations correct to 15 dp.The appearance of the exponential functions in the equations prompts us to choose s(x) = (e x 1 , e x 2 ) for the generalized Newton method's fixed-point map in (9).We refer to this method as exp-generalized Newton method.As before, s(x) = (x 1 , x 2 ) is used for the classical Newton method.Table 5 lists the numerical estimates and the theoretical intervals for the asymptotic error constant λ, giving some idea about the local behaviour around a solution.However, we note that the ratio λ N /λ GN is not so accurate in this case as the values obtained in the later iterations for λ N seem to fluctuate between 0.4 and 1.4, which we have averaged as 0.9.The approximate value listed for λ N /λ GN implies that, close enough to a solution, the exp-generalized method is more than twice faster.As in the example in Section 5.1, we depict, in Figure 2 the colour-coded number of iterations needed to converge to any one of the two solutions.The success of the expgeneralized method is even more striking in this case: (i) the graphs for the exp-generalized method have far smaller yellow regions, (ii) local convergence regions (that are achieved in 4-6 iterations, shown in darker shades of blue) for the exp-generalized method are much larger and (iii) over the larger domain [−10, 10] 2 , the exp-generalized method has a far better chance of converging in less than 14 iterations.
Table 6 provides some statistical data as in the case of Table 3 for System (24) in the previous subsection.It should be noted that the percentage success rates in the table are in agreement with the percentage of the regions which are not yellow in Figure 2, for the cases of s(x) = (x 1 , x 2 ) and s(x) = (e x 1 , e x 2 ).When successful the CPU time one iteration of the classical Newton method spends on the average (over the domain [−3, 3] 2 ) is 2.7 × 10 −6 sec.The same CPU time for the expgeneralized Newton method is 6.7 × 10 −6 sec, which is about 2.5 times longer.On the other hand, over the domain [−3, 3] 2 , the chance of finding a solution for the exp-generalized method in less than 14 iterations is nearly 4 times higher than using the classical method.Moreover, over the domain [−10, 10] 2 , the exp-generalized method is 23 times more likely to find a solution in the same manner.These likelihoods of success which are greatly in favour of the exp-generalized method seems to offset the higher computational times per iteration.
To make sure of the conclusion we have just drawn above as to which method is preferred, we can again prepare a table listing the average CPU time needed for a successful run by each method, as it was previously done in Table 4 for System (24).Table 7 lists these times, which immediately reconfirms that the exp-generalized method should indeed be the preferred method over [−3, 3] 2 , as it would take the classical method 37% more time to find a solution.Over the larger domain [−10, 10] 2 , by the time the classical method finds a solution the exp-generalized method will have already found about seven solutions-see the framed CPU times.
Time needed to get a single soln [sec] Table 7: System (25): CPU time needed on the average by the classical and generalized Newton methods to obtain a solution in less than 14 iterations, based on the data in Table 6.This is yet another example which clearly illustrates how the structure of the problem can be exploited to solve a system of equations by means of a generalized Newton method.

Cubic equations in two variables
The example we deal with in this section emanates from an unconstrained global optimization problem solved in [2], which asks to minimize the function ϕ : R 2 → R given as Although the numerical optimization method proposed in [2] can find the global minimizer of ϕ, common numerical optimization approaches often only find a stationary point of the function ϕ, by finding a zero of the gradient of ϕ, namely, effectively, they find a solution to the system of equations In [2], five extremal solutions of the function in (26) are listed as in Table 8.Solutions 1-4 are all local minima while Solution 5 is a local maximum.The appearance of x 3 1 and x 3 2 in (27) prompts us to consider s(x) = (x 3 1 , x 3 2 ) as the first generalized method in Table 9. Via experiments we observe that the choice s(x) = (sinh x 1 , sinh x 2 ) yields another worthwhile generalization of the Newton method.Table 9 reveals that the estimates of λ for both of the generalized methods are (by two to four times) smaller at Solutions 1-4, and larger only at Solution 5. To illustrate the overall behaviour, Figure 3 provides a visualization of the success of three methods in terms of the number of iterations.In addition to the classical method, we consider the cube-and sinh-generalized methods.
Glancing at the 3 × 3 matrix of graphs of Figure 3, while the graphs in the entries (2,2) and (3,2) have more of the shades of blue than those in the same rows, the graph in (1,3) appears to have more of the shades of blue and almost no yellow.The success rates by judging from the non-yellow regions in these graphs are corroborated by the success rates presented in Table 10.What seem to be the best-performing methods by looking at these graphs are also in agreement with the ones corresponding to the framed entries in Table 11.By looking at Table 11, we deduce easily that the sinh-generalized method is the best, although the classical method is only slightly worse, in terms of the time they take for a successful run in the domain [−3, 3] 2 .The classical Newton is the best for [−10, 10] 2 , with this time the cubic-generalized method being slightly worse, taking 8% longer time in finding a solution.In the largest domain [−100, 100] 2 , the cube-generalized method is by far the best, as its nearest contender, the classical Newton method, takes more than 11 times longer to obtain a single solution.to obtain a solution in less than 14 iterations, based on the data in Table 10.
Going back to Figure 3, we deduce from the first row of graphs that the regions of convergence in 4-6 iterations of the classical method are considerably enlarged by both of the generalized methods.This is in agreement with the estimated values of λ N /λ GN in Table 9.

Cubic equations in six variables
We consider another system of cubic equations, but this time the number of equations and unknowns is six.The system originates from the problem of (globally) minimizing the function ϕ : R 6 → R, which was studied in [2,14], given by where We consider the problem of finding the zeroes of the gradient ∇ϕ(x) of ϕ(x), in other words, the zeroes of . . .
Solutions of (29) are stationary points of ϕ, in other words, they are candidates for (locally) optimal solutions of ϕ, three of which are listed in Table 12.The first solution listed in Table 12 is a global minimizer of ϕ, as reported in [2].Our aim here is to look at the behaviour of the classical and generalized methods in finding a zero of f , which is only a stationary point of ϕ.First we look at the (local) behaviour near the solutions listed in Table 12.Table 13 tabulates the theoretical intervals where λ lies, found using Theorem 4.1, as well as the λ estimated numerically, for each method.We observe that the numerical estimates fall into the theoretical intervals.We also observe that the cube-generalized method has λ consistently 2 to 3.5 times smaller than that of the classical method, and therefore locally faster by the same factors.The sinh-generalized method, on the other hand, is observed to be not so fast.The reason we have included the sinh-generalized method here is that as we will see in Tables 14  and 15   Since System (29) has six variables, we cannot have the kind of visualization of performance as we had in the previous (two-variable) examples.However, we can still carry out runs with randomized (one million) initial points and make some statistical observations as we did for the previous example systems.Table 14,and   The framed average CPU times required to get a single solution in Table 15 indicate that the sinh-generalized method should be chosen in the domain [−3, 3] 2 , while the cubegeneralized method should be preferred in the larger domains.In [−3, 3] 2 , compared to the sinh-generalized method, the cube-generalized method takes about 21% more time to find a solution, while the classical method requires 34% more time.In the search domain [−100, 100] 2 , the cube-generalized method is unrivalled as none of the other methods is viable to use.We note that in the largest search domain since the other methods has a success rate less than 0.04%, their success rates have been entered as 0.0% into Table 14, with no average number of iterations reported.to obtain a solution in less than 14 iterations, based on the data in Table 14.

A signal processing problem
In optimum broad-band antenna processing the minimization of the mean output power subject to linear constraints is a common problem [16,17].In [17], a 70-tuple example from [16] about this signal processing problem has been transformed into the global minimization of the quartic polynomial function ϕ : R 2 → R given in equation ( 30) below.The details of this transformation can be found in [17,Appendix C].
Here we are interested in the problem of finding a stationary point of ϕ(x), namely a zero of The extremal points of ϕ, which are zeroes f , and the corresponding functional values are given in Table 16 (also see [17]).Solutions 3 and 4 are the global minimizers of ϕ.The missing entries for Solution 5 in Table 17 warrants an explanation.Numerical experiments imply that the classical and sinh-generalized Newton methods' rate of convergence at x = (0, 0) is higher than quadratic, since the asymptotic error constants λ of each method for quadratic convergence is estimated to be zero.In fact, interestingly, the rate of convergence for either method (numerically) turns out to be cubic, with the associated asymptotic error constants estimated as 1.0 and 1.6, respectively.The exp-generalized method is the only method in the list which is verified to be quadratically convergent at Solution 5. On the other hand, the cube-generalized method has a singularity at x = (0, 0), and it fails to converge to the solution, no matter how close to the solution the initial guess is chosen.
A visualization of the global performances of the methods listed in Table 17 is provided in Figure 4.By looking at the graphs, the sinh-generalized method appears to be the best to use in the search domain [−3, 3] 2 since it has smaller yellow regions and the domain is dominated more by shades of blue, a sign of quicker convergence.In the slightly bigger search domain of [−10, 10] 2 , however, while the cube-generalized method seems to have the largest regions of shades of blue, the classical method looks to have the smallest regions of yellow.In view of the difficulty of judging from Figure 4 as to which method is preferable, we will resort to the statistical data presented in Tables 18 and 19 for a more conclusive decision.
Figure 4: System (31): Portraits of colour-coded number of iterations required for convergence.
As in the previous examples, we have run the classical and various generalized methods, with randomized initial points, to obtain the statistics in Tables 18.The only clear case for the choice of a method is when the search domain is [−100, 100] 2 , for which the cube-generalized method should certainly be the method of preference, given the relatively very high success rate, smaller number of iterations and not so much longer CPU time per iteration, all on the average.To determine, ultimately, which of the methods will be preferable in the other search regions, we need to refer to  The boxed CPU times in Table 19 dictate that while the sinh-generalized method should be the method of choice in [−3, 3] 2 , the classical Newton method should better be used in [−10, 10] 2 .We observe that the cube-generalized method is more than 700 times more efficient than its nearest contender, the classical method, in the large search domain [−100, 100] 2 .This equivalently means that by the time the classical method finds a single solution, the cubegeneralized method will have obtained more than 13 solutions, on the average.to obtain a solution in less than 14 iterations, based on the data in Table 18.

Conclusion and Discussion
We have proposed a family of generalized Newton methods facilitated by an auxiliary, or generalizing, function s, for solving systems of nonlinear equations.The method reduces to the classical Newton method if the generalizing function is the identity map, i.e., s(x) = x.Under mild assumptions, we have proved that the new family of methods are quadratically convergent just like the classical one.We derived expressions for the bounds on the asymptotic error constants of the family.These bounds, which can be computed for practical problems easily as illustrated in the numerical experiments, can provide an idea about the relative local speeds of the classical and generalized methods, although they are not tight.
For numerical experimentation, we have considered three types of problems, namely systems of equations involving quartic (in two variables), cubic (in two and six variables), and exponential (in two variables), functions.We carried out extensive numerical experiments using s(x) = x (the classical method), and s(x) = x 3 , sinh x, e x , and tan x (the cube-, sinh-, exp-and tan-generalized methods, respectively), for each of the example problems.
For the problems in two variables, we constructed graphs depicting, for search domains of various sizes, a portrait of the colour-coded number of iterations a method would need to converge to a solution.These graphs, or portraits, were observed to provide a broad idea as to which method is likely to be more preferable.Using one million randomly generated initial points in each chosen search domain, we presented tables reporting the success rate and average number of iterations of a method as well as the average CPU time one iteration of that method takes.By using this data, we were able to identify the method with the smallest CPU time required for finding a single solution, as the preferred method in a particular search domain.
For the cubic and quartic problems we have considered, the sinh-generalized method seems to be particularly successful in relatively smaller search domains.In slightly larger domains where the sinh-generalized method is not so successful anymore, the classical Newton method looks like the method of preference for the two-variable problems.For very large domains, the cube-generalized method certainly looks to be the method of choice, if not the only successful method for some problems.We found that for the exponential problem we studied, the exp-generalized method seems to be the only method one should use in domains of any size.
The kind of numerical exploration we performed could be particularly useful in the case when a system of equations involves parameters and these parameters change only slightly so that the portraits and the statistical data are not altered much.In other words, given a system of equations f (x, p) = 0 , with a vector of n unknowns, x ∈ R n , and a fixed vector of m parameters, p ∈ R m , the task would be to find a solution of (32) as efficiently as possible.Suppose that we have identified the preferred generalized method through the numerical exploration we have devised in this paper.When p has changed slightly, the preferred method might then be employed to find a new solution of (32).
We have demonstrated that a suitable choice of s is possible for some specific forms of systems of equations.Making informed choices of s for more general problems still stands as a challenging research problem.Like most available modifications on Newton method, our generalized version may switch to the classical one (i.e., with s(x) = x) or to another generalized method when the current choice of s is either inconvenient or provides no detectable advantage.One may refer to this version as a hybrid generalized Newton method.For example, when solving the cubic problems, based on the results in the tables with CPU times to get a single solution, it might be interesting to devise a hybrid method which switches from the classical or the cube-generalized method to the sinh-generalized method as iterations fall into the search domain [−3, 3] 2 .
In the future, it would be valuable to study quadratic convergence regions as it was done in [6,15] by Kantorovich and Smale, for the new generalized methods.It would also be interesting to consider extending the work presented in this paper to situations where global convergence to a solution is guaranteed, such as the Levenberg-Marquardt approach [7,8].

Remark 2 . 2 .
Let D ⊂ R n be an open set and consider two functions s, f : R n → R n such that s is a C 1 -homeomorphism from D to s(D), and that f ∈ C 1 (D).Define F := f • s −1 : s(D) → f (D).By the Chain Rule and Remark 2.1, we have, for every x ∈ D,
Time needed to get a single soln [sec]

Figure 2 :
Figure 2: System (25): Portraits of colour-coded number of iterations required for convergence.

Figure 3 :
Figure 3: System (27): Portraits of colour-coded number of iterations required for convergence.
Time needed to get a single soln [sec]

6 ×
Time needed to get a single soln [sec] 10 −2 e xi 5.4 × 10 −5 2.0 × 10 −4 1.9 × 10 −2 tan x i 7.5 × 10 −5 1.2 × 10 −4 7.7 × 10 −3 n.So the definition of [µ(•)] j is given over two complementary sets, one of them open and the other closed.Since in each of these sets [µ(•)] j is given by a continuous function, and the values of these two functions coincide at the boundary of the two complementary sets, we deduce that µ is a continuous function.The next technical result will be used in Theorem 4.1.Recall from Remark 2.3 and Definition 2.3 that T g (x 1 , . . ., x n ) = ρ(x 1 , . . ., x n ) . .

Table 3 :
System (24): Performance of the classical and generalized Newton methods with one million randomly generated starting points in domains of various sizes.

Table 4 :
System (24): CPU time needed on average by the classical and generalized Newton methods to obtain a solution in less than 14 iterations, based on the data in Table

Table 5 :
System (25): Asymptotic error constants of the classical and generalized Newton methods.

Table 6 :
System (25): Performance of the classical and generalized Newton methods with one million randomly generated starting points in domains of various sizes.

Table 9 :
System (27): Asymptotic error constants of the classical and generalized Newton methods.

Table 10 :
System (27): Performance of the classical and generalized Newton methods with one million randomly generated starting points in domains of various sizes.

Table 11 :
System (27): CPU time needed on the average by the classical and generalized Newton methods

Table 12 :
Some of the stationary points of ϕ in (28).
it can have a desirable performance on a larger scale, in search domains of moderate size.

Table 13 :
System (29): Asymptotic error constants of the classical and generalized Newton methods.
subsequently Table15, provide advice as to which method can be chosen for efficiency.

Table 14 :
System (29): Performance of the classical and generalized Newton methods with one million randomly generated starting points in domains of various sizes.

Table 15 :
System (29): CPU time needed on the average by the classical and generalized Newton methods

Table 17
reconfirms that, for Solutions 1-4, convergence is quadratic and the numerically estimated values of λ lie in the theoretical intervals found by using Theorem 4.1.We observe that the sinh-generalized method has λ consistently 1.2 to 2.7 times smaller than that of the classical method, and therefore locally faster by the same factors.

Table 17 :
System (31): Asymptotic error constants of the classical and generalized Newton methods.
Table 19, as in the previous examples.

Table 18 :
System (31): Performance of the classical and generalized Newton methods with one million randomly generated starting points in domains of various sizes.

Table 19 :
System (31): CPU time needed on the average by the classical and generalized Newton methods