Asymptotic behavior of Newton-like inertial dynamics involving the sum of potential and nonpotential terms

In a Hilbert space $\mathcal{H}$, we study a dynamic inertial Newton method which aims to solve additively structured monotone equations involving the sum of potential and nonpotential terms. Precisely, we are looking for the zeros of an operator $A = \nabla f + B$, where $\nabla f$ is the gradient of a continuously differentiable convex function $f$ and $B$ is a nonpotential monotone and cocoercive operator. Besides a viscous friction term, the dynamic involves geometric damping terms which are controlled respectively by the Hessian of the potential $f$ and by a Newton-type correction term attached to $B$. Based on a fixed point argument, we show the well-posedness of the Cauchy problem. Then we show the weak convergence as $t\to +\infty$ of the generated trajectories towards the zeros of $\nabla f + B$. The convergence analysis is based on the appropriate setting of the viscous and geometric damping parameters. The introduction of these geometric dampings makes it possible to control and attenuate the known oscillations for the viscous damping of inertial methods. Rewriting the second-order evolution equation as a first-order dynamical system enables us to extend the convergence analysis to nonsmooth convex potentials. These results open the door to the design of new first-order accelerated algorithms in optimization taking into account the specific properties of potential and nonpotential terms. The proofs and techniques are original and differ from the classical ones due to the presence of the nonpotential term.


Introduction and preliminary results
Let $\mathcal{H}$ be a real Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$ and the associated norm $\|\cdot\|$. Many situations coming from physics, biology, and human sciences involve equations containing both potential and nonpotential terms. In human sciences, this comes from the presence of both cooperative and noncooperative aspects. To describe such situations we will focus on solving additively structured monotone equations of the type

Find $x \in \mathcal{H}$ : $\nabla f(x) + B(x) = 0$. (1.1)

In the above equation, $\nabla f$ is the gradient of a convex continuously differentiable function $f : \mathcal{H} \to \mathbb{R}$ (the potential part), and $B : \mathcal{H} \to \mathcal{H}$ is a nonpotential operator which is supposed to be monotone and cocoercive. To this end, we will consider continuous inertial dynamics whose solution trajectories converge as $t \to +\infty$ to solutions of (1.1). Our study is part of the active research stream that studies the close relationship between continuous dissipative dynamical systems and the optimization algorithms obtained by their temporal discretization. To avoid lengthening the paper, we limit our study to the analysis of the continuous dynamic. The analysis of the algorithmic part and its link with first-order numerical optimization will be carried out in a second companion paper. From this perspective, damped inertial dynamics offers a natural way to accelerate these systems. As the main feature of our study, we will introduce into the dynamic geometric dampings which are respectively driven by the Hessian for the potential part, and by the corresponding Newton term for the nonpotential part. In addition to improving the convergence rate, this will considerably reduce the oscillatory behavior of the trajectories. We will pay particular attention to the minimal assumptions which guarantee convergence of the trajectories, and which highlight the asymmetric role played by the two operators involved in the dynamic.
We will see that many results can be extended to the case where f : H → R ∪ {+∞} is a convex lower semicontinuous proper function, which makes it possible to broaden the field of applications.

Dynamical inertial Newton method for additively structured monotone problems
Let us introduce the following second-order differential equation which will form the basis of our analysis:

$\ddot{x}(t) + \gamma\dot{x}(t) + \nabla f(x(t)) + B(x(t)) + \beta_f \nabla^2 f(x(t))\dot{x}(t) + \beta_b B'(x(t))\dot{x}(t) = 0, \quad t \ge t_0.$ (DINAM)

We use (DINAM) as an abbreviation for Dynamical Inertial Newton method for Additively structured Monotone problems. We call $t_0 \in \mathbb{R}$ the origin of time. Since we are considering autonomous systems, we can take any arbitrary real number for $t_0$. For simplicity, we set $t_0 = 0$. When considering the corresponding Cauchy problem, we add the initial conditions: $x(0) = x_0 \in \mathcal{H}$ and $\dot{x}(0) = x_1 \in \mathcal{H}$.
The term $B'(x(t))\dot{x}(t)$ is interpreted as $\frac{d}{dt}\big(B(x(t))\big)$ taken in the distribution sense. Likewise, the term $\nabla^2 f(x(t))\dot{x}(t)$ is interpreted as $\frac{d}{dt}\big(\nabla f(x(t))\big)$, also taken in the distribution sense. Because of the assumptions made below, these terms are indeed measurable functions which are bounded on the bounded time intervals. So, we will consider strong solutions of the above equation (DINAM). Throughout the paper we make the following standing assumptions:
(A1) $f : \mathcal{H} \to \mathbb{R}$ is convex, of class $C^1$, and $\nabla f$ is Lipschitz continuous on the bounded sets;
(A2) $B : \mathcal{H} \to \mathcal{H}$ is a $\lambda$-cocoercive operator for some $\lambda > 0$;
(A3) $\gamma > 0$, $\beta_f \ge 0$, $\beta_b \ge 0$ are given real damping parameters.
We emphasize the fact that we do not assume the gradient of $f$ to be globally Lipschitz continuous. Developing our analysis without using any bound on the gradient of $f$ is a key to further extending the theory to the nonsmooth case. As a specific property, the inertial system (DINAM) combines two different types of driving forces associated respectively with the potential operator $\nabla f$ and the nonpotential operator $B$. It also involves three different types of friction:
(a) The term $\gamma\dot{x}(t)$ models viscous damping with a positive coefficient $\gamma > 0$.
(b) The term $\beta_f \nabla^2 f(x(t))\dot{x}(t)$ is the so-called Hessian-driven damping, which makes it possible to attenuate the oscillations that naturally occur with the inertial gradient dynamics.
(c) The term $\beta_b B'(x(t))\dot{x}(t)$ is the nonpotential version of the Hessian-driven damping. It can be interpreted as a Newton-type correction term.
Note that each driving force term enters (DINAM) with its temporal derivative. In fact, we have $\nabla^2 f(x(t))\dot{x}(t) = \frac{d}{dt}\nabla f(x(t))$ and $B'(x(t))\dot{x}(t) = \frac{d}{dt}B(x(t))$. This is a crucial observation which makes (DINAM) equivalent to a first-order system in time and space, and makes the corresponding Cauchy problem well posed. This will be proved later (see subsection 2.1 for more details). The cocoercivity assumption on the operator $B$ plays an important role in the analysis of (DINAM), not only to ensure the existence of solutions, but also to analyze their asymptotic behavior as time $t \to +\infty$.
Recall that the operator $B : \mathcal{H} \to \mathcal{H}$ is said to be $\lambda$-cocoercive for some $\lambda > 0$ if $\langle Bx - By, x - y\rangle \ge \lambda\|Bx - By\|^2$ for all $x, y \in \mathcal{H}$. Note that $B$ being $\lambda$-cocoercive is equivalent to $B^{-1}$ being $\lambda$-strongly monotone, i.e. cocoercivity is a dual notion of strong monotonicity. It is easy to check that if $B$ is $\lambda$-cocoercive, then $B$ is $1/\lambda$-Lipschitz continuous. The reverse implication holds true in the case where the operator is the gradient of a convex and differentiable function. Indeed, according to the Baillon-Haddad Theorem [19], if $\nabla f$ is $L$-Lipschitz continuous, then $\nabla f$ is a $1/L$-cocoercive operator (we refer to [20, Corollary 18.16] for more details).
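The implication from cocoercivity to Lipschitz continuity mentioned above is a one-line consequence of the Cauchy-Schwarz inequality. The following sketch uses the standard definition of $\lambda$-cocoercivity, $\langle Bx - By, x - y\rangle \ge \lambda\|Bx - By\|^2$:

```latex
% Cauchy-Schwarz applied to the right-hand side of the cocoercivity inequality
\lambda \|Bx - By\|^2 \;\le\; \langle Bx - By,\, x - y\rangle \;\le\; \|Bx - By\|\,\|x - y\|
\quad\Longrightarrow\quad
\|Bx - By\| \;\le\; \frac{1}{\lambda}\,\|x - y\| \qquad \text{for all } x, y \in \mathcal{H}.
```

When $Bx = By$ the conclusion is trivial; otherwise it follows by dividing both sides by $\|Bx - By\|$.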

Historical aspects of the inertial systems with Hessian-driven damping
The following inertial system with Hessian-driven damping was first considered by Alvarez-Attouch-Bolte-Redont in [6]: $\ddot{x}(t) + \gamma\dot{x}(t) + \beta\nabla^2 f(x(t))\dot{x}(t) + \nabla f(x(t)) = 0$. Then, according to the continuous interpretation by Su-Boyd-Candès [31] of the accelerated gradient method of Nesterov, Attouch-Peypouquet-Redont [16] replaced the fixed viscous damping parameter $\gamma$ by an asymptotic vanishing damping parameter $\alpha/t$, with $\alpha > 0$. At first glance, the presence of the Hessian may seem to entail numerical difficulties.
However, this is not the case as the Hessian intervenes in the above ODE in the form ∇ 2 f (x(t))ẋ(t), which is nothing but the derivative with respect to time of ∇f (x(t)). So, the temporal discretization of these dynamics provides first-order algorithms of the form As a specific feature, and by comparison with the classical accelerated gradient methods, these algorithms contain a correction term which is equal to the difference of the gradients at two consecutive steps. While preserving the convergence properties of the accelerated gradient method, they provide fast convergence to zero of the gradients, and reduce the oscillatory aspects. Several recent studies have been devoted to this subject, see Attouch-Chbani-Fadili-Riahi [8], Boţ-Csetnek-László [22], Kim [27], Lin-Jordan [28], Shi-Du-Jordan-Su [30], and Alesca-Lazlo-Pinta [4] for an implicit version of the Hessian driven damping. Application to deep learning has been recently developed by Castera-Bolte-Févotte-Pauwels [25]. In [3], Adly-Attouch studied the finite convergence of proximal-gradient inertial algorithms combining dry friction with Hessian-driven damping.

Inertial dynamics involving cocoercive operators
Let us turn to the transposition of these techniques to the case of maximally monotone operators. Álvarez-Attouch [5] and Attouch-Maingé [12] studied the equation

$\ddot{x}(t) + \gamma\dot{x}(t) + A(x(t)) = 0$ (1.2)

when $A : \mathcal{H} \to \mathcal{H}$ is a cocoercive (and hence maximally monotone) operator (see also [21]). The cocoercivity assumption plays an important role in the study of (1.2), not only to ensure the existence of solutions, but also to analyze their long-term behavior. Assuming that the cocoercivity parameter $\lambda$ and the damping coefficient $\gamma$ satisfy the inequality $\lambda\gamma^2 > 1$, Attouch-Maingé [12] showed that each trajectory of (1.2) converges weakly to a zero of $A$, i.e. $x(t) \rightharpoonup x_\infty \in A^{-1}(0)$ as $t \to +\infty$. Moreover, the condition $\lambda\gamma^2 > 1$ is sharp. For general maximally monotone operators this property has been further exploited by Attouch-Peypouquet [15], and by Attouch-Laszlo [10,11]. The key property is that for $\lambda > 0$, the Yosida approximation $A_\lambda$ of $A$ is $\lambda$-cocoercive and $A_\lambda^{-1}(0) = A^{-1}(0)$. So the idea is to replace the operator $A$ by its Yosida approximation, and adjust the Yosida regularization parameter. Another related work is that of Attouch-Maingé [12], who first considered the asymptotic behavior of the second-order dissipative evolution equation

$\ddot{x}(t) + \gamma\dot{x}(t) + \nabla f(x(t)) + B(x(t)) = 0$ (1.3)

with $f : \mathcal{H} \to \mathbb{R}$ convex and $B : \mathcal{H} \to \mathcal{H}$ cocoercive, thus combining potential with nonpotential effects. Our study will therefore consist initially in introducing the Hessian term and the Newton-type correcting term into this dynamic.

Link with Newton-like methods for solving monotone inclusions
Let us specify the link between our study and Newton's method for solving (1.1). To overcome the illposed character of the continuous Newton method for a general maximally monotone operator A, the following first order evolution system was studied by Attouch-Svaiter [18], This system can be considered as a continuous version of the Levenberg-Marquardt method, which acts as a regularization of the Newton method. Remarkably, under a fairly general assumption on the regularization parameter γ(t), this system is well posed and generates trajectories that converge weakly to equilibria (zeroes of A). Parallel results have been obtained for the associated proximal algorithms obtained by implicit temporal discretization, see [2], [14], [17]. Formally, this system is written as Thus (DINAM) can be considered as an inertial version of this dynamical system for structured monotone operator A = ∇f + B. Our study is also linked to the recent works by Attouch-Laszlo [10,11] who considered the general case of monotone equations. By contrast with [10,11], according to the cocoercivity of B, we don't use the Yosida regularization, and exhibit minimal assumptions involving only the nonpotential component.

Contents
The paper is organized as follows. Section 1 introduces (DINAM) with some historical perspective. In Section 2, based on the first-order equivalent formulation of (DINAM), we show that the Cauchy problem is well-posed (in the sense of existence and uniqueness of solutions). In Section 3, we analyze the asymptotic convergence properties of the trajectories generated by (DINAM). Using appropriate Lyapunov functions, we show that any trajectory of (DINAM) converges weakly as $t \to +\infty$, and that its limit belongs to $S = (\nabla f + B)^{-1}(0)$. The interplay between the damping parameters $\beta_f$, $\beta_b$, $\gamma$ and the cocoercivity parameter $\lambda$ will play an important role in our Lyapunov analysis. In Section 4, we perform numerical experiments showing that the well-known oscillations in the case of the heavy ball with friction are damped with the introduction of the geometric (Hessian-like) damping terms. An application to the LASSO problem with a nonpotential operator as well as a coupled system in dynamical games are considered. Section 5 deals with the extension of the study to the nonsmooth and convex case. Section 6 contains some concluding remarks and perspectives.

Well posedness of the Cauchy-Lipschitz problem
We first show the existence and the uniqueness of the solution trajectory for the Cauchy problem associated with (DINAM) for any given initial condition data (x 0 , x 1 ) ∈ H × H.

First-order in time and space equivalent formulation
The following first-order equivalent formulation of (DINAM) was first considered by Alvarez-Attouch-Bolte-Redont [6] and Attouch-Peypouquet-Redont [16] in the framework of convex minimization. Specifically, in our context, we have the following equivalence, which follows from a simple differential and algebraic calculation.

Proposition 2.1
Suppose that $\beta_f > 0$. Then, the following problems are equivalent: which gives the first equation of (ii). By differentiating $y(\cdot)$ and using (i), we get By combining (2.1) and (2.2), we obtain This gives the second equation of (ii).
(ii) $\Longrightarrow$ (i). By differentiating the first equation of (ii), we obtain Let us eliminate $y$ from this equation to obtain an equation involving only $x$. For this, we successively use the second equation in (ii), then the first equation in (ii) to obtain Therefore, From (2.4) and (2.5), we obtain (i).
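The "simple differential and algebraic calculation" behind such an equivalence can be sketched as follows. The auxiliary variable $y$ chosen here is an assumption of this sketch (one natural choice when $\beta_f > 0$); it is a worked derivation starting from (DINAM), not a transcription of the statement of Proposition 2.1.

```latex
% Assumed auxiliary variable (requires \beta_f > 0):
y(t) := -\dot{x}(t) - \beta_f \nabla f(x(t)) - \beta_b B(x(t))
        - \Big(\gamma - \frac{1}{\beta_f}\Big) x(t).

% Differentiate y and use (DINAM) to eliminate \ddot{x} and the two geometric damping terms:
\dot{y}(t) = \nabla f(x(t)) + B(x(t)) + \frac{1}{\beta_f}\,\dot{x}(t).

% Substitute \dot{x} from the definition of y; the terms in \nabla f cancel.
% This yields a system of first order in time and space:
\dot{x}(t) + \beta_f \nabla f(x(t)) + \beta_b B(x(t))
        + \Big(\gamma - \frac{1}{\beta_f}\Big) x(t) + y(t) = 0,
\dot{y}(t) - \Big(1 - \frac{\beta_b}{\beta_f}\Big) B(x(t))
        + \frac{1}{\beta_f}\Big(\gamma - \frac{1}{\beta_f}\Big) x(t)
        + \frac{1}{\beta_f}\, y(t) = 0.
```

Neither the Hessian of $f$ nor the derivative of $B$ appears in the resulting system, which is what makes the extension to nonsmooth potentials plausible.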

Well-posedness of the evolution equation (DINAM)
The following theorem shows the well-posedness of the Cauchy problem for the evolution equation (DINAM).
Proof. The system (ii) in Proposition 2.1 can be written equivalently as is a Lipschitz continuous map. Indeed, the Lipschitz continuity of $G$ is a direct consequence of the Lipschitz continuity of $B$. The existence of a classical solution to follows from Brézis [23, Proposition 3.12]. In fact, the proof of this result relies on a fixed point argument. It consists in finding a fixed point of the mapping It is proved that the sequence of iterates $(w_n)$ generated by the corresponding Picard iteration converges uniformly on $[0, T]$ to a fixed point of $K$. When returning to (DINAM), that is, equation (i) of Proposition 2.1, we recover a strong solution. Precisely, $\dot{x}$ is Lipschitz continuous on the bounded time intervals, and $\ddot{x}$ taken in the distribution sense is locally essentially bounded.
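To illustrate the fixed point mechanism invoked in the proof, here is a minimal sketch of a Picard iteration on a time grid. It is only a toy scalar example, not the construction used in [23]: the function name `picard_iterates`, the grid size, and the test equation $\dot{w} = -w$, $w(0) = 1$ are all assumptions made for illustration.

```python
import math

def picard_iterates(F, w0, T, n_grid=200, n_iter=25):
    """Iterate w_{n+1}(t) = w0 + int_0^t F(w_n(s)) ds on a uniform grid of [0, T]."""
    h = T / n_grid
    w = [w0] * (n_grid + 1)              # initial guess: the constant function w0
    for _ in range(n_iter):
        new_w = [w0]
        integral = 0.0
        for i in range(n_grid):
            # trapezoidal rule for the integral of F(w_n) over [t_i, t_{i+1}]
            integral += 0.5 * h * (F(w[i]) + F(w[i + 1]))
            new_w.append(w0 + integral)
        w = new_w
    return w

# Toy problem (hypothetical): w' = -w, w(0) = 1, whose exact solution is exp(-t).
w = picard_iterates(lambda u: -u, w0=1.0, T=1.0)
error_at_T = abs(w[-1] - math.exp(-1.0))
print(error_at_T)
```

The iterates converge to the solution of the discretized integral equation, so the remaining error at $T$ is dominated by the quadrature error of the trapezoidal rule.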
Remark 2.1 Note that when ∇f is supposed to be globally Lipschitz continuous, the above proof can be notably simplified, by just applying the classical Cauchy-Lipschitz theorem.

Asymptotic convergence properties of (DINAM)
In this section, we study the asymptotic behavior of the solution trajectories of (DINAM). For each solution trajectory t → x(t) of (DINAM) we show that the weak limit, w-lim t→+∞ x(t) = x ∞ exists, and satisfies Before stating our main result, notice that B(p) is uniquely defined for p ∈ S.
By monotonicity of ∇f we have

General case
The general line of the demonstration is close to one given by Attouch-Laszlo in [10,11]. A first major difference with the approach developed in [10,11] is that in our context thanks to the hypothesis of cocoercivity on the nonpotential part, we do not need to go through the Yosida regularization of the operators. A second difference is that we treat the potential and nonpotential operators in a differentiated way. These points are crucial for applications to numerical algorithms, because the computation of the Yosida regularization of the sum of the two operators is often out of reach numerically.
The following Theorem states the asymptotic convergence properties of (DINAM).
(ii) (integral estimates) Set $A := B + \nabla f$ and $p \in S$. Then,

Proof. Lyapunov analysis. Set $A := B + \nabla f$ and $A_\beta$ : where $c$ and $\delta$ are coefficients to adjust. Using the derivation chain rule for absolutely continuous functions (see [24, Corollary VIII.10]) and (DINAM), we get We have Using the fact that $p \in S$, $\nabla f$ is monotone, and $B$ is $\lambda$-cocoercive, we have and $E_p : [0, +\infty[ \to \mathbb{R}$ be the energy function given by By using (3.7) and (3.8), the equation (3.6) can be rewritten as Let us eliminate the term $\nabla f(x(t)) - \nabla f(p)$ from this relation by using the elementary algebraic inequality We obtain According to Lemma 7.3, and since $a = \delta = c\gamma - 1 > 0$, we have that $q$ is positive definite if and only if Our aim is to find $c$ such that $c\gamma - 1 > 0$ and such that (3.11) is satisfied. Take $\delta := c\gamma - 1 > 0$ as a new variable. Equivalently, we must find $\delta > 0$ such that After development and simplification we obtain Therefore, we just need to assume that An elementary optimization argument gives that Therefore we end up with the condition When $\beta_b = \beta_f = \beta$ we recover the condition $\lambda\gamma > \beta + \frac{1}{\gamma}$.
Note that $c\gamma = 1 + \delta$ and $\delta > 0$ implies $c > 0$. Therefore, there exist positive real numbers $c$, $\mu$ such that

Estimates. We have shown that there exist positive real numbers $c$, $\mu$ such that, for all $t \ge 0$ (3.14) By integrating (3.14) on an interval $[0, t]$, we obtain that for all $t \ge 0$, From (3.15) and the definition of $E_p$ we immediately deduce Let us return to (3.9). We recall that After integration on $[0, t]$, and by using the integral estimates $\int_0^{+\infty} \|\dot{x}(t)\|^2\,dt < +\infty$ and $\int_0^{+\infty} \|B(x(t)) - B(p)\|^2\,dt < +\infty$, we obtain the existence of a constant $C > 0$ such that Therefore, for any $\epsilon > 0$, we have Combining this with Moreover, we also have , for all $t \ge 0$. .23), we obtain that Similarly, we also have By using (DINAM) we have Since the second member of the above equality belongs to $L^2(0, +\infty; \mathcal{H})$, we finally get $\int_0^{+\infty} \|\ddot{x}(t)\|^2\,dt < +\infty$.
The limit. To prove the existence of the weak limit of $x(t)$, we use Opial's lemma (see [29] for more details). Given $p \in S$, let us consider the anchor function defined by, for every $t \in [0, +\infty[$, $q_p(t) := \frac{1}{2}\|x(t) - p\|^2$. Equivalently, According to the derivation formula for a product, we can rewrite (3.26) as follows By the Cauchy-Schwarz inequality we get Then note that the second member of (3.27) is nonnegative and belongs to $L^1(0, +\infty)$. Indeed, we have Using (3.18) and (3.22), we deduce that $\int_0^{+\infty} g(t)\,dt < +\infty$.
Note that the left member of (3.27) can be rewritten as the derivative of a function, precisely So we have $\dot{h}(t) \le g(t)$, for every $t \ge 0$.
Let us prove that the function $h$ given in (3.28) is bounded below by some constant. Indeed, since the terms $q_p(t)$ and Using (3.16) and the fact that $\dot{x}(\cdot)$ is bounded, we deduce that there exists some $m \in \mathbb{R}$ such that $h(t) \ge m$ for every $t \ge 0$.
Using the fact that $\langle A_\beta(x(t)) - A_\beta(p), x(t) - p\rangle$ tends to zero as $t \to +\infty$ (a consequence of (3.25) and of the boundedness of $x(\cdot)$), we obtain $\dot{q}_p(t) + \gamma q_p(t) = \theta(t)$, where the limit of $\theta(t)$ exists as $t \to +\infty$. The existence of the limit of $q_p$ then follows from a classical general result concerning the convergence of evolution equations governed by strongly monotone operators (here $\gamma\,\mathrm{Id}$, see Theorem 3.9 page 88 in [23]). This means that for all $p \in S$, $\lim_{t\to+\infty} \|x(t) - p\|$ exists.
To complete the proof via Opial's lemma, we need to show that every weak sequential cluster point of $x(t)$ belongs to $S$. Let $t_n \to +\infty$ be such that $x(t_n) \rightharpoonup x^*$ as $n \to +\infty$. We have $A(x(t_n)) \to 0$ strongly in $\mathcal{H}$ and $x(t_n) \rightharpoonup x^*$ weakly in $\mathcal{H}$.
From the closedness of the graph of the maximally monotone operator $A$ in $w\text{-}\mathcal{H} \times s\text{-}\mathcal{H}$ (weak topology on the first factor, strong topology on the second), we deduce that $A(x^*) = 0$, that is $x^* \in S$.
Consequently, x(t) converges weakly to an element of S as t goes to +∞. The proof of Theorem 3.1 is thereby completed.

Case β b = β f
Let us specialize the previous results to the case $\beta_b = \beta_f$. We set $\beta_b = \beta_f := \beta > 0$, and $A := \nabla f + B$. We thus consider the evolution system The existence of strong global solutions to this system is guaranteed by Theorem 2.1. The convergence properties as $t \to +\infty$ of the solution trajectories generated by this system are a consequence of Theorem 3.1 and are given below.

Remark 3.1 It is worth stating the result of Corollary 3.1 separately because this is an important case. This also makes it possible to highlight this result compared to the existing literature for second-order dissipative evolution systems involving cocoercive operators. Indeed, letting $\beta$ go to zero in (3.29) gives the condition introduced by Attouch-Maingé in [12] to study the second-order dynamic (1.3) without geometric damping. With respect to [12], the introduction of the geometric damping, i.e., taking $\beta > 0$, provides some useful additional estimates.
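As a small worked example of how the parameters interact when $\beta_b = \beta_f = \beta$, the sketch below checks the inequality $\lambda\gamma > \beta + 1/\gamma$ recovered at the end of the Lyapunov analysis. The function names are hypothetical helpers introduced for illustration; the values $\lambda = 5$, $\gamma = 0.9$ are those of Example 4.1, and the second call is a hypothetical parameter choice with $\lambda\gamma^2 < 1$.

```python
def satisfies_condition(lam, gamma, beta):
    """Return True when lam * gamma > beta + 1 / gamma (case beta_b = beta_f = beta)."""
    return gamma > 0 and beta >= 0 and lam * gamma > beta + 1.0 / gamma

def largest_admissible_beta(lam, gamma):
    """Supremum of the beta values allowed by the strict inequality.

    A nonpositive result means that no beta >= 0 satisfies the condition.
    """
    return lam * gamma - 1.0 / gamma

# Parameters of Example 4.1: lam = 5 and gamma = 0.9, tested with beta = 1.
ok_example = satisfies_condition(5.0, 0.9, 1.0)
beta_sup = largest_admissible_beta(5.0, 0.9)

# With beta = 0 the test reduces to lam * gamma**2 > 1; here 1 * 0.25 < 1.
ok_small_gamma = satisfies_condition(1.0, 0.5, 0.0)
print(ok_example, beta_sup, ok_small_gamma)
```

For fixed $\lambda$ and $\gamma$, the admissible values of $\beta$ form the interval $[0, \lambda\gamma - 1/\gamma)$, which is nonempty exactly when $\lambda\gamma^2 > 1$.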

Numerical illustrations
In this section, we give some numerical illustrations by using a temporal discretization of the dynamic (DINAM). Let us recall the condensed formulation of (DINAM)

$\ddot{x}(t) + \gamma\dot{x}(t) + A(x(t)) + \frac{d}{dt}A_\beta(x(t)) = 0,$

where $A := \nabla f + B$ and $A_\beta := \beta_b B + \beta_f \nabla f$. Take a fixed time step $h > 0$, and consider the following implicit finite-difference scheme for (DINAM): After expanding (4.1), we obtain Set $s := \frac{h}{1 + \gamma h}$ and $\alpha := \frac{1}{1 + \gamma h}$. So we have where Initialize: $x_{k+1} = (\mathrm{Id} + sA_h)^{-1}(y_k)$. (4.7)
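The following sketch shows one implicit scheme that is consistent with the step sizes $s = h/(1+\gamma h)$, $\alpha = 1/(1+\gamma h)$ and with the resolvent update $x_{k+1} = (\mathrm{Id} + sA_h)^{-1}(y_k)$. The specific choices $A_h := A_\beta + hA$ and $y_k := x_k + \alpha(x_k - x_{k-1}) + sA_\beta(x_k)$ are assumptions obtained by discretizing the condensed form of (DINAM) with backward differences; they may differ from the authors' formulas. The example is restricted to linear operators on $\mathbb{R}^2$, so the resolvent is a $2 \times 2$ linear solve, and the function names are hypothetical.

```python
def matvec(M, v):
    """Product of a 2x2 matrix with a vector of length 2."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def solve2(M, r):
    """Solve the 2x2 linear system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

def implicit_scheme_linear(Q, B, gamma, beta_f, beta_b, h, x0, x1, n_steps):
    """Implicit discretization for grad f(x) = Q x and a linear operator B (assumed form)."""
    s = h / (1.0 + gamma * h)
    alpha = 1.0 / (1.0 + gamma * h)
    A = [[Q[i][j] + B[i][j] for j in range(2)] for i in range(2)]
    A_beta = [[beta_f * Q[i][j] + beta_b * B[i][j] for j in range(2)] for i in range(2)]
    # Matrix of Id + s * A_h with the assumed A_h = A_beta + h * A
    M = [[(1.0 if i == j else 0.0) + s * (A_beta[i][j] + h * A[i][j])
          for j in range(2)] for i in range(2)]
    x_prev = list(x0)
    x = [x0[i] + h * x1[i] for i in range(2)]   # first iterate from the initial velocity
    for _ in range(n_steps):
        g = matvec(A_beta, x)
        y = [x[i] + alpha * (x[i] - x_prev[i]) + s * g[i] for i in range(2)]
        x_prev, x = x, solve2(M, y)             # resolvent step
    return x

# Hypothetical test problem: f(x) = x_1^2 / 2 (convex, not strongly convex) and
# B the Yosida approximation (lambda = 1) of the rotation of angle pi/2.
Q = [[1.0, 0.0], [0.0, 0.0]]
B = [[0.5, -0.5], [0.5, 0.5]]
x_final = implicit_scheme_linear(Q, B, gamma=3.0, beta_f=0.5, beta_b=0.5,
                                 h=0.1, x0=[1.0, -1.0], x1=[0.0, 0.0], n_steps=1000)
print(x_final)
```

In this test problem the only zero of $\nabla f + B$ is the origin, and the parameters satisfy $\lambda\gamma > \beta + 1/\gamma$ with $\lambda = 1$, so the iterates are expected to approach $(0, 0)$.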
Remark 4.1 (i) The convergence analysis of the algorithm (DINAAM) is postponed to a separate study. In the current version, we focus only on the continuous dynamic (DINAM) and its asymptotic convergence. The numerical experiments below are given for illustrative purposes.
(ii) A general method to generate monotone cocoercive operators which are not gradients of convex functions is to start from a linear skew-symmetric operator $A$ and then take its Yosida approximation $A_\lambda$. As a model situation, take $\mathcal{H} = \mathbb{R}^2$ and start from $A$ equal to the rotation of angle $\frac{\pi}{2}$. We have $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. An elementary computation gives that, for any $\lambda > 0$, $A_\lambda = \frac{1}{1+\lambda^2}\begin{pmatrix} \lambda & -1 \\ 1 & \lambda \end{pmatrix}$, which is therefore $\lambda$-cocoercive. As a consequence, for $\lambda = 1$ we obtain that the matrix $\frac{1}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ is 1-cocoercive. With these basic blocks, one can easily construct many other cocoercive operators which are not potential operators. For that, use Lemma 7.1 which gives that the sum of two cocoercive operators is still cocoercive, and therefore the set of cocoercive operators is a convex cone.
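The construction in Remark 4.1 (ii) can be checked numerically. The sketch below computes the Yosida approximation $A_\lambda = \frac{1}{\lambda}\big(\mathrm{Id} - (\mathrm{Id} + \lambda A)^{-1}\big)$ of the rotation of angle $\pi/2$ and evaluates the cocoercivity inequality at a few points. The function names and the sample points are hypothetical; for a linear operator it suffices to test $\langle Mx, x\rangle \ge \lambda\|Mx\|^2$.

```python
def yosida_of_rotation(lam):
    """Yosida approximation of A = [[0, -1], [1, 0]] with parameter lam > 0."""
    det = 1.0 + lam * lam
    # Resolvent J = (Id + lam * A)^{-1}, computed in closed form for the 2x2 case
    J = [[1.0 / det, lam / det], [-lam / det, 1.0 / det]]
    return [[(1.0 - J[0][0]) / lam, -J[0][1] / lam],
            [-J[1][0] / lam, (1.0 - J[1][1]) / lam]]

def cocoercivity_gap(M, lam, x):
    """Return <Mx, x> - lam * ||Mx||^2, which is >= 0 when M is lam-cocoercive."""
    Mx = [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]
    inner = Mx[0] * x[0] + Mx[1] * x[1]
    return inner - lam * (Mx[0] ** 2 + Mx[1] ** 2)

B = yosida_of_rotation(5.0)        # lam = 5 as in Example 4.1
samples = ([1.0, 0.0], [0.3, -2.0], [-4.0, 1.5])
gaps = [cocoercivity_gap(B, 5.0, x) for x in samples]
# A linear operator is a gradient only if its matrix is symmetric.
is_symmetric = abs(B[0][1] - B[1][0]) < 1e-12
print(B, gaps, is_symmetric)
```

For this particular operator the inequality holds with equality, so the computed gaps vanish up to rounding, and the lack of symmetry confirms that $B$ is not a potential operator.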
Example 4.1 Let us start this section with a simple illustrative example in $\mathbb{R}^2$. We take $\mathcal{H} = \mathbb{R}^2$ equipped with the usual Euclidean structure. Let us consider $B$ as a linear operator whose matrix in the canonical basis of $\mathbb{R}^2$ is defined by $B = A_\lambda$ for $\lambda = 5$. According to Remark 4.1, we can check that $B$ is $\lambda$-cocoercive with $\lambda = 5$ and that $B$ is a nonpotential operator. To observe the classical oscillations in the heavy ball with friction, we take f : We set $\gamma = 0.9$. It is clear that $f$ is convex but not strongly convex. We study 3 cases: As a straightforward application of Theorem 3.1, we obtain that the trajectory $x(t)$ generated by (DINAM) converges to The trajectory obtained by using Matlab is depicted in Figure 1, where we represent the components $x_1(t)$ and $x_2(t)$ in red and blue respectively.
Since $M^\top M$ is positive semidefinite for any matrix $M$, the quadratic function $f$ is convex. Furthermore, if $M$ has full column rank, i.e. $\mathrm{rank}(M) = n$, then $M^\top M$ is positive definite, and therefore $f$ is strongly convex. Take B defined as below Then, $B$ is cocoercive. Indeed, for any $x, y \in \mathbb{R}^n$, we have In our experiment, we pick $M$ a random $10 \times 100$ matrix, which does not have full column rank. Set $\gamma = 3$, $\beta_b = 1$, $\beta_f = 1$ and the operator $B$ as presented above. Thanks to Corollary 3.1, we conclude that the trajectory $x(t)$ generated by the system (DINAM) converges to $x_\infty = (M^\top M + B)^{-1} M^\top b$. Implementing the algorithm (DINAAM) in Matlab, we obtain the plot of $k$ versus the norm of $B(x_k) + \nabla f(x_k)$. Similarly, we study several cases by changing the parameters $\beta_b$, $\beta_f$. This is depicted in Figure 3.

Before ending this part, we discuss an application of our model to dynamical games. The following example is taken from Attouch-Maingé [12] and adapted to our context.

Example 4.3
We make the following standing assumptions: (i) $\mathcal{H} = X_1 \times X_2$ is the Cartesian product of two Hilbert spaces equipped with norms $\|\cdot\|_{X_1}$ and $\|\cdot\|_{X_2}$ respectively. Here $x = (x_1, x_2)$, with $x_1 \in X_1$ and $x_2 \in X_2$, stands for an element in $\mathcal{H}$; (ii) $f : X_1 \times X_2 \to \mathbb{R}$ is a convex function whose gradient is Lipschitz continuous on bounded sets; is the maximally monotone operator which is attached to a smooth convex-concave function $L : X_1 \times X_2 \to \mathbb{R}$. The operator $B$ is assumed to be $\lambda$-cocoercive with $\lambda > 0$.
In our setting, with $x(t) = (x_1(t), x_2(t))$ the system (DINAM) is written x 2 (t))) = 0. (4.8) Structured systems such as (4.9) contain both potential and nonpotential terms which are often present in decision sciences and physics. In game theory, (4.9) describes Nash equilibria of the normal form game with two players 1, 2 whose static loss functions are respectively given by f (·, ·) is their joint convex payoff, and $L$ is a convex-concave payoff with zero-sum rule. For more details, we refer the reader to [12]. As an example, take $X_1 = X_2 = \mathbb{R}$, and L : The Nash equilibria described in (4.9) can be computed by using (DINAAM). Take $\gamma = 3$, $\beta_b = 0.5$, $\beta_f = 0.5$ and $x_0 = (1, -1)$, $\dot{x}_0 = (-10, 10)$ as initial conditions; then the numerical solution of (DINAM) converges to $x_\infty = (\frac{3}{4}, 1)$, which is also the solution of (4.9). The numerical trajectories and phase portrait of our model applied to dynamical games are depicted in Figure 4.

The nonsmooth case
The equivalence obtained in Proposition 2.1 between (DINAM) and a first-order evolution system in time and space allows a natural extension of both our theoretical and numerical results to the case of a convex, lower semicontinuous and proper function $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$. It suffices to replace the gradient of $f$ by the convex subdifferential $\partial f$. We recall that the subdifferential of $f$ at $x \in \mathcal{H}$ is defined by $\partial f(x) = \{z \in \mathcal{H} : f(y) \ge f(x) + \langle z, y - x\rangle \ \text{for all } y \in \mathcal{H}\}$, and the domain of $f$ is equal to $\mathrm{dom} f = \{x \in \mathcal{H} : f(x) < +\infty\}$. This leads us to consider the system The prefix g in front of (DINAM) stands for generalized. Note that the first equation of (g-DINAM) is now a differential inclusion, because of the possibility for $\partial f(x(t))$ to be multivalued. By taking $f = f_0 + \delta_C$, where $\delta_C$ is the indicator function of a constraint set $C$, the system (g-DINAM) makes it possible to model damped inelastic shocks in mechanics and decision sciences, see [13]. The original aspect comes from the fact that (g-DINAM) now involves both potential driving forces (attached to $f_0$) and nonpotential driving forces (attached to $B$). As we will see, taking into account shocks created by nonpotential driving forces is a source of difficulties. Let us first establish the existence and uniqueness of the solution trajectory of the Cauchy problem. Proof. The proof is similar to that of Lemma 3.1. It is based on the monotonicity of the subdifferential of $f$ and the cocoercivity of the operator $B$.
For the sake of simplicity, we give a detailed proof in the case $\beta_f = \beta_b = \beta > 0$. The system (g-DINAM) takes the simpler form: To formulate the convergence results and the corresponding estimates, we write the first equation of (g-DINAM) as follows where $\xi(t) \in \partial f(x(t))$ and we set $A(x(t)) = \xi(t) + B(x(t))$. Then, for any solution trajectory $x : [0, +\infty[ \to \mathcal{H}$ of (g-DINAM) the following properties are satisfied: , with $\xi(t) \in \partial f(x(t))$ as defined in (5.2) and $p \in S$. Then, $\int_0^{+\infty} \langle A(x(t)), x(t) - p\rangle\,dt < +\infty$.
(ii) (convergence) For any $p \in S$, Proof. Let us adapt the Lyapunov analysis developed in the previous sections to the case where $f$ is nonsmooth. We have to pay attention to the following points. First, we must invoke the (generalized) chain rule for derivatives over curves (see [23, Lemme 3.3]), that is, for a.e. $t \ge 0$ The second ingredient is the validity of the subdifferential inequality for convex functions. So, let us consider the function $t \in [0, +\infty[ \mapsto E_p(t) \in \mathbb{R}_+$ defined by When differentiating $E_p(t)$, we use the formulation (g-DINAM) which gives which allows us to differentiate $\dot{x}(t) + \beta A(x(t))$, and obtain similar formulas as in the smooth case. Then a close examination of the Lyapunov analysis shows that we can obtain the additional estimate Set $0 \in \partial f(p) + B(p)$. To obtain (5.5), we return to (3.5), and consider the following minorization that we split into a sum with coefficients $\epsilon$ and $1 - \epsilon$ (where $\epsilon > 0$ will be taken small enough) Note that in the second inequality above we have used the monotonicity of $\partial f$. So the proof continues with $\lambda$ replaced by $(1 - \epsilon)\lambda$. This does not change the conditions on the parameters: since in our assumptions the inequality $\lambda\gamma > \beta + \frac{1}{\gamma}$ is strict, it is still satisfied by $(1 - \epsilon)\lambda$ when $\epsilon$ is taken small enough. So, after integrating the resulting strict Lyapunov inequality, we obtain the supplementary property (5.5). Until $\int_0^{+\infty} \|A(x(t))\|^2\,dt < +\infty$.
But then we can no longer invoke the Lipschitz continuity on the bounded sets of $\nabla f$. To overcome this difficulty, we modify the end of the proof as follows. Recall that given $p \in S$, the anchor function is defined by, for every $t \in [0, +\infty[$, $q_p(t) := \frac{1}{2}\|x(t) - p\|^2$, and that we need to prove that the limit of the anchor functions exists as $t \to +\infty$. The idea is to exploit the fact that we have in hand a whole collection of Lyapunov functions, parametrized by the coefficient $c$. Recall that we have obtained that the limit of $E_p(t)$ exists as $t \to +\infty$, and this is satisfied for a whole interval of values of $c$. So, for such $c$, the limit of W c (t) : Take two such values of $c$, say $c_1$ and $c_2$, and take the difference (recall that $\delta = c\gamma - 1$). We obtain where $W(t) := \frac{\gamma}{2}\|x(t) - p\|^2 + \frac{1}{2}\|\dot{x}(t) + \beta A(x(t))\|^2 + \langle \dot{x}(t) + \beta A(x(t)), x(t) - p\rangle$.
So, we obtain the existence of the limit as $t \to +\infty$ of $W(t)$. Then note that $W(t) = \gamma q_p(t) + \frac{d}{dt}w(t)$ where w(t) := q p (t) + β Reformulate $W(t)$ in terms of $w(t)$ as follows As a consequence of (5.5) and of the previous estimates, we have that the limit of the two above integrals exists as $t \to +\infty$. Therefore, according to the convergence of $W(t)$ we obtain that $\lim_{t\to+\infty} \big(\gamma w(t) + \frac{d}{dt}w(t)\big)$ exists.
The existence of the limit of $w$ follows from a classical general result concerning the convergence of evolution equations governed by strongly monotone operators (here $\gamma\,\mathrm{Id}$, see Theorem 3.9 page 88 in [23]). In turn, using the same argument as above, we obtain that for all $p \in S$, $\lim_{t\to+\infty} \|x(t) - p\|$ exists.
As in the smooth case, the strong convergence of $B(x(t))$ to $B(p)$ is a direct consequence of the integral estimate $\int_0^{+\infty} \|A(x(t))\|^2\,dt < +\infty$, which implies that $A(x(t))$ converges strongly to zero in an "essential" way. According to Opial's lemma, this allows us to complete the convergence proof as in the smooth case. This is a seemingly difficult question to examine in the future.
The convergence of the trajectory $t \mapsto x(t)$ is then a consequence of the convergence of the semigroup generated by the sum of a cocoercive operator with the subdifferential of a convex lower semicontinuous and proper function, see Abbas-Attouch [1]. Note that in this case the condition for the convergence of the trajectories generated by (g-DINAM) no longer depends on the cocoercivity parameter $\lambda$.

Conclusion, perspectives
In this paper, in a general real Hilbert space setting, we investigated a dynamic inertial Newton method for solving additively structured monotone problems. The dynamic is driven by the sum of two monotone operators with distinct properties: the potential component is the gradient of a continuously differentiable convex function $f$, and the nonpotential component is a monotone and cocoercive operator $B$. The geometric damping is controlled by the Hessian of the potential $f$ and by a Newton-type correction term attached to $B$. The well-posedness of the Cauchy problem is shown, as well as the asymptotic convergence properties of the trajectories generated by the continuous dynamic. The convergence analysis is carried out through the parameters $\beta_f$ and $\beta_b$ attached to the geometric dampings as well as the parameters $\gamma$ and $\lambda$ (the viscous damping and the coefficient of cocoercivity respectively). The introduction of geometric damping makes it possible to control and attenuate the oscillations known for viscous damping of inertial systems, giving rise to faster numerical methods. It would be interesting to extend the analysis for both the continuous dynamic and its discretization to the case of an asymptotic vanishing damping $\gamma(t) = \frac{\alpha}{t}$, with $\alpha > 0$ as in [31]. This is a decisive step towards proposing faster algorithms for solving structured monotone inclusions, which are connected to the accelerated gradient method of Nesterov. The study of the corresponding splitting methods is also an important topic which needs further investigation. In fact, replacing $\nabla f$ by a general maximally monotone operator $A$ whose resolvent can be computed easily, it would be interesting to study an inertial forward-backward algorithm with Hessian-driven damping for solving structured monotone inclusions of the form $Ax + Bx \ni 0$. This is beyond the scope of the current manuscript and will be the subject of a future separate work.