A primal-dual fixed-point algorithm for minimization of the sum of three convex separable functions

Many problems arising in image processing and signal recovery with multi-regularization can be formulated as minimization of a sum of three convex separable functions. Typically, the objective function involves a smooth function with Lipschitz continuous gradient, a linear composite nonsmooth function and a nonsmooth function. In this paper, we propose a primal-dual fixed-point (PDFP) scheme to solve the above class of problems. The proposed algorithm for three block problems is a fully splitting symmetric scheme, only involving explicit gradient and linear operators without inner iteration, when the nonsmooth functions can be easily solved via their proximity operators, such as $\ell_1$ type regularization. We study the convergence of the proposed algorithm and illustrate its efficiency through examples on fused LASSO and image restoration with non-negative constraint and sparse regularization.


Introduction
In this paper, we aim to design a primal-dual fixed-point algorithmic framework for solving the following minimization problem: where f 1 , f 2 and f 3 are three proper lower semi-continuous convex functions, and f 1 is differentiable on R n with a 1/β-Lipschitz continuous gradient for some β ∈ (0, +∞], while B : R n → R m is a bounded linear transformation. This formulation covers a wide application in image processing and signal recovery with multi-regularity terms. For instance, in many imaging and data processing applications, the functional f 1 corresponds to a data-fidelity term, and the last two terms are related to regularity terms. As a direct example of (1.1), we can consider the fused LASSO penalized problems [21] defined by min x∈R n 1 2 Ax − a 2 + µ 1 Bx 1 + µ 2 x 1 .
On the other hand, in the imaging science, total variation regularization with B being the discrete gradient operator together with ℓ 1 regularization has been adopted in some image restoration applications, for example in [10].
As far as we know, Condat [6] tackled a problem with the same form as given in (1.1) and proposed a primal dual splitting scheme. Extensions to multi-block composite functions are also discussed in detail. For the special case B = I (I denotes the usual identity operator), Davis and Yin [7] proposed a three block operator splitting scheme based on monotone operators. When the problem (1.1) reduces to two-block separable functions, many splitting and proximal algorithms have been proposed and studied in the literature. Among them, extensive research have been conducted on the alternating direction of multiplier method (ADMM) [9] (also known as split Bregman [10], see for example [1] and the references therein). The primal-dual hybrid gradient method (PDHG) [23,8,2,19], also known as Chambolle-Pock algorithm [2], is another class of popular algorithm, largely adopted in imaging applications. In [22,3], several completely decoupled schemes, such as inexact Uzawa solver and primal-dual fixedpoint algorithm, are proposed to avoid subproblem solving for some typical ℓ 1 minimization problems.
Komodakis and Pesquet [12] recently gave a nice overview of recent primal-dual approaches for solving large-scale optimization problems (1.1). A general class of multi-step fixed-point proximity algorithms is proposed in [14], which covers several existing algorithms [2,19] as special cases. In the preparation of this paper, we notice that Li and Zhang [15] also studied the problem (1.1) and introduced a quasi-Newton based scheme as preconditioned operators and the overrelaxation strategies to accelerate the algorithms. Both algorithms can be viewed as a generalization of Condat's algorithm [6]. The theoretical analysis is established based on the multi-step techniques present in [14].
In the following, we mainly review some most relevant work for a concise presentation. We first consider a constrained regularization problem where C ⊂ R n is a closed convex set arising from physical requirements of the solutions. This problem can be reformulated as the form (1.1) by introducing a set indicator function χ C (see (2.3)) as f 3 . For example, this problem (1.2) has been studied in [13] in the context of maximum a posterior ECT reconstruction, and a preconditioned alternating projection algorithm (PAPA) is proposed for solving the resulted regularization problem. For f 3 = 0 in (1.1), we proposed a primal-dual fixed-point algorithm PDFP 2 O (primal-dual fixed-point algorithm based on proximity operator) in [3]. Based on the fixed point theory, we have shown the convergence of the scheme PDFP 2 O and its convergence rate under suitable conditions.
In this work, we aim to extend the ideas of PDFP 2 O in [3] and PAPA in [13] for solving (1.1) without subproblem solving and provide a convergence analysis on the primal dual sequences. The specific algorithm, namely primal-dual fixed-point (PDFP) algorithm, is formulated as follows: where 0 < λ < 1/λ max (BB T ), 0 < γ < 2β. Here prox f is the proximity operator [18] of a function f , see (2.2). When f 3 = χ C , the proposed algorithm (1.3) is reduced to PAPA proposed in [13], see (4.1). For the special case f 3 = 0, we obtain PDFP 2 O proposed in [3]. The convergence analysis of the proposed PDFP algorithm is built upon fixed point theory on the primal and dual pairs. The overall scheme is completely explicit, which allows an easy implementation and parallel computing for many large scale applications. This will be further illustrated through application to the problems arising in statistics learning and image restoration. The PDFP is a symmetric form and it is different from Condat's algorithm proposed in [6]. In addition, we point out that the ranges of the parameters are larger than those of [6,15] and may lead to a significant advantage of parameter selection in practice. This will be further discussed in section 4.
The rest of the paper is organized as follows. In section 2, we will present some preliminaries and notations, and deduce PDFP from the first order optimality condition. In section 3, we will provide the convergence results and the linear convergence rate results for some special cases. In section 4, we will make a comparison on the form of the PDFP algorithm (1.3) with some existing algorithms. In section 5, we will show the numerical performance and the efficiency of PDFP through some examples on fused LASSO and pMRI (parallel magnetic resonance image) reconstruction.
2 Primal dual fixed point algorithm

Preliminaries and notations
For the self completeness of this work, we list some relevant notations, definitions, assumption and lemmas in convex analysis. One may refer to [5,3] and the references therein for more details.
For the ease of presentation, we restrict our discussion in the Euclidean space R n , equipped with the usual inner product ·, · and norm · = ·, · 1/2 . We first assume that the problem (1.1) has at least one solution and f 2 , f 3 , B satisfy 0 ∈ int(dom f2 − B(dom f3 )), (2.1) where the symbol int(·) denotes the strong relative interior of a convex subset, and the effective domain of f is defined as dom f = {x ∈ R n |f (x) < +∞}. The ℓ 1 norm of a vector x ∈ R n is denoted by · 1 and the spectral norm of a matrix is denoted by · 2 . Let Γ 0 (R n ) be the collection of all proper lower semi-continuous convex functions from R n to (−∞, +∞]. For a function f ∈ Γ 0 (R n ), the proximity operator of f : prox f [18] is defined by For a nonempty closed convex set C ⊂ R n , let χ C be the indicator function of C, defined by Let proj C be the projection operator onto C, i.e.
It is easy to see that prox γχC = proj C for all γ > 0, and the proximity operator is a generalization of projection operator. Note that many efficient splitting algorithms rely on the fact that prox f has a closed form solution. For example, when f = γ · 1 , the proximity solution is given by element-wised soft-shrinking. We refer the reader to [5] for more details about proximity operators. Let ∂f be the subdifferential of f , i.e. (2.4) and f * be the convex conjugate function of f , defined by and T is firmly nonexpansive if It is obvious that a firmly nonexpansive operator is nonexpansive. An operator T is δ-strongly monotone if there exists a positive real number δ such that For any two functions f 2 ∈ Γ 0 (R m ) and f 3 ∈ Γ 0 (R n ), and a bounded linear transformation . Then prox f and I − prox f are firmly nonexpansive. In addition, there hold for all x ∈ R n and γ > 0. (2.8) If f has 1/β-Lipschitz continuous gradient further, there holds Lemma 2.3 Let T be an operator, and u * be a fixed-point of T . Let {u k } be the sequence generated by the fixed point iteration Then the sequence {u k } is bounded and converges to a fixed-point of T .
The proof of Lemma 2.3 is standard, and one may refer to the proof of Theorem 3.5 in [3] for more details.
Let γ and λ be two positive numbers. To simplify the presentation, we use in what follows the following notations: For a pair u = (v, x) ∈ R m × R n , we also define a norm on the product space R m × R n as

Derivation of PDFP
On extending the ideas of PAPA proposed in [13] and PDFP 2 O proposed in [3], we derive the following primal-dual fixed-point algorithm (1.3) for solving the minimization problem (1.1). Under the assumption (2.1), by using the first order optimality condition of (1.1) and lemma 2.1, In other words u * = T (u * ) for u * = (v * , x * ). Meanwhile, if u * = T (u * ), we can get that x * meets the first order optimality condition of (1.1) and thus x * is a minimizer of (1.1).
To sum up, we have the following theorem.
It is easy to confirm that the sequence {(v k+1 , x k+1 )} generated by PDFP algorithm (1.3) is the Picard iteration (v k+1 , x k+1 ) = T (v k , x k ). So we will use the operator T to analyze the convergence of PDFP in section 3.

Convergence analysis
In the following, we denote u * = (v * , x * ) as a fixed-point of the operator T . Let {u k+1 = (v k+1 , x k+1 )} be the sequence generated by the operator T .

Convergence
Lemma 3.1 There hold the following estimates: Proof. We first prove (3.1). By Lemma 2.2, we know I − prox γ λ f2 is firmly nonexpansive, and use (1.3b) and (2. Next we prove (3.2). By the optimality condition of (1.3c) (cf. (2.6)), we have By the property of subdifferentials (cf. (2.4)), Therefore, On the other hand, by the optimality condition of (1.3a), it follows that Thanks to the property of subdifferentials, there holds Replacing the term − x k+1 −x k 2 in (3.3) with the right side term of the above inequality, we immediately obtain (3.2).

Lemma 3.2 There holds
Proof. Summing the two inequalities (3.1) and (3.2) and re-arranging the terms, we have where · M is given in (2.15) and (2.16). Meanwhile, by the optimality condition of (2.20), we have which implies On the other hand, it follows from (2.9) that Recalling (2.17), we immediately obtain (3.4) in terms of (3.5)-(3.7).
As a direct consequence of Lemma 3.3 and Lemma 2.3, we obtain the convergence of PDFP as follows.
Theorem 3.1 Let 0 < λ < 1/λ max (BB T ) and 0 < γ < 2β. Then the sequence {u k } is bounded and converges to a fixed-point of T , and {x k } converges to a solution of (1.1).
Proof. By Lemma 2.2, both prox γf3 and I − prox γ λ f1 are firmly nonexpansive, thus the operator T defined by (2.10)-(2.13) is continuous. From Lemma 3.3, we know that the sequence { u k − u * λ } is non-increasing and lim k→+∞ u k+1 − u k λ = 0. By using Lemma 2.3, we know that the sequence {u k } is bounded and converges to a fixed-point of T . By using Theorem 2.1, {x k } converges to a solution of (1.1).

Remark 3.1
For the special case f 3 = 0, the PDFP reduces naturally to PDFP 2 O (4.2) proposed in [3], where the conditions for the parameters are 0 < λ ≤ 1/λ max (BB T ), 0 < γ < 2β. In Theorem 3.1, the condition for the parameter λ is slightly more restricted as 0 < λ < 1/λ max (BB T ). It is easy to see when f 3 = 0, the equation (3.4) in Lemma 3.2 reduces to and the conditions in the proof of Lemma 3.3 can be relaxed to 0 < λ ≤ 1/λ max (BB T ). However, for a general f 3 , the condition can not be relaxed to ensure the positive definitiveness of the matrix M , which is needed for the uniform convergence. Remark 3.2 For the special case f 1 = 0, the problem (1.1) reduces to two-block proper lower semicontinuous convex functions without Lipschitz continuous gradient assumptions. The condition 0 < γ < 2β in PDFP becomes 0 < γ < +∞. Although, γ is an arbitrary positive number in theory, but the range of γ will affect the convergence speed and it is also a difficult problem to choose a best value in practice.

Linear convergence rate for special cases
In the following, we will show the convergence rate results with some additional assumptions on the basic problem (1.1). In particular, for f 3 = 0, the algorithm reduces to PDFP 2 O proposed in [3]. The conditions for a linear convergence given there as condition 3.1 in [3] is as followed: for 0 < λ ≤ 1/λ max (BB T ) and 0 < γ < 2β, there exist η 1 , η 2 ∈ [0, 1) such that where g(x) is given in (2.14). It is easy to see that a strongly convex function f 1 satisfies the condition (3.16). For a general f 3 , we need stronger conditions on the functions.
Theorem 3.2 Suppose that (3.16) holds and f * 2 is strongly convex. Then, we have where 0 < η < 1 is the convergence rate (indicated in the proof ) and δ > 0 is a parameter describing the strongly monotone property of ∂f * 2 (cf. (2.5)).
Proof. Use Moreau's identity (cf. (2.8)) to get So (1.3b) is equivalent to According to the optimality condition of (3.17), Similarly, according to the optimality condition of (2.19), Observing that ∂f * 2 is δ-strongly monotone, we have by (3.18) and (3.19) that Summing the two inequalities (3.20) and (3.2), and then using the same argument for driving (3.5), we arrive at where we have also used the condition (3.16).
We note that a linear convergence rate for strongly convex f * 2 and f 3 are obtained in [15]. They introduced two preconditioned operators for accelerating the algorithm, while a clear relation between the convergence rate and the preconditioned operators is still missing. Meanwhile, introducing preconditioned operators could be beneficial in practice, and we can also introduce a preconditioned operator to deal with ∇f 1 in our scheme. Since the analysis is rather similar to the current one, we will omit it in this paper.

Connections to other algorithms
In this section, we present the connections of the PDFP algorithm to some algorithms proposed previously in the literature. In particular, when f 3 = χ C , due to prox γf3 = proj C , the proposed algorithm (1.3) is reduced to PAPA proposed in [13] (PDFP) where 0 < λ < 1/λ max (BB T ), 0 < γ < 2β. We note that the conditions of the parameters for the convergence of PDFP are larger than those in [13]. Here we still refer (4.1) as PDFP, since PAPA originally proposed in [13] incorporates other techniques such as diagonal preconditioning. For the special case f 3 = 0, due to prox γf3 = I, we obtain the PDFP 2 O scheme proposed in [3] (PDFP 2 O) where 0 < λ ≤ 1/(λ max (BB T ) + 1), 0 < γ < 2β. Similar technique of extension to multi composite functions have also been used in [6,14,20]. Compared to PDFP (4.1), the algorithm PDFP 2 O C introduces an extra variable, while PDFP requires two times projections. Most importantly, the primal variable at each iterate of PDFP is feasible, but maybe not for that of PDFP 2 O C . In addition, the permitted ranges of the parameters are also tighter in PDFP 2 O C .
The other most related algorithm to the PDFP algorithm (1.3) is the algorithm proposed by L.
By grouping multi-block as a single block, the authors in [20] extended the PDHG algorithm [19] to multi composite functions penalized problems. The authors in [14] proposed a class of multi-step fixedpoint proximity algorithms, including several existing algorithms as special examples, for example the algorithms in [2,19]. The three-block method proposed by Davis and Yin in [7] are based on operator splitting but subproblem solving is required when it is applied to solve (1.1) for the case B = I. Li and Zhang [15] also introduce preconditioned operators based on the techniques present in [14] and including Condat's algorithm in [6] as a special case, and further introduce quasi-Newton and the overrelaxation strategies to accelerate the algorithms. Specifically, we compare the PDFP algorithm (1.3) with the basic Algorithm 3.2 in [6].
In the following, we mainly compare PDFP to Condat's algorithm [6] for a simple presentation. We first change the form of PDFP algorithm (1.3) by using Moreau's identity, see (2.8), i.e.
where v k = λ γ v k . A direct comparison is presented in Table 4.1. From Table 4.1, we can see that the ranges of the parameters in Condat's algorithm are relatively smaller than PDFP. Also since the condition for Condat's algorithm is mixed with all the parameters, it is not always easy to choose them in practice. This is also pointed out in [6]. While the rules for the parameters in PDFP are separate, and they can be chosen independently according to the Lipschitiz constant and the operator norm of BB T . In this sense, our parameter rules are relatively more practical. In the numerical experiments, we can set λ to be close to 1/λ max (BB T ) and γ to be close to 2β for most of tests. Nevertheless, PDFP has an extra step (1.3a) compared to Condat's algorithm and the computation cost may increase due to the computation of prox γf3 . In practice, this step is often related to ℓ 1 shrinkage, so the cost could be still ignorable in practice.

Numerical experiments
In this section, we will apply the PDFP algorithm to solve two problems: the fused LASSO penalized problems and parallel Magnetic Resonance Imaging (pMRI) reconstruction. All the experiments are implemented under MATLAB7.00 (R14) and conducted on a computer with Intel (R) core (TM) i5-4300U CPU@1.90G.

The fused LASSO penalized problems
The fused LASSO (Least absolute shrinkage and selection operator) penalized problems is proposed for group variable selection, and one can refer to [16,17] for more details for the applications of this model. It can be described as min x∈R n 1 2 Ax − a 2 + µ 1 Here A ∈ R r×n , a ∈ R r . The row of A: A i for i = 1, 2, · · · , r represent the i th observation of the independent variables and a i denotes the response variable, and the vector x ∈ R n is the regression coefficient to recover. The first term is corresponding to the data-fidelity term, and the last two terms aim to ensure the sparsity in both x and their successive differences in Then the forgoing problem can be reformulated as For this example, we can set We want to show that the PDFP algorithm (1.3) can be applied to solve this generic class of problem (5.1) directly and easily.
The following tests are designed for the simulation. We set r = 500, n = 10000, and the data a is generated as Ax + σe, where A and e are random matrices whose elements are normally distributed with zero mean and variance 1, and σ = 0.01, and x is a generated sparse vector, whose nonzeros elements are showed in Figure 5.1 by green '+'. We set µ 1 = 200, µ 2 = 20 and the maximum iteration number as We compare the PDFP algorithm with Condat's algorithm [6]. For the PDFP algorithm, the parameter λ and γ are chosen according to Theorem 3.1. In practice, we set λ to be close to 1/λ max (BB T ) and γ to be close to 2β. Here we set λ = 1/4 as the n − 1 eigenvalues of BB T can be analytically computed as 2 − 2 cos(iπ/n), i = 1, 2, · · · , n − 1 and γ = 1.99/λ max (A T A). For Condat's algorithm, we set λ = 0.19/4, γ = 1.9/λ max (A T A), which is chosen for a relative better numerical performance. The computation time, the attained objective function values, and the relative errors to the true solution are close for Condat's algorithm and PDFP. From Figure 5.1, we see that both Condat's algorithm and PDFP can quite correctly recover the positions of the non-zeros and the values.

Image restoration with nonnegative constraint and sparse regularization
A general image restoration problem with nonnegative constraint and sparse regularization can be written as min x∈C 1 2 Ax − a 2 + µ Bx 1 , where A is some bounded linear operator describing the image formation process, Bx 1 is the usual ℓ 1 based regularization in order to promote sparsity under the transform B, µ > 0 is the regularization parameter. Here we use isotropic total variation as the regularization functional, thus the matrix B represents for the discrete gradient operator. For this example, we can set f 1 (x) = 1 2 Ax − a 2 , f 2 = µ · 1 , and f 3 = χ C .  We consider pMRI reconstruction, where A = (A T 1 , A T 2 , · · · , A T N ) T for each A j is composed of a diagonal downsampling operator D, Fourier transform F and a diagonal coil sensitivity mapping S j for receiver j, i.e. A j = DF S j and S j are often estimated in advance. It is well known in total variation application that λ max (BB T ) = 8. The related Lipschitz constant of ∇f 1 can be estimated as β = 1.
Therefore the two parameters in PDFP are set as λ = 1/8 and γ = 2. The same simulation setting as in [3] is used in this experiment and we still use artifact power (AP) and two-region signal to noise ratio (SNR) to measure image quality. One may refer to [3,11] for more details.
In the following, we compare PDFP algorithm with the previous proposed algorithms PDFP 2

Conclusion
We have extended the algorithm PAPA [13] and PDFP 2 O [3] to derive a primal-dual fixed-point algorithm PDFP (see (1.3)) for solving the minimization problem of three-block convex separable functions (1.1). The proposed PDFP algorithm is a symmetric and fully splitting scheme, only involving explicit gradient and linear operators without any inversion and subproblem solving, when the proximity operator of nonsmooth functions can be easily handled. The scheme can be easily adapted to many inverse problems involving many terms minimization and it is suitable for large scale parallel implementation.
In addition, the parameter range determined by the convergence analysis is rather simple and clear, and it could be useful for practical application. Finally as discussed in Section 5 in [6], we can also extend the current PDFP algorithm to solve multi-block composite (more than three) minimization problems. Preconditioning operators, as proposed in [20,15] can be also introduced to accelerate PDFP, which could be a future work for some specific applications.