Projecting onto rectangular matrices with prescribed row and column sums

In 1990, Romero presented a beautiful formula for the projection onto the set of rectangular matrices with prescribed row and column sums. Variants of Romero's formula were rediscovered in 1998 by Khoury and by Glunt, Hayden, and Reams for bistochastic (square) matrices. These results have found various generalizations and applications. In this paper, we provide a formula for the more general problem of finding the projection onto the set of rectangular matrices with prescribed scaled row and column sums. Our approach is based on computing the Moore-Penrose inverse of a certain linear operator associated with the problem. In fact, our analysis holds even for Hilbert-Schmidt operators, and we do not have to assume consistency. We also perform numerical experiments featuring the new projection operator.


Motivation
A matrix in R^{n×n} is called bistochastic if all its entries are nonnegative and all its row and column sums equal 1. More generally, a matrix is generalized bistochastic if the nonnegativity requirement is dropped. The bistochastic matrices form a convex polytope B in R^{n×n}, commonly called the Birkhoff polytope, whose extreme points are the permutation matrices (a seminal result due to Birkhoff and von Neumann). A lovely formula provided in 1998 by Khoury [8] - and also by Glunt et al. [5] - gives the projection of any matrix onto G, the affine subspace of generalized bistochastic matrices (see Example 3.8 below). More generally, the sets of nonnegative rectangular matrices with prescribed row and column sums are called transportation polytopes. If the nonnegativity assumption is dropped, then Romero provided already in 1990 an explicit projection formula (see Remark 3.5 below), which even predates the square case! On the other hand, the projection onto the set of nonnegative matrices N is simple: just replace every negative entry by 0. No explicit formula is known for the projection of a matrix onto the set of bistochastic matrices; however, because B = G ∩ N, one may apply algorithms such as Dykstra's algorithm to iteratively approximate the projection onto B by using the projection operators P_G and P_N (see, e.g., Takouda [12] for details). In the case of transportation polytopes, algorithms which even converge in finitely many steps were provided by Calvillo and Romero [4].
The goal of this paper is to provide explicit projection operators in more general settings. Specifically, we present a projection formula for finding the projection onto the set of rectangular matrices with prescribed scaled row and column sums. Such problems arise, e.g., in discrete tomography [13] and the study of transportation polytopes [4]. Our approach uses the Moore-Penrose inverse of a certain linear operator A. It turns out that our analysis works even for Hilbert-Schmidt operators because the range of A can be determined and seen to be closed. Our main references are [3], [7] (for Hilbert-Schmidt operators), and [6] (for the Moore-Penrose inverse). We also note that consistency is not required.
The paper is organized as follows. After recording a useful result involving the Moore-Penrose inverse at the end of this section, we prove our main results in Section 2. These results are then specialized to rectangular matrices in Section 3. We then turn to numerical experiments in Section 4, where we compare the performance of three popular algorithms: Douglas-Rachford, the Method of Alternating Projections, and Dykstra's algorithm.
We conclude this introductory section with a result which we believe to be part of the folklore (although we were not able to pinpoint a crisp reference). It is formulated using the Moore-Penrose inverse of an operator; for the definition of the Moore-Penrose inverse and its basic properties, see [6] (and also [3, pages 57-59] for a crash course). The formula presented works even when the problem is inconsistent, in which case it automatically provides a least squares solution.

Proposition 1.1. Let A : X → Y be a continuous linear operator with closed range between two real Hilbert spaces. Let b ∈ Y, set b̄ := P_{ran A} b, and set C := A^{-1} b̄. Then C ≠ ∅ and, for every x ∈ X, P_C(x) = x − A†(Ax − b̄) = x − A†(Ax − b), where A† denotes the Moore-Penrose inverse of A.
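As an illustrative aside (not part of the paper's formal development), Proposition 1.1 is easy to test numerically in finite dimensions, where every operator has closed range: numpy's `pinv` plays the role of A†, and the formula returns a point of C = A^{-1}(b̄) even when the system Ax = b is inconsistent. All data below are made up for this sketch.

```python
import numpy as np

# Sketch of Proposition 1.1: bbar := P_{ran A} b, C := A^{-1}(bbar).
# The claimed projection of x onto C is x - A_dag (A x - b).
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0]])     # rank one, so ran A is a proper subspace
b = np.array([1.0, 0.0])            # not in ran A: A x = b is inconsistent
x = np.array([3.0, -1.0, 5.0])

A_dag = np.linalg.pinv(A)           # Moore-Penrose inverse
p = x - A_dag @ (A @ x - b)         # candidate for P_C(x)

bbar = A @ A_dag @ b                # A A_dag is the projector onto ran A
assert np.allclose(A @ p, bbar)     # p lies in C
# Optimality: x - p must be orthogonal to ker A, the direction space of C.
_, _, Vt = np.linalg.svd(A)
ker_basis = Vt[np.linalg.matrix_rank(A):]   # rows spanning ker A
assert np.allclose(ker_basis @ (x - p), 0.0)
```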

Hilbert-Schmidt operators
From now on, we assume that X and Y are two real Hilbert spaces, which in turn give rise to the real Hilbert space H of Hilbert-Schmidt operators from X to Y. Hilbert-Schmidt operators encompass rectangular matrices (even with infinitely many entries, as long as these are square summable) as well as certain integral operators. (We refer the reader to [7, Section 2.6] for basic results on Hilbert-Schmidt operators and also recommend [10, Section VI.6].) Moreover, H (is generated by and) contains rank-one operators of the form v ⊗ u : X → Y : x ↦ ⟨x, u⟩v, where (v, u) ∈ Y × X, and with adjoint (v ⊗ u)* = u ⊗ v and ‖v ⊗ u‖ = ‖u‖ ‖v‖. For the rest of the paper, we fix e ∈ X and f ∈ Y, (10) and set A : H → Y × X : T ↦ (Te, T* f). (11)

Proposition 2.1. A is a continuous linear operator and ‖A‖ = √(‖e‖² + ‖f‖²).
Proof. Clearly, A is a linear operator. Moreover, ‖A(T)‖² = ‖Te‖² + ‖T* f‖² ≤ (‖e‖² + ‖f‖²)‖T‖², because the Hilbert-Schmidt norm dominates the operator norm. It follows that A is continuous and ‖A‖ ≤ √(‖e‖² + ‖f‖²). On the other hand, if T = f ⊗ e, then ‖T‖ = ‖e‖ ‖f‖ and A(T) = (‖e‖² f, ‖f‖² e), and hence ‖A(T)‖ = ‖T‖ √(‖e‖² + ‖f‖²). Thus ‖A‖ ≥ √(‖e‖² + ‖f‖²). Combining these observations, we obtain altogether that ‖A‖ = √(‖e‖² + ‖f‖²).
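In the matrix case H = R^{m×n}, A sends T to the pair (Te, Tᵀf) of scaled row and column sums, and the witness T = f ⊗ e from the proof attains the operator norm. A small numerical check (e and f are arbitrary test vectors chosen here):

```python
import numpy as np

# Check A(f (x) e) = (||e||^2 f, ||f||^2 e) and that this T attains
# ||A|| = sqrt(||e||^2 + ||f||^2) relative to the Hilbert-Schmidt norm.
rng = np.random.default_rng(0)
m, n = 4, 5
e = rng.standard_normal(n)
f = rng.standard_normal(m)

def A(T):
    return T @ e, T.T @ f            # (scaled row sums, scaled column sums)

T = np.outer(f, e)                   # the rank-one operator f (x) e
re_, cf = A(T)
assert np.allclose(re_, (e @ e) * f)
assert np.allclose(cf, (f @ f) * e)

# ||A(T)|| / ||T||_HS equals sqrt(||e||^2 + ||f||^2) for this T
ratio = np.sqrt(re_ @ re_ + cf @ cf) / np.linalg.norm(T)
assert np.isclose(ratio, np.sqrt(e @ e + f @ f))
```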
We now prove that ran A is always closed. Consequently, ran A is always a closed linear subspace of Y × X.
We now turn to the adjoint of A, which is given by A* : Y × X → H : (y, x) ↦ y ⊗ e + f ⊗ x. Proof. Let T ∈ H and (y, x) ∈ Y × X, and let B be any orthonormal basis of X. Then, computing the Hilbert-Schmidt inner product with respect to B, ⟨A(T), (y, x)⟩ = ⟨Te, y⟩ + ⟨T* f, x⟩ = ⟨T, y ⊗ e⟩ + ⟨T, f ⊗ x⟩ = ⟨T, y ⊗ e + f ⊗ x⟩, which proves the result.
Proof. Let T ∈ H.
(i): In this case, A(T) = (0, T * f ) and A * (y, x) = f ⊗ x. Let us verify the Penrose conditions [6, page 48]. First, using (7), which shows that AA † is indeed self-adjoint. Secondly, and if S ∈ H and B is any orthonormal basis of X, then which yields the symmetry of A † A.
Thirdly, using (36) and the assumption that e = 0, we have And finally, using (34), we have (ii): This can be proved similarly to (i).
(iii): In this case, A is the zero operator and hence the Desoer-Whalen conditions (see [6, page 51]) make it obvious that A † is the zero operator as well.
Let us define the auxiliary function which allows us to combine the previous two results into one:

We now turn to formulas for P_{ran A} and P_{ran A*}.
If e = 0 and f ≠ 0, then

The case when e ≠ 0 and f = 0 is treated similarly.
Finally, if e = 0 and f = 0, then A † = 0 and the result follows.
Let T ∈ H. If e ≠ 0 and f ≠ 0, then

Moreover, (11) yields

Now we consider all possible cases. If e ≠ 0 and f ≠ 0, then, using (26),

Next, if e = 0 and f ≠ 0, then using Theorem 2.5(i) yields

Similarly, if e ≠ 0 and f = 0, then using Theorem 2.5(ii) yields

And finally, if e = 0 and f = 0, then A† = 0 and hence P_C(T) = T.

Remark 2.9. Consider Theorem 2.8 and its notation. If (s, r) ∈ ran A, then (s̄, r̄) = (s, r) and hence C = A^{-1}(s, r), which also covers the consistent case. Note that the auxiliary function defined in (40) allows us to combine all four cases into a single formula.

The last two results in this section are inspired by [ ]. Suppose that X = Y, that e ≠ 0 and f ≠ 0, and let γ ∈ R. Then

and

Proof. The projection identities in (56) follow from (9). Note that γ Id ∈ C and hence C ≠ ∅. We may and do assume without loss of generality that ‖e‖ = 1 = ‖f‖. Now let T ∈ H. Applying (52) with r := γ f and s := γ e, we deduce that

as claimed.
We conclude this section with a beautiful projection formula that arises when the last result is specialized even further.
Proof. Let T ∈ H. Applying Corollary 2.10 with f = e and γ = 1, we obtain

Rectangular matrices
In this section, we specialize the results of Section 2 to X := R^n and Y := R^m, which gives rise to the space H = R^{m×n} of real m × n matrices. Given u and x in R^n, and v and y in R^m, we have

Corresponding to (11), we have

The counterpart of (24) reads

Translated to the matrix setting, Theorem 2.4 and Theorem 2.5 turn into:

Theorem 3.1. Let x ∈ R^n and y ∈ R^m. If e ≠ 0 and f ≠ 0, then

Furthermore,

In turn, Corollary 2.7 now states the following:

Corollary 3.2. Let x ∈ R^n, let y ∈ R^m, and let T ∈ R^{m×n}. If e ≠ 0 and f ≠ 0, then

and

Furthermore,

Now let T ∈ R^{m×n}. If e ≠ 0 and f ≠ 0, then

Moreover,

and for every T ∈ R^{m×n},

Remark 3.5. (Romero; 1990) Consider Corollary 3.4 and its notation. Assume that [s, r] ∈ ran A, which is equivalent to requiring that ⟨e, r⟩ = ⟨f, s⟩ (a condition sometimes jokingly called the "Fundamental Theorem of Accounting"). Then one verifies that the entries of the matrix in (78) are also expressed by

for every i ∈ {1, . . . , m} and j ∈ {1, . . . , n}. Formula (79) was proved by Romero (see [11, Corollary 2.1]) using Lagrange multipliers; he even has a K-dimensional extension (where (79) corresponds to K = 2). We also refer the reader to [4] for using (79) to compute the projection onto the transportation polytope.
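As a concrete sketch of formula (79) in the unscaled case e = (1, . . . , 1) ∈ R^n and f = (1, . . . , 1) ∈ R^m, the projection onto the matrices with prescribed row sums s and column sums r (under the consistency condition Σs = Σr) takes only a few lines; the function name and test data are ours.

```python
import numpy as np

# Romero-style projection (unscaled case): entrywise,
# P(T)_ij = T_ij + (s_i - rowsum_i)/n + (r_j - colsum_j)/m
#                + (sum(T) - sum(s))/(m*n),
# assuming sum(s) == sum(r).
def project_row_col(T, s, r):
    m, n = T.shape
    row, col = T.sum(axis=1), T.sum(axis=0)
    return (T + (s - row)[:, None] / n + (r - col)[None, :] / m
            + (T.sum() - s.sum()) / (m * n))

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 5))
s = np.array([32.0, 43.0, 33.0, 23.0])        # row sums used in Section 4
r = np.array([24.0, 18.0, 37.0, 27.0, 25.0])  # column sums used in Section 4
P = project_row_col(T, s, r)
assert np.allclose(P.sum(axis=1), s)          # prescribed row sums attained
assert np.allclose(P.sum(axis=0), r)          # prescribed column sums attained
assert np.allclose(project_row_col(P, s, r), P)  # P is a fixed point
```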
Next, Corollary 2.10 turns into the following result. Let γ ∈ R. Then

and

We conclude this section with a particularization of Corollary 2.11, which immediately follows when X = Y = R^n and thus H = R^{n×n}:

Corollary 3.7. Suppose that e ∈ R^n \ {0}, and set

Then

and

Proof. Apply Corollary 3.7 with e = u, for which ‖e‖² = n.
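In the setting of Example 3.8 (X = Y = R^n with all row and column sums prescribed to equal 1), the resulting projection onto the generalized bistochastic matrices G can be written compactly as P_G(T) = QTQ + J, where J := (1/n)·(all-ones matrix) and Q := Id − J. A numerical sketch of this formula:

```python
import numpy as np

# Projection onto generalized bistochastic matrices (row and column sums 1):
# P_G(T) = Q T Q + J with J = (1/n) * ones and Q = I - J.
n = 6
J = np.ones((n, n)) / n
Q = np.eye(n) - J

rng = np.random.default_rng(2)
T = rng.standard_normal((n, n))
P = Q @ T @ Q + J

ones = np.ones(n)
assert np.allclose(P @ ones, ones)       # all row sums equal 1
assert np.allclose(P.T @ ones, ones)     # all column sums equal 1
assert np.allclose(Q @ P @ Q + J, P)     # projecting again changes nothing
```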

Remark 3.9.
For some applications of Example 3.8, we refer the reader to [12] and also to the recent preprint [2].

Remark 3.10.
A reviewer pointed out that projection algorithms can also be employed to solve linear programming problems provided a strict complementary condition holds (see Nurminski's work [9]). This does suggest a possibly interesting future project: explore whether the projections in this paper are useful in solving some linear programming problems on rectangular matrices with prescribed row and column sums.

Numerical experiments
We consider the problem of finding a rectangular matrix with prescribed row and column sums as well as some additional constraints on the entries of the matrix. To be specific, and inspired by [1], we seek a real matrix of size m × n = 4 × 5 such that its row and column sums are equal to s̄ := (32, 43, 33, 23) and r̄ := (24, 18, 37, 27, 25), respectively. One solution to this problem, featuring actually nonnegative integers, is given by

Note that the set B of matrices in R^{4×5} with these prescribed row and column sums is an affine subspace of R^{4×5} and that an explicit formula for P_B is available through Corollary 3.4. Next, we define the closed convex "hyper box"

For instance, the (1, 3)-entry of any nonnegative integer solution must lie between 0 and 32 = min{32, 37}; thus A_{1,3} = [0, 32]. The projection of a real number ξ onto the interval [0, min{s̄_i, r̄_j}] is given by max{0, min{s̄_i, r̄_j, ξ}}. Because A is the Cartesian product of such intervals, the projection operator P_A is nothing but the corresponding product of interval projection operators.
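The entrywise description of P_A translates directly into code (a sketch; the helper name is ours):

```python
import numpy as np

# Hyper box A: entry (i, j) is constrained to the interval [0, min(s_i, r_j)].
s = np.array([32.0, 43.0, 33.0, 23.0])
r = np.array([24.0, 18.0, 37.0, 27.0, 25.0])
upper = np.minimum(s[:, None], r[None, :])   # 4 x 5 matrix of upper bounds

def P_A(T):
    # product of interval projections = entrywise clipping
    return np.clip(T, 0.0, upper)

assert upper[0, 2] == 32.0                   # the (1,3)-entry bound min{32, 37}
assert np.all(P_A(np.full((4, 5), 1000.0)) == upper)
assert np.all(P_A(np.full((4, 5), -7.0)) == 0.0)
```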
Our problem is thus to

We shall tackle (90) with three well-known algorithms: Douglas-Rachford (DR), the Method of Alternating Projections (MAP), and Dykstra (Dyk). Here is a quick review of how these methods operate, for a given starting matrix T_0 ∈ R^{4×5} and a current matrix T_k ∈ R^{4×5}.

DR updates via
MAP updates via

and finally Dyk initializes also R_0 = 0 ∈ R^{4×5} and updates via

For all three algorithms, it is known that the sequence (P_A(T_k)) converges to a point in A ∩ B; in fact, Dyk satisfies even P_A(T_k) → P_{A∩B}(T_0) (see, e.g., [3, Corollary 28.3, Corollary 5.26, and Theorem 30.7]). Consequently, for each of the three algorithms, we will focus on the sequence

(P_A(T_k))_{k∈N} (92)

which obviously lies in A and which thus prompts the simple feasibility criterion given by

Each algorithm is run for 250 iterations and for 100,000 instances of T_0 whose entries are generated uniformly in [-100, 100]. The median value of δ_k over all instances is plotted against k in Fig. 1. The shaded region for each line represents the range of values attained at that iteration. We deem an algorithm to have achieved feasibility when δ_k = 0. While MAP and DR always achieve feasibility, as can be seen from the range of their values in Fig. 1, DR achieves it the fastest in most cases. To support this, we order these algorithms in Table 1 according to their performance. The first column reports what percentage of the instances achieved feasibility in the given order and whether any of the algorithms failed to converge. So the row labelled "DR<MAP" represents cases where DR achieved feasibility the fastest, MAP was second, and Dyk did not converge. The second column reports what percentage of the first feasible matrices obtained were closest to the starting point T_0 in the given order. This is done by measuring ‖T_0 − T_k‖, where ‖·‖ is the operator norm and T_k is the first feasible matrix obtained using a given algorithm (Dyk, DR, or MAP). We consider two algorithms tied if the distances between the starting point and their estimates differ by at most 10^{-15}. As is evident, a majority of the cases have DR in the lead for feasibility. However, the matrices DR produces are not as close to T_0 as the ones given by MAP and Dyk when these achieve feasibility.
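The three methods can be sketched as follows. Since the update displays are standard, we use their usual forms (DR in its feasibility formulation, and Dykstra with a single correction term for the non-affine set, which is valid because B is affine), and we measure infeasibility of the shadow P_A(T_k) by the maximum violation of the prescribed row and column sums, our stand-in for the criterion δ_k; all function names are ours.

```python
import numpy as np

# Data of the 4 x 5 experiment.
s = np.array([32.0, 43.0, 33.0, 23.0])        # prescribed row sums
r = np.array([24.0, 18.0, 37.0, 27.0, 25.0])  # prescribed column sums
m, n = 4, 5
upper = np.minimum(s[:, None], r[None, :])

def P_A(T):                                   # box: entrywise clipping
    return np.clip(T, 0.0, upper)

def P_B(T):                                   # affine set: Romero-style formula
    row, col = T.sum(axis=1), T.sum(axis=0)
    return (T + (s - row)[:, None] / n + (r - col)[None, :] / m
            + (T.sum() - s.sum()) / (m * n))

def map_step(T):                              # Method of Alternating Projections
    return P_B(P_A(T))

def dr_step(T):                               # Douglas-Rachford (feasibility form)
    PA = P_A(T)
    return T + P_B(2.0 * PA - T) - PA

def dyk_step(T, R):                           # Dykstra; R corrects the box step
    U = P_A(T + R)
    return P_B(U), T + R - U

def infeas(T):                                # stand-in for the criterion delta_k
    PA = P_A(T)
    return max(np.abs(PA.sum(axis=1) - s).max(),
               np.abs(PA.sum(axis=0) - r).max())

rng = np.random.default_rng(3)
T0 = rng.uniform(-100.0, 100.0, size=(m, n))
T_map, T_dr, T_dyk, R = T0.copy(), T0.copy(), T0.copy(), np.zeros((m, n))
for _ in range(250):
    T_map = map_step(T_map)
    T_dr = dr_step(T_dr)
    T_dyk, R = dyk_step(T_dyk, R)
```

On a single random instance all three shadow sequences end up nearly feasible after 250 iterations; the experiment in the text repeats this over 100,000 starting points and compares speeds.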
This is consistent with the fact that DR explores regions further away from the starting point in its search for matrices, while Dyk is built to achieve the least distance. It is also worth noting that at least one of these algorithms converges in every instance. (Convergence of all three algorithms is guaranteed in theory.)

The convex case
Last but not least, because our problem deals with unscaled row and column sums, we point out that the sought-after projection may also be computed by using the algorithm proposed by Calvillo and Romero [4].

The nonconvex case
We repeat the experiment of Section 4.1 exactly, with the only difference being that the (new) set A in this section is the intersection of the (old) set A from the previous section (see (89)) with Z^{4×5}. This enforces nonnegative integer solutions. The projection operator P_A is obtained by simply rounding after applying P_A from Section 4.1.
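In code, the only change to the box projection is a rounding step (a sketch; since the bounds min(s̄_i, r̄_j) are integers here, clipping followed by rounding yields a nearest point of the discrete box, with ties resolved arbitrarily):

```python
import numpy as np

# Nonconvex box: nonnegative INTEGER entries bounded by min(s_i, r_j).
s = np.array([32.0, 43.0, 33.0, 23.0])
r = np.array([24.0, 18.0, 37.0, 27.0, 25.0])
upper = np.minimum(s[:, None], r[None, :])

def P_A_int(T):
    # clip into [0, upper], then round to the nearest integer
    return np.rint(np.clip(T, 0.0, upper))

assert np.all(P_A_int(np.full((4, 5), 3.4)) == 3.0)
assert np.all(P_A_int(np.full((4, 5), -2.7)) == 0.0)
assert np.all(P_A_int(np.full((4, 5), 1000.0)) == upper)
```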
In this nonconvex case, MAP fails to converge in most cases, whereas DR and Dyk converge to solutions as shown in Fig. 2. This is corroborated by Table 2, where the rows in which MAP converges correspond to only a quarter of the total cases. Again, DR achieves feasibility the fastest in more than half the cases, but Dykstra's algorithm gives the solution closest to T_0 among these, as shown in the second column of Table 2. In this nonconvex case, convergence of any of the algorithms is not guaranteed; in fact, there are several instances in which no solution is found. However, in the 10^5 runs considered, we did end up discovering several distinct solutions (see Table 3). It turned out that all solutions found were distinct even across all three algorithms, resulting in 113,622 different nonnegative integer solutions in total.

Availability of data and materials
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.
Funding
The research of HHB and XW was partially supported by Discovery Grants from the Natural Sciences and Engineering Research Council of Canada.

Authors' contributions
All authors contributed equally in writing this article. All authors read and approved the final manuscript.

Authors' acknowledgments
We thank the editor Aviv Gibali, three anonymous reviewers, and Matt Tam for constructive comments and several pointers to literature we were previously unaware of.