A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization

Hu, Wenyu; Zheng, Weidong; Yu, Gaohang

doi:10.3390/sym11101277

Open AccessArticle

A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization

by

Wenyu Hu

^1,*

,

Weidong Zheng

¹ and

Gaohang Yu

²

¹

College of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China

²

Department of Mathematics, School of Science, Hangzhou Dianzi University, Hangzhou 310018, China

^*

Author to whom correspondence should be addressed.

Symmetry 2019, 11(10), 1277; https://doi.org/10.3390/sym11101277

Submission received: 9 September 2019 / Revised: 4 October 2019 / Accepted: 8 October 2019 / Published: 11 October 2019

(This article belongs to the Special Issue Fixed Point Theory and Computational Analysis with Applications)

Download

Browse Figures

Versions Notes

Abstract

:

The nuclear norm minimization (NNM) problem is to recover a matrix that minimizes the sum of its singular values and satisfies some linear constraints simultaneously. The alternating direction method (ADM) has been used to solve this problem recently. However, the subproblems in ADM are usually not easily solvable when the linear mappings in the constraints are not identities. In this paper, we propose a proximity algorithm with adaptive penalty (PA-AP). First, we formulate the nuclear norm minimization problems into a unified model. To solve this model, we improve the ADM by adding a proximal term to the subproblems that are difficult to solve. An adaptive tactic on the proximity parameters is also put forward for acceleration. By employing subdifferentials and proximity operators, an equivalent fixed-point equation system is constructed, and we use this system to further prove the convergence of the proposed algorithm under certain conditions, e.g., the precondition matrix is symmetric positive definite. Finally, experimental results and comparisons with state-of-the-art methods, e.g., ADM, IADM-CG and IADM-BB, show that the proposed algorithm is effective.

Keywords:

nuclear norm minimization; matrix completion; alternating direction method; subdifferential; proximity operator

Graphical Abstract

1. Introduction

The rank minimization (RM) problem aims to recover an unknown low-rank matrix from very limited information. It has gained am increasing amount of attention rapidly in recent years, since it has a range of applications in many computer vision and machine learning areas, such as collaborative filtering [1], subspace segmentation [2], non-rigid structure from motion [3] and image inpainting [4]. This paper deals with the following rank minimization problem:

\min_{X \in R^{m \times n}} rank (X) s . t . A X = b,

(1)

where

A : R^{m \times n} \to R^{p}

is a linear map and the vector

b \in R^{p}

is known. The matrix completion (MC) problem is a special case of the RM problem, where

A

is a sampling operator in the form of

A X = X_{Ω}

,

Ω \subset {1, 2, \dots, m} \times {1, 2, \dots, n}

is an index subset, and

X_{Ω}

is a vector formed by the entries of X with indices in

Ω

.

Although Label (1) is simple in form, directly solving it is an NP-hard problem due to the discrete nature of the rank function. One popular way is replacing the rank function with the nuclear norm, which is the sum of the singular values of a matrix. This technique is based on the fact that the nuclear norm minimization (NNM) is the tightest convex relaxation of the rank minimization problem [5]. The obtained new problem is given by

\min_{X \in R^{m \times n}} {∥ X ∥}_{*} s . t . A X = b,

(2)

where

{∥ X ∥}_{*} : = \sum_{i = 1}^{r} σ_{i} (X)

denotes the nuclear norm. It has been shown that recovering a low-rank matrix can be achieved by solving Label (2) [1,6].

In practical applications, the observed data may be corrupted with noise, namely

b = A X + e

, where e contains measurement errors dominated by certain normal distribution. In order to recover the low-rank matrix robustly, problem (2) should be modified to the following inequality constrained problem:

\min_{X \in R^{m \times n}} {∥ X ∥}_{*} s . t . {∥ A X - b ∥}_{2} \leq δ,

(3)

where

{∥ \cdot ∥}_{2}

is the

ℓ_{2}

norm of vector and the constant

δ \geq 0

is the noise level. When

δ = 0

, problem (3) reduces to the noiseless case (2).

Alternatively, problems (2) and (3) can be rewritten as the nuclear norm regularized least-square (NNRLS) under some conditions:

\min_{X \in R^{m \times n}} {∥ X ∥}_{*} + \frac{γ}{2} {∥ A X - b ∥}_{2}^{2},

(4)

where

γ > 0

is as given parameter.

The studies on the nuclear norm minimization problem are mainly along two directions. The first one is enhancing the precision of a low rank approximation via replacing the nuclear norm by a non-convex regularizer—for instance, the Schatten p-norm [7,8], the truncated nuclear norm [4,9], the log or fraction function based norm [10,11], and so on. The second one is improving the efficiency of solving problems (2), (3) and (4) and their variants. For instance, the authors in [12] treated the problem (2) as a standard linear semidefinite programming (SDP) problem, and proposed the solving algorithm from the dual problem. However, since the SDP solver uses second-order information, with the increase in the size of the matrix, the memory required to calculate the descending direction quickly becomes too large. Therefore, algorithms that use only first-order information are developed, such as the singular value thresholding (SVT) algorithm [13], the fixed point continuation algorithm (FPCA) [14], the accelerated proximal gradient Lagrangian (APGL) method [15], the proximal point algorithm based on indicator function (PPA-IF) [16], the augmented Lagrange multiplier (ALM) algorithm [17] and the alternating direction methods (ADM) [18,19,20,21].

In particular, Chen et al. [18] applied the ADM to solve the nuclear norm based matrix completion problem. Due to the simplicity of the linear mapping

A

, i.e.,

A A^{*} = I

, all of the ADM subproblems of the matrix completion problem can be solved exactly by an explicit formula; see [18] for details. Here, and hereafter

A^{*}

and

I

represent the adjoint of

A

and the identity operator. However, for a generic

A

with

A A^{*} \neq I

, some of the resulting subproblems no longer have closed-form solutions. Thus, the efficiency of the ADM depends heavily on how to solve these harder subproblems.

To solve this difficulty, a common strategy is to introduce new auxiliary variables, e.g., in [19], one auxiliary variable was introduced for solving Label (2), while two auxiliary variables were introduced for Label (3). However, with more variables and more constraints, more memory is required and the convergence of ADM also becomes slower. Moreover, to update auxiliary variables, whose subproblems are least square problems, expensive matrix inversions are often necessary. Even worse, the convergence of ADM with more than two variables is not guaranteed. To mitigate these problems, Yang and Yuan [21] presented a linearized ADM to solve the NNRLS (4) as well as problems (2) and (3), where each subproblems admit explicit solutions. Instead of the linearized technique, Xiao and Jin [19] solve one least square subproblem iteratively by the Barzilai–Borwein (BB) gradient method [22]. Unlike [19], Jin et al. [20] used the linear conjugate gradient (CG) algorithm rather than BB to solve the subproblem.

In this paper, we further investigate the efficiency of ADM in solving the nuclear norm minimization problems. We first reformulate the problems (2), (3) and (4) into a unified form as follows:

\min_{X \in R^{m \times n}} f (X) + g (A X),

(5)

where

f : R^{m \times n} \to R

and

g : R^{p} \to R

. In this paper, we always fix

f (\cdot) = {∥ \cdot ∥}_{*}

. When considering problems (2) and (3), we choose

g (\cdot) = X_{B_{δ}} (\cdot - b)

, where

X_{B_{δ}} (\cdot)

denotes the indicator function over

B_{δ} : = {u \in R^{p} ∣ {∥ u ∥}_{2} \leq δ}

, i.e.,

\begin{matrix} X_{B_{δ}} (x) : = \{\begin{matrix} 0, & if x \in B_{δ}, \\ + \infty, & otherwise . \end{matrix} \end{matrix}

(6)

When considering problem (4), we choose

g (\cdot) = \frac{γ}{2} {∥ \cdot - b ‖}_{2}^{2}

. As a result, for a general linear mapping

A

, we only need to solve such a problem whose objective function is a sum of two convex functions and one of them contains an affine transformation.

Motivated by the ADM algorithms above, we then present a unified proximity algorithm with adaptive penalty (PA-AP) to solve Label (5). In particular, we employ the proximity idea to deal with the problems encountered by the present ADM, by adding a proximity term to one of the subproblems. We term the proposed algorithm as a proximity algorithm because we can rewrite it as a fixed-point equation system of proximity operators of f and g. By analyzing the fixed-point equations and applying the “Condition-M” [23], the convergence of the algorithm is proved under some assumptions. Furthermore, to improve the efficiency, an adaptive tactic on the proximity parameters is put forward. This paper is closely related to the works [23,24,25,26]. However, this paper is motivated to improve ADM to solve the nuclear norm minimization problem with linear affine constraints.

The organization of this paper is as follows. In Section 2, a review of ADM and its application on NNM are provided. In addition, the properties about subdifferentials and proximity operators are introduced. To improve ADM, a proximity algorithm with adaptive penalty is proposed, and convergence of the proposed algorithm is obtained in Section 3. Section 4 demonstrates the performance and effectiveness of the algorithm through numerical experiment. Finally, we will make a conclusion in Section 5.

2. Preliminaries

In this section, we give a brief review on ADM and its applications to the NNM problem (2) developed in [19,20]. In addition, some preliminaries on subdifferentials and proximity operators are given. Throughout this paper, linear maps are denoted with calligraphic letters (e.g.,

A

), while capital letters represent matrices (e.g., A), and boldface lowercase letters represent vectors (e.g.,

x

).

We begin with introducing the ADM. The basic idea of ADM goes back to the work of Gabay and Mercier [27]. ADM is designed to solving the separable convex minimization problem:

\min_{x, y} θ_{1} (x) + θ_{2} (y), s . t . A x + B y = c,

(7)

where

θ_{1} : R^{s} \to R

,

θ_{2} : R^{t} \to R

are convex functions, and

A \in R^{l \times s}

,

B \in R^{l \times t}

and

c \in R^{l}

. The corresponding augmented Lagrangian function is

L (x, y, λ) = θ_{1} (x) + θ_{2} (y) - 〈 λ, A x + B y - c 〉 + \frac{β}{2} {∥ A x + B y - c ∥}_{2}^{2},

(8)

where

λ \in R^{l}

is the Lagrangian multiplier and

β > 0

is a penalty parameter. ADM is to minimize (8) first with respect to

x

, then with respect to

y

, and finally update

λ

iteratively, i.e.,

\begin{matrix} \{\begin{matrix} x_{k + 1} = \arg \min_{x} L (x, y_{k}, λ_{k}), \\ y_{k + 1} = \arg \min_{y} L (x_{k + 1}, y, λ_{k}), \\ λ_{k + 1} = λ_{k} - β (A x_{k + 1} + B y_{k + 1} - c) . \end{matrix} \end{matrix}

(9)

The main advantage of ADM is to make use of the separability structure of the objective function

θ_{1} (x) + θ_{2} (y)

.

To solve (2) based on ADM, the authors in [19,20] introduced an auxiliary variable Y, and equivalently transformed the original model into

\min_{X, Y} {∥ X ∥}_{*} s . t . X - Y = 0, A Y = b .

(10)

The augmented Lagrangian function to (10) is

L (X, Y, Z, z) = {∥ X ∥}_{*} - 〈 Z, X - Y 〉 + \frac{μ}{2} {∥ X - Y ∥}_{F}^{2} - 〈 z, A Y - b 〉 + \frac{γ}{2} {∥ A Y - b ∥}_{2}^{2},

(11)

where

Z \in R^{m \times n}

,

z \in R^{p}

are Lagrangian multipliers,

μ, γ > 0

are penalty parameters and

〈 \cdot 〉

denotes the Frobenius inner product, i.e., the matrices are treated like vectors. Following the idea of ADM, given

(X_{k}, Y_{k})

, the next pair

(X_{k + 1}, Y_{k + 1})

is determined by alternating minimizing (11),

\begin{matrix} \{\begin{matrix} X_{k + 1} = \arg \min_{X} L (X, Y_{k}, Z_{k}, z_{k}), \\ Y_{k + 1} = \arg \min_{Y} L (X_{k + 1}, Y, Z_{k}, z_{k}), \\ Z_{k + 1} = Z_{k} - μ (X_{k + 1} - Y_{k + 1}), \\ z_{k + 1} = z_{k} - γ (A Y_{k + 1} - b) . \end{matrix} \end{matrix}

(12)

Firstly,

X_{k + 1}

can be updated by

\begin{matrix} X_{k + 1} & = & \arg \min_{X} {∥ X ∥}_{*} - 〈 Z_{k}, X - Y_{k} 〉 + \frac{μ}{2} {∥ X - Y_{k} ∥}_{F}^{2} \\ = & \arg \min_{X} {∥ X ∥}_{*} + \frac{μ}{2} {∥ X - (Y_{k} + \frac{1}{μ} Z_{k}) ∥}_{F}^{2}, \end{matrix}

(13)

which in fact corresponds to evaluating the proximal operator of

{∥ \cdot ∥}_{*}

, i.e.,

{prox}_{\frac{1}{β} {∥ \cdot ∥}_{*}}

which is defined below.

Secondly, given

(X_{k + 1}, Z_{k}, z_{k})

,

Y_{k + 1}

can be updated by

\begin{matrix} Y_{k + 1} & = & \arg \min_{Y} - 〈 Z_{k}, X_{k + 1} - Y 〉 + \frac{μ}{2} ∥ X_{k + 1} {- Y ∥}_{F}^{2} - 〈 z_{k}, A Y - b 〉 + \frac{γ}{2} {∥ A Y - b ∥}_{2}^{2} \\ = & \arg \min_{Y} \frac{μ}{2} ∥ Y - (X_{k + 1} - \frac{1}{μ} Z_{k}) ∥_{F}^{2} + \frac{γ}{2} {∥ A Y - (b + \frac{1}{γ} z_{k}) ∥}_{2}^{2}, \end{matrix}

(14)

which is a least square subproblem. Its solution can be found by solving a linear equation:

(μ I + γ A^{*} A) Y = μ X_{k + 1} - Z_{k} - A^{*} (γ b + z_{k}) .

However, computing the matrix inverse

{(μ I + γ A^{*} A)}^{- 1}

is too costly to implement. Though [19,20] adopted inverse-free methods, i.e., BB and CG, to solve (14) iteratively, they are still inefficient, which will be shown in Section 4.

Next, we review definitions of subdifferential and proximity operator, which play an important role in the algorithm and convergence analysis. The subdifferential of a convex function

θ

at a point

x \in R^{d}

is the set defined by

\partial_{θ (\cdot)} (x) = {y \in R^{d} : θ (z) \geq θ (x) + < y, z - x >, \forall z \in R^{d}} .

(15)

The conjugate function of

θ

is denoted by

θ^{*}

, which is defined by

θ^{*} (y) : = sup_{x} {〈 x, y 〉 - θ (x)} .

For

x \in dom (θ)

,

y \in dom (θ^{*})

, it holds that

y \in \partial_{θ (\cdot)} (x) \Leftrightarrow x \in \partial_{θ^{*} (\cdot)} (y),

(16)

where

dom (\cdot)

denotes the domain of a function.

For a given positive definite matrix

H

, the weighted inner product is defined by

{〈 x, y 〉}_{H} = 〈 H x, y 〉

. Furthermore, the proximity operator of

θ

at

x

with respect to

H

[23] is defined by

{prox}_{θ (\cdot), H} (x) = \arg \min {θ (u) + \frac{1}{2} {∥ u - x ∥}_{H}^{2} : u \in R^{d}} .

(17)

If

H = β I

, then

{prox}_{θ (\cdot), H} (\cdot)

is reduced to

{prox}_{θ (\cdot), β I} (x) = \arg \min {θ (u) + \frac{β}{2} {∥ u - x ∥}_{2}^{2} : u \in R^{d}},

and

{prox}_{\frac{1}{β} θ (\cdot)} (\cdot)

is short for

{prox}_{θ (\cdot), β I} (\cdot)

. A relation between subdifferentials and proximity operators is that

y \in \partial_{θ (\cdot)} (x) \Leftrightarrow x = {prox}_{θ (\cdot)} (x + y),

(18)

which is frequently used to construct fixed-point equations and prove convergence of the algorithm.

3. Proximity Algorithm with Adaptive Penalty

In Section 2, it is shown that the current ADM results in expensive matrix inverse computation when solving (2). Therefore, it is desirable to improve it. In this section, we propose a proximity algorithm with adaptive penalty (PA-AP) to solve the unified problem (5).

3.1. Proximity Algorithm

We derive our algorithm from ADM. First of all, we introduce an auxiliary variable

y \in R^{p}

, and convert (5) to

\min_{X, y} f (X) + g (y), s . t . A X - y = 0 .

(19)

The augmented Lagrangian function of (19) is defined by

L (X, y) = f (X) + g (y) - 〈 λ, A X - y 〉 + \frac{μ}{2} {∥ A X - y ∥}_{2}^{2},

(20)

where

λ \in R^{p}

is the Lagrangian multiplier and

μ > 0

is a penalty parameter. ADM first updates X by minimizing

L (X, y)

with

y

being fixed and then updates

y

with X fixed at its latest value, until some convergence criteria are satisfied. After some simplification, we get

\begin{matrix} \{\begin{matrix} X_{k + 1} = \arg \min_{X} f (X) + \frac{μ}{2} {∥ A X - (y_{k} + \frac{1}{μ} λ_{k}) ∥}_{2}^{2}, \\ y_{k + 1} = \arg \min_{y} g (y) + \frac{μ}{2} {∥ y - (A X_{k + 1} - \frac{1}{μ} λ_{k}) ∥}_{2}^{2}, \\ λ_{k + 1} = λ_{k} - μ (A X_{k + 1} - y_{k + 1}) . \end{matrix} \end{matrix}

(21)

Note that the subproblem of

X_{k + 1}

usually has no closed-form solutions when

A

is not the identity. In order to design efficient algorithms, we add a proximity term to this subproblem. More precisely, we propose the following algorithm:

\begin{matrix} \{\begin{matrix} X_{k + 1} = \arg \min_{X} f (X) + \frac{μ}{2} ∥ A X - (y_{k} + \frac{1}{μ} λ_{k}) ∥_{2}^{2} + \frac{1}{2} {∥ X - X_{k} ∥}_{Q}^{2}, \\ y_{k + 1} = \arg \min_{y} g (y) + \frac{μ}{2} {∥ y - (A X_{k + 1} - \frac{1}{μ} λ_{k}) ∥}_{2}^{2}, \\ λ_{k + 1} = λ_{k} - μ (A X_{k + 1} - y_{k + 1}), \end{matrix} \end{matrix}

(22)

where

Q \in R^{m \times m}

is a symmetric positive definite matrix.

The next lemma shows that (22) is equivalent to a fixed-point equation of some proximity operators. The proof is similar to [25].

Lemma 1.

The problem (22) is equivalent to solving the following equations:

\begin{matrix} \{\begin{matrix} z_{k + 1} = {prox}_{μ g^{*} (\cdot)} (μ A X_{k} + z_{k}), \\ X_{k + 1} = {prox}_{τ f (\cdot)} (X_{k + 1} - τ (μ A^{*} A + Q) (X_{k + 1} - X_{k}) - τ A^{*} (2 z_{k + 1} - z_{k})), \end{matrix} \end{matrix}

(23)

where

τ > 0

is arbitrary.

Proof.

By changing the order of iterations, the first-order optimality condition of (22) is

\begin{matrix} \{\begin{matrix} 0 \in \partial_{g (\cdot)} (y_{k + 1}) + μ (y_{k + 1} - A X_{k} + \frac{1}{μ} λ_{k}), \\ λ_{k + 1} = λ_{k} - μ (A X_{k} - y_{k + 1}), \\ 0 \in \partial_{f (\cdot)} (X_{k + 1}) + μ A^{*} (A X_{k + 1} - y_{k + 1} - \frac{1}{μ} λ_{k + 1}) + Q (X_{k + 1} - X_{k}) . \end{matrix} \end{matrix}

(24)

From the first line of (24) and (16), we obtain

μ y_{k + 1} \in \partial_{μ g^{*} (\cdot)} (μ (A X_{k} - y_{k + 1}) - λ_{k}) .

(25)

Denote

z_{k} : = - λ_{k}

. Then,

z_{k + 1} = μ A X_{k} + z_{k} - μ y_{k + 1} .

(26)

Using (18), it follows from (25) and (26) that

z_{k + 1} = {prox}_{μ g^{*} (\cdot)} (μ A X_{k} + z_{k}) .

(27)

By (26), we have

μ A X_{k} + z_{k} = μ y_{k + 1} + μ (A X_{k} - y_{k + 1} - \frac{1}{μ} λ_{k}) = μ y_{k + 1} + z_{k + 1},

and thus

μ A^{*} (A X_{k + 1} - y_{k + 1} - \frac{1}{μ} λ_{k + 1}) + Q (X_{k + 1} - X_{k}) = (μ A^{*} A + Q) (X_{k + 1} - X_{k}) + A^{*} (2 z_{k + 1} - z_{k}) .

(28)

From (28) and the third line of (24), given

τ > 0

, we have

- τ (μ A^{*} A + Q) (X_{k + 1} - X_{k}) - τ A^{*} (2 z_{k + 1} - z_{k}) \in \partial_{τ f (\cdot)} (X_{k + 1}),

which yields

X_{k + 1} = {prox}_{τ f (\cdot)} (X_{k + 1} - τ (μ A^{*} A + Q) (X_{k + 1} - X_{k}) - τ A^{*} (2 z_{k + 1} - z_{k})) .

(29)

Therefore, the results in (23) are achieved by combining (27) and (29). □

We now discuss the choice of Q. To make the first subproblem of (22) have closed-form solutions, in this paper, we simply choose

Q = μ η I - μ A^{*} A

. To make sure Q is positive definite,

η

must satisfy

{η > ∥ A ∥}_{2}^{2}

, where

{∥ \cdot ∥}_{2}

is the spectral norm. By substituting Q into (22), we obtain

\begin{matrix} \{\begin{matrix} X_{k + 1} = \arg \min_{X} f (X) + \frac{μ η}{2} {∥ X - (X_{k} + \frac{A^{*} (λ_{k} - μ (A X_{k} - y_{k}))}{μ η}) ∥}_{2}^{2}, \\ y_{k + 1} = \arg \min_{y} g (y) + \frac{μ}{2} {∥ y - (A X_{k + 1} - \frac{1}{μ} λ_{k}) ∥}_{2}^{2}, \\ λ_{k + 1} = λ_{k} - μ (A X_{k + 1} - y_{k + 1}) . \end{matrix} \end{matrix}

(30)

The subproblems in (30) can be solved explicitly based on proximity operators. Specifically, we have

X_{k + 1} = {prox}_{\frac{1}{μ η} {∥ \cdot ∥}_{*}} (X_{k} + \frac{A^{*} (λ_{k} - μ (A X_{k} - y_{k}))}{μ η}) = U max {Σ - \frac{1}{μ η}, 0} V^{T},

(31)

where

U Σ V^{T}

is the singular value decomposition (SVD) of

X_{k} + \frac{A^{*} (λ_{k} - μ (A X_{k} - y_{k}))}{μ η}

,

U \in R^{m \times m}

,

V \in R^{n \times n}

, and

Σ \in R^{m \times n}

is a diagonal matrix containing the singular values.

Moreover,

y_{k + 1} = {prox}_{\frac{1}{μ} g (\cdot)} (A X_{k + 1} - \frac{1}{μ} λ_{k})

, which depends on the choice of

g (\cdot)

. If

g (\cdot) = X_{B_{δ}} (\cdot - b)

, then

\begin{matrix} p r o x_{\frac{1}{μ} g (\cdot)} (x) = \{\begin{matrix} x, & x - b \in B_{δ}, \\ b + \frac{δ}{{∥ x - b ∥}_{2}} (x - b), & o t h e r w i s e . \end{matrix} \end{matrix}

(32)

If

g (x) = \frac{γ}{2} {∥ x - b ∥}_{2}^{2}

, then

p r o x_{\frac{1}{μ} g (\cdot)} (x) = \frac{γ b + μ x}{γ + μ} .

(33)

3.2. Adaptive Penalty

In previous proximity algorithms [23,28,29], the penalty parameter

μ

is usually fixed. In view of the linearized ADM, Liu et al. [26] presented an adaptive updating strategy for the penalty parameter

μ

. Motivated by it, we update

μ

by

μ_{k + 1} = min {μ_{m a x}, ρ μ_{k}},

(34)

where

μ_{m a x}

is an upper bound of

{μ_{k}}

. The value of

ρ

is defined as

\begin{matrix} ρ = \{\begin{matrix} ρ_{0}, & if μ_{k} max {\sqrt{η} ∥ X_{k + 1} - X_{k} ∥_{F}, ∥ y_{k + 1} - y_{k} ∥_{2} {} / ∥ b ∥}_{2} \leq ϵ_{1}, \\ 1, & otherwise, \end{matrix} \end{matrix}

where

ρ_{0} > 1

and

ϵ_{1} > 0

are given.

Based on the above analysis in Section 3.1 and Section 3.2, the proximity algorithm with adaptive penalty (abbr. PA-AP) for solving (5) can be outlined in Algorithm 1.

Algorithm 1 PA-AP for solving (5)

Input: Observation vector

b

, linear mapping

A

, and some parameters

μ_{0}, ρ_{0}, ε_{1}, ε_{2} > 0

,

μ_{m a x} ≫ μ_{0} > 0

.

Initialize: Set

X_{0}

and

y_{0}

to zero matrix and vector, respectively. Set

μ = μ_{0}

and

{η > ∥ A ∥}_{2}^{2}

. Set

k = 0

.

while not converged, do

step 1: Update

X_{k + 1}

,

y_{k + 1}

and

λ_{k + 1}

in turn by (30).

step 2: Update

μ_{k + 1}

by (34), and let

μ \leftarrow μ_{k + 1}

.

step 3:

k \leftarrow k + 1

.

end while

3.3. Convergence

In this section, we establish the convergence of (22). For problem (5), Li et al. [23] presented a general formula of fixed-point algorithms. Given two symmetric positive definite matrices

S \in R^{p \times p}

and

T \in R^{m \times m}

, denote

S_{A} : = (\begin{matrix} 0 & A \\ - A^{*} & 0 \end{matrix}), R : = (\begin{matrix} S & 0 \\ 0 & T \end{matrix}), E : = I + R^{- 1} S_{A},

where

S_{A}, R

and E can be treated as linear maps which map

(R^{p}) \times (R^{m \times n})

into itself.

Defining

Z : = (z, X) \in (R^{p}) \times (R^{m \times n})

and

T : = {prox}_{(g^{*} + f) (\cdot), R}

, then the solution to (5) is equivalent to

Z = (T \circ E) (Z) .

Furthermore, a multi-step proximity algorithm was proposed, which is

Z_{k + 1} = T (E_{0} Z_{k + 1} + R^{- 1} \sum_{i = 1}^{l} M_{i} Z_{k - i + 1}),

(35)

where

E_{0} = E - R^{- 1} (S_{A} - M_{0})

and

M_{0} = \sum_{i = 1}^{l} M_{i}

. Let

M : = {M_{i} : i = 0, 1, \dots, l}

. Li et al. [23] proved that the sequence

{Z_{k}}

generated by (35) converges to a solution of problem (5) if

M

satisfies the “Condition-M”, which refers to

(i): $M_{0} = \sum_{i = 1}^{l} M_{i}$ ,
(ii): $M_{1} = M_{2} = \dots = M_{l - 1}$ ,
(iii): $H : = M_{0} + M_{l}$ , H is symmetric positive definite,
(iv): $N (H) \subseteq N (M_{l}) \cap N (M_{l}^{T})$ ,
(v): $∥ {(H^{†})}^{\frac{1}{2}} M_{l} {(H^{†})}^{\frac{1}{2}} ∥_{2} < \frac{1}{2}$ ,

where

N (H)

and

H^{†}

are the null space and the Moore–Penrose pseudo-inverse matrix of H, respectively.

By checking the Condition-M, we prove the convergence of (22).

Theorem 1.

If Q is symmetric positive definite, and

∥ A {(A^{*} A + \frac{1}{μ} Q)}^{- \frac{1}{2}} ∥_{2} < 1

, then the sequence generated by (22) converges to the solution of (5).

Proof.

By comparing (23) and (35), it can be found that

l = 2

and

E_{0} = (\begin{matrix} 0 & 0 \\ - 2 τ A^{*} & I - τ (μ A^{*} A + Q) \end{matrix}), R = (\begin{matrix} \frac{1}{μ} I & 0 \\ 0 & \frac{1}{τ} I \end{matrix}), M_{0} = M_{1} = (\begin{matrix} \frac{1}{μ} I & A \\ A^{*} & μ A^{*} A + Q \end{matrix}), M_{2} = 0 .

We clearly see that Item (i) and Item (ii) of Condition-M hold. Furthermore,

H = M_{0} + M_{2} = M_{0}

, which is symmetric. By Lemma 6.2 in [23], H is positive definite if and only if

∥ {(μ A^{*} A + Q)}^{- \frac{1}{2}} A^{*} {(\frac{1}{μ})}^{- \frac{1}{2}} ∥_{2} < 1

, which is equivalent to

∥ A {(A^{*} A + \frac{1}{μ} Q)}^{- \frac{1}{2}} ∥_{2} < 1

. Hence, Item (iii) of Condition-M also holds.

Since H is positive definite, it yields that

N (H) = {0}

, which implies Item (iv) of Condition-M holds. Finally,

M_{2} = 0

implies that Item (v) holds. Consequently, the sequence generated by (23) converges to the solution of (5). The equivalence of (22) and (23) proves the result. □

Corollary 1.

Let

Q = μ η I - μ A^{*} A

. If

{η > ∥ A ∥}_{2}^{2}

, then the sequence generated by (30) converges to the solution of (5).

Proof.

Since

{η > ∥ A ∥}_{2}^{2}

if and only if

∥ A {(A^{*} A + \frac{1}{μ} Q)}^{- \frac{1}{2}} ∥_{2} < 1

, the result is true by Theorem 1. □

4. Numerical Experiments

In this section, we present some numerical experiments to show the effectiveness of the proposed algorithm (PA-AP). To this end, we test algorithms to solve the nuclear norm minimization problem, the noiseless matrix completion problem

(δ = 0)

, noisy matrix completion

(δ > 0)

and low-rank image recovery problem. We compare PA-AP against the ADM [18], IADM-CG [20] and IADM-BB [19]. All experiments are performed under Windows 10 and MATLAB R2016 running on a Lenovo laptop with an Intel CORE i7 CPU at 2.7 GHz and 8 GB of memory. In the numerical experiments of the first two parts, we use randomly generated square matrices for simulations. We denote the true solution by

X^{*} \in R^{m \times m}

. We generate the rank-r matrix

X^{*}

as a product of

X_{L} X_{R}^{T}

, where

X_{L}

and

X_{R}

are independent

m \times r

matrices with i.i.d. Gaussian entries. For each test, the stopping criterion is

RelChg = \frac{∥ X_{k} - X_{k - 1} ∥_{F}}{∥ X_{k - 1} ∥_{F}} \leq ε_{2},

where

ε_{2} > 0

. The algorithms are also forced to stop when the iteration number exceeds

10^{3}

.

Let

\hat{X}

be the solution obtained by the algorithms. We use the relative error to measure the quality of

\hat{X}

compared to the original matrix

X^{*}

, i.e.,

RelErr = \frac{∥ \hat{X} - X^{*} ∥_{F}}{∥ X^{*} ∥_{F}} .

It is obvious that, in each iteration of computing

X^{k + 1}

, PA-AP contains an SVD computation that computes all singular values and singular vectors. However, we actually only need the ones that are bigger than

\frac{1}{μ η}

. This causes the main computational load by using full SVD. Fortunately, this disadvantage can be smoothed by using the software PROPACK [30], which is designed to compute the singular values bigger than a threshold and the corresponding vectors. Although PROPACK can calculate the first fixed number of singular values, it cannot automatically determine the number of singular values greater than

\frac{1}{μ η}

. Therefore, in order to perform a local SVD, we need to predict the number of singular values and vectors calculated in each iteration, which is expressed by

s v_{k}

. We initialize

s v_{0} = 0.01 m

, and update it in each iteration as follows:

s v_{k + 1} = \{\begin{matrix} s v p_{k} + 1, & if s v p_{k} < s v_{k}, \\ s v p_{k} + 5, & if s v p_{k} = s v_{k}, \end{matrix}

where

s v p_{k}

is the number of singular values in the

s v_{k}

singular values that are bigger than

\frac{1}{μ η}

.

We use r and p to represent the rank of an

(m \times n)

matrix and the cardinality of the index set

Ω

, i.e.,

p = | Ω |

, and use

s r = p / (m n)

to represent the sampling rate. The “degree of freedom” of a matrix with rank r is defined by

dof = r (m + n - r)

. For PA-AP, we set

ε_{1} = 10^{- 4}

,

μ_{0} = \frac{1}{{∥ b ∥}_{2}}

,

μ_{m a x} = max {10^{3} μ_{0}, 10^{- 2}}

,

ρ_{0} = 1.7

, and

η = {1.01 ∥ A ∥}_{2}^{2}

. In all the experimental results, the boldface numbers always indicate the best results.

4.1. Nuclear Norm Minimization Problem

In this subsection, we use PA-AP to solve the three types of problems including (2)–(4). The linear map

A

is chosen as a partial discrete cosine transform (DCT) matrix. Specifically, in the noiseless model (2),

A

is generated by the following MATLAB scripts:

i n d i c e s = r a n d s a m p l e (m * n, p); b = d c t 2 (X^{*}); b = b (i n d i c e s),

which shows that

A

maps

R^{m \times n}

into

R^{p}

. In the noise model (3), we further set

b = A X^{*} + ω,

where

ω

is the additive Gaussian noise of zero mean and standard deviation

σ

. In (3), the noise level

δ

is chosen as

{∥ ω ∥}_{2}

.

The results are listed in Table 1, where the number of iterations (Iter) and CPU time in seconds (Time) besides RelErr are reported. To further illustrate the efficiency of PA-AP, we test problems with different matrix sizes and sampling rates (

s r

). In Table 2, we compare the PA-AP with IADM-CG and IADM-BB for solving the NNRLS problem (4). It shows that our method is more efficient than the other two methods, and thus it is suitable for solving large-scale problems.

4.2. Matrix Completion

This subsection adopts the PA-AP method to solve the noiseless matrix completion problem (2) and the noisy matrix completion problem (3) to verify its validity. The mapping

A

is a linear projection operator defined as

A X^{*} = X_{Ω}

, where

X_{Ω}

is a vector formed by the components of

X^{*}

with indices in

Ω

. The indicators of the selected elements are randomly arranged to form a column vector, and the index set of the first

s r \times m \times n

is selected to form the set

Ω

. For noisy matrix completion problems, we take

δ = 10^{- 2}

.

In Table 3, we report the numerical results of PA-AP for noiseless and noisy matrix completion problems, taking

m = n = 1000

and

m = n = 2000

. Only the rank of the original matrix is considered to be

r = 10

and

r = 20

. As can be seen from Table 3, the PA-AP method can effectively solve these problems. Compared with the noiseless problem, PA-AP solves the noisy problems accuracy of the solution dropped. Moreover, the number of iterations and the running time decrease as

s r

increases.

To further verify the validity of the PA-AP method, it is compared with ADM, IADM-CG and IADM-BB. When RelChg is lower than

10^{- 5}

, the algorithms are set to terminate. The numerical results of the four methods for solving the noiseless and noisy MC problem are recorded in Table 4 and Table 5, from which we can see that the calculation time of the PA-AP method is much less than IADM-BB and IADM-CG, and the number of iterations and calculation time of PA-AP and ADM are almost the same, while our method is relatively more accurate. From the limited experimental data, the PA-AP method is shown to be more effective than the ADM, IADM-BB and IADM-CG.

4.3. Low-Rank Image Recovery

In the section, we turn to solve problem (2) for low-rank image recovery. The effectiveness of the PA-AP method is verified by testing three

512 \times 512

grayscale images. First, the original images are transformed into low-rank images with rank 40. Then, we lose some elements from the low rank matrix to get the damaged image, and restore them by using the PA-AP, ADM, IADM-BB and IADM-CG, respectively. The iteration process is stopped when RelChg falls below

10^{- 5}

. The original images, the corresponding low-rank images, the damaged images, and the restored images by the PA-AP are depicted in Figure 1. Observing the figure, we clearly see that our algorithm performs well.

To evaluate the recovery performance, we employ the Peak Signal-to-Noise Ratio (PSNR), which is defined as

\begin{matrix} P S N R = 10 \cdot l o g_{10} (\frac{∥ X^{*} ∥_{\infty}^{2}}{\frac{1}{m n} {∥ X^{*} - \hat{X} ∥}_{F}^{2}}), \end{matrix}

where

∥ X^{*} ∥_{\infty}

is the infinity norm of

X^{*}

, defined as the maximum absolute value of the elements in

X^{*}

. From the definition, higher PSNR indicates a better recovery result.

Table 6 shows the cost time, relative error and PSNR of recovery image by different methods. From Table 6, we can note that the PA-AP method is able to obtain higher PSNR as

s r

increases. Moreover, the running time of PA-AP is always much less than the other methods with different settings. Figure 2 shows the executing process of the different methods. From Figure 2, it is clear that our method can estimate the rank exactly after 30 iterations, and runs much less time before termination than other methods.

5. Conclusions and Future Work

In this paper, a unified model and algorithm for the matrix nuclear norm minimization problem are proposed. In each iteration, the proposed algorithm mainly includes computing matrix singular value decompositions and solving proximity operators of two convex functions. In addition, the convergence of the algorithm is also proved. A large number of experimental results and numerical comparisons show that the algorithm is superior to IADM-BB, IADM-CG and ADM algorithms.

The problem of tensor completion has been widely studied recently [31]. One of our future works is to extend the proposed PA-AP algorithm to tensor completion.

Author Contributions

Methodology, W.H. and W.Z.; Investigation, W.H., W.Z. and G.Y.; Writing–original draft preparation, W.H. and W.Z.; Writing–review and editing, W.H. and G.Y.; Software, W.Z.; Project administration, G.Y.

Funding

This research was funded by the National Natural Science Foundation of China (Nos. 61863001, 11661007, 61702244, 11761010 and 61562003), the National Natural Science Foundation of Jiangxi Province (Nos. 20181BAB202021, JXJG-18-14-11, and 20192BAB205086), the National Science Foundation of Zhejiang Province, China (LD19A010002), Research programme (Nos. 18zb04, YCX18B001) and the ’XieTong ChuangXin’ project of Gannan Normal University.

Acknowledgments

The authors would like to thank the anonymous reviewers that have highly improved the final version of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Candès, E.J.; Recht, B. Exact Matrix Completion via Convex Optimization. Found. Comput. Math. 2009, 9, 717–772. [Google Scholar] [Green Version]
Liu, G.; Lin, Z.C.; Yu, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 663–670. [Google Scholar]
Dai, Y.; Li, H.; He, M. A simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. 2014, 107, 101–122. [Google Scholar] [CrossRef]
Hu, W.; Wang, Z.; Liu, S.; Yang, X.; Yu, G.; Zhang, J.J. Motion capture data completion via truncated nuclear norm regularization. IEEE Signal Proc. Lett. 2018, 25, 258–262. [Google Scholar] [CrossRef]
Lin, X.F.; Wei, G. Accelerated reweighted nuclear norm minimization algorithm for low rank matrix recovery. Signal Process. 2015, 114, 24–33. [Google Scholar] [CrossRef]
Candès, E.J.; Plan, Y. Matrix completion with noise. Proc. IEEE 2010, 98, 925–936. [Google Scholar]
Nie, F.; Huang, H.; Ding, C.H. Low-rank matrix recovery via efficient schatten p-norm minimization. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; pp. 655–661. [Google Scholar]
Mohan, K.; Fazel, M. Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 2012, 13, 3441–3473. [Google Scholar]
Zhang, D.; Hu, Y.; Ye, J.; Li, X.; He, X. Matrix completion by truncated nuclear norm regularization. In Proceedings of the Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2192–2199. [Google Scholar]
Nie, F.; Hu, Z.; Li, X. Matrix completion based on non-convex low rank approximation. IEEE Trans. Image Process. 2019, 28, 2378–2388. [Google Scholar] [CrossRef]
Cui, A.; Peng, J.; Li, H.; Zhang, C.; Yu, Y. Affine matrix rank minimization problem via non-convex fraction function penalty. J. Comput. Appl. Math. 2018, 336, 353–374. [Google Scholar] [CrossRef] [Green Version]
Tütüncü, R.H.; Toh, K.C.; Todd, M.J. Solving semidenite-quadratic-linear programs using SDPT3. Math. Program. 2003, 95, 189–217. [Google Scholar] [CrossRef]
Cai, J.F.; Candés, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 2010, 20, 1956–1982. [Google Scholar] [CrossRef]
Ma, S.; Goldfarb, D.; Chen, L. Fixed point and bregman iterative methods for matrix rank minimization. Math. Progr. 2009, 128, 321–353. [Google Scholar] [CrossRef]
Toh, K.C.; Yun, S. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pac. J. Optim. 2010, 6, 615–640. [Google Scholar]
Geng, J.; Wang, L.; Wang, X. Nuclear norm and indicator function model for matrix completion. J. Inverse Ill-Posed Probl. 2015, 24, 1–11. [Google Scholar] [CrossRef]
Lin, Z.; Chen, M.; Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv 2010, arXiv:1009.5055. [Google Scholar]
Chen, C.; He, B.; Yuan, X.M. Matrix completion via an alternating direction method. IMA J. Numer. Anal. 2012, 32, 227–245. [Google Scholar] [CrossRef]
Xiao, Y.H.; Jin, Z.F. An alternating direction method for linear-constrained matrix nuclear norm minimization. Numer. Linear Algebra 2012, 19, 541–554. [Google Scholar] [CrossRef]
Jin, Z.F.; Wang, Q.; Wan, Z. Recovering low-rank matrices from corrupted observations via the linear conjugate gradient algorithm. J. Comput. Appl. Math. 2014, 256, 114–120. [Google Scholar] [CrossRef]
Yang, J.; Yuan, X.M. Linearized augmented Lagrangian and alternating direction methods for nuclear minimization. Math. Comput. 2013, 82, 301–329. [Google Scholar] [CrossRef]
Barzilai, J.; Borwein, J.M. Two point step size gradient method. IMA J. Numer. Anal. 1988, 4, 141–148. [Google Scholar] [CrossRef]
Li, Q.; Shen, L.; Xu, Y. Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing. Adv. Comput. Math. 2015, 41, 387–422. [Google Scholar] [CrossRef]
Zhang, X.; Burger, M.; Osher, S. A unified prial-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 2011, 46, 20–46. [Google Scholar] [CrossRef]
Wang, J.H.; Meng, F.Y.; Pang, L.P.; Hao, X.H. An adaptive fixed-point proximity algorithm for solving total variation denoising models. Inform. Sci. 2017, 402, 69–81. [Google Scholar] [CrossRef]
Lin, Z.C.; Liu, R.; Su, Z. Linearized alternating direction method with adaptive penalty for low-rank representation. Proc. Adv. Neural Inf. Process. Syst. 2011, 104, 612–620. [Google Scholar]
Gabay, D.; Mercier, B. A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 1976, 2, 17–40. [Google Scholar] [CrossRef]
Chen, P.; Huang, J.; Zhang, X. A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Probl. 2013, 29, 025011. [Google Scholar] [CrossRef]
Micchelli, C.A.; Shen, L.; Xu, Y. Proximity algorithms for image models: Denoising. Inverse Probl. 2011, 27, 045009. [Google Scholar] [CrossRef]
Larsen, R.M. PROPACK-Software for Large and Sparse SVD Calculations. Available online: http://sun.stanfor.edu/srmunk/PROPACK/ (accessed on 1 September 2019).
Li, X.T.; Zhao, X.L.; Jiang, T.X.; Zheng, Y.B.; Ji, T.Y.; Huang, T.Z. Low-rank tensor completion via combined non-local self-similarity and low-rank regularization. Neurocomputing 2019, 267, 1–12. [Google Scholar] [CrossRef]

Figure 1. Original

512 \times 512

images (Lena, Pirate, Cameraman) with full rank (first column); Corresponding low rank images with

r = 40

(second column); Randomly masked images from rank 40 images with

s r = 40 %

(third column); Recovered images by PA-AP (last column).

Figure 1. Original

512 \times 512

images (Lena, Pirate, Cameraman) with full rank (first column); Corresponding low rank images with

r = 40

(second column); Randomly masked images from rank 40 images with

s r = 40 %

(third column); Recovered images by PA-AP (last column).

Figure 2. Convergence behavior of the four methods (

L e n a, s r = 0.4, ε_{2} = 10^{- 5}

). The first subfigure is the estimated rank; the second is the relative error to the original matrix; and the last is the running time.

Figure 2. Convergence behavior of the four methods (

L e n a, s r = 0.4, ε_{2} = 10^{- 5}

). The first subfigure is the estimated rank; the second is the relative error to the original matrix; and the last is the running time.

Table 1. PA-AP for noiseless and noisy DCT matrix (

m = n, ε_{2} = 10^{- 4}

).

Table 1. PA-AP for noiseless and noisy DCT matrix (

m = n, ε_{2} = 10^{- 4}

).

(n, r)	p/dof	sr	Prob. (3) ( $δ$ = 0)			Prob. (3) ( $δ$ = 10⁻²)			Prob. (4)
(n, r)	p/dof	sr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr
(128, 3)	4.32	0.2	86	2.03	3.942 × 10⁻³	83	1.83	5.706 × 10⁻³	83	2.14	8.050 × 10⁻³
(128, 3)	8.64	0.4	31	0.57	4.042 × 10⁻⁴	33	0.69	5.535 × 10⁻³	34	0.70	6.196 × 10⁻³
(128, 3)	12.95	0.6	20	0.44	1.376 × 10⁻⁴	20	0.45	5.652 × 10⁻³	20	0.48	5.714 × 10⁻³
(128, 3)	17.27	0.8	13	0.29	6.251 × 10⁻⁵	12	0.34	5.952 × 10⁻³	13	0.28	5.966 × 10⁻³
(256, 5)	5.17	0.2	53	2.04	3.178 × 10⁻⁴	53	2.02	3.846 × 10⁻³	53	2.08	4.353 × 10⁻³
(256, 5)	10.34	0.4	31	1.22	1.649 × 10⁻⁴	31	1.16	4.424 × 10⁻³	31	1.28	4.404 × 10⁻³
(256, 5)	15.51	0.6	20	0.78	1.139 × 10⁻⁴	20	0.78	4.414 × 10⁻³	20	0.76	4.419 × 10⁻³
(256, 5)	20.68	0.8	13	0.50	3.893 × 10⁻⁵	13	0.52	4.458 × 10⁻³	14	0.52	4.297 × 10⁻³
(512, 10)	5.17	0.2	55	7.63	2.397 × 10⁻⁴	55	7.77	2.858 × 10⁻³	54	7.86	2.939 × 10⁻³
(512, 10)	10.34	0.4	34	4.22	1.259 × 10⁻⁴	34	4.55	3.068 × 10⁻³	34	4.46	3.081 × 10⁻³
(512, 10)	15.51	0.6	22	2.85	1.195 × 10⁻⁴	22	2.98	3.106 × 10⁻³	22	2.96	3.177 × 10⁻³
(512, 10)	20.68	0.8	13	1.81	1.009 × 10⁻⁴	13	1.85	3.129 × 10⁻³	13	1.86	3.262 × 10⁻³

Table 2. Comparisons of PA-AP, IADM-CG and IADM-BB for DCT matrix (

m = n, ε_{2} = 10^{- 4}

).

Table 2. Comparisons of PA-AP, IADM-CG and IADM-BB for DCT matrix (

m = n, ε_{2} = 10^{- 4}

).

(n, r)	p/dof	sr	PA-AP			IADM-CG			IADM-BB
(n, r)	p/dof	sr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr
(500, 10)	5.05	0.2	54	8.65	2.881 × 10⁻³	33	15.09	8.312 × 10⁻³	39	26.65	9.448 × 10⁻³
(500, 10)	10.10	0.4	33	4.14	3.035 × 10⁻³	23	8.45	4.433 × 10⁻³	26	16.58	6.484 × 10⁻³
(500, 10)	15.15	0.6	22	2.15	3.20 × 10⁻³	15	5.56	3.968 × 10⁻³	18	9.81	4.581 × 10⁻³
(500, 10)	20.20	0.8	13	1.72	3.224 × 10⁻³	10	4.79	3.292 × 10⁻³	12	7.50	3.528 × 10⁻³
(1000, 20)	5.05	0.2	70	38.20	2.110 × 10⁻³	33	69.23	8.806 × 10⁻³	40	141.57	5.047 × 10⁻³
(1000, 20)	10.10	0.4	38	15.14	2.379 × 10⁻³	22	41.34	3.442 × 10⁻³	23	78.78	7.097 × 10⁻³
(1000, 20)	15.15	0.6	21	8.83	2.322 × 10⁻³	17	30.52	3.362 × 10⁻³	19	60.35	4.277 × 10⁻³
(1000, 20)	20.20	0.8	14	6.60	2.379 × 10⁻³	13	25.36	3.665 × 10⁻³	20	57.51	2.724 × 10⁻³
(2000, 20)	5.05	0.2	74	144.89	2.270 × 10⁻³	39	581.64	1.039 × 10⁻²	46	1399.31	8.316 × 10⁻³
(2000, 20)	10.10	0.4	33	62.68	2.348 × 10⁻³	22	328.02	4.755 × 10⁻³	22	628.66	5.090 × 10⁻³
(2000, 20)	15.15	0.6	21	34.60	2.368 × 10⁻³	14	172.70	3.405 × 10⁻³	17	438.01	4.527 × 10⁻³
(2000, 20)	20.20	0.8	13	23.40	2.401 × 10⁻³	20	128.73	2.752 × 10⁻³	15	359.59	3.260 × 10⁻³

Table 3. PA-AP for noiseless and noisy matrix completion problems (

m = n, ε_{2} = 10^{- 5}

).

Table 3. PA-AP for noiseless and noisy matrix completion problems (

m = n, ε_{2} = 10^{- 5}

).

(n, r)	p/dof	sr	Prob. (3) ( $δ$ = 0)			Prob. (3) ( $δ$ = 10⁻²)			Prob. (4)
(n, r)	p/dof	sr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr
(1000, 10)	10.05	0.2	76	21.94	2.439 × 10⁻⁵	76	21.52	3.015 × 10⁻³	76	22.92	5.245 × 10⁻⁴
(1000, 10)	20.10	0.4	40	9.28	1.159 × 10⁻⁵	40	9.68	3.085 × 10⁻³	39	9.55	7.436 × 10⁻⁴
(1000, 10)	30.15	0.6	22	5.72	1.061 × 10⁻⁶	23	6.36	3.105 × 10⁻³	23	6.10	8.251 × 10⁻⁴
(1000, 10)	40.20	0.8	12	3.88	2.859 × 10⁻⁶	14	3.76	3.075 × 10⁻³	14	3.74	8.668 × 10⁻⁴
(1000, 20)	5.05	0.2	87	39.10	3.037 × 10⁻⁵	88	43.58	1.995 × 10⁻³	87	43.71	5.469 × 10⁻⁴
(1000, 20)	10.10	0.4	46	13.02	1.277 × 10⁻⁵	46	13.69	2.115 × 10⁻³	46	13.47	7.467 × 10⁻⁴
(1000, 20)	15.15	0.6	25	7.56	1.230 × 10⁻⁵	25	7.54	2.158 × 10⁻³	25	7.64	8.232 × 10⁻⁴
(1000, 20)	20.20	0.8	14	4.86	9.617 × 10⁻⁶	14	4.87	2.194 × 10⁻³	14	4.88	8.624 × 10⁻⁴
(2000, 10)	20.05	0.2	80	46.73	2.538 × 10⁻⁵	80	67.92	3.086 × 10⁻³	80	54.29	7.422 × 10⁻⁴
(2000, 10)	40.10	0.4	37	26.02	1.299 × 10⁻⁵	37	31.65	3.091 × 10⁻³	37	27.20	8.641 × 10⁻⁴
(2000, 10)	60.15	0.6	23	19.30	9.672 × 10⁻⁶	23	21.64	3.160 × 10⁻³	23	20.02	9.024 × 10⁻⁴
(2000, 10)	80.20	0.8	13	11.99	2.584 × 10⁻⁶	13	13.31	3.153 × 10⁻³	13	11.71	9.276 × 10⁻⁴
(2000, 20)	10.05	0.2	91	108.16	2.853 × 10⁻⁵	91	132.04	2.133 × 10⁻³	91	129.30	7.461 × 10⁻⁴
(2000, 20)	20.10	0.4	40	42.54	1.146 × 10⁻⁵	40	44.62	2.190 × 10⁻³	40	42.49	8.674 × 10⁻⁴
(2000, 20)	30.15	0.6	24	27.07	8.683 × 10⁻⁶	24	28.24	2.226 × 10⁻³	24	27.43	9.058 × 10⁻⁴
(2000, 20)	40.20	0.8	13	15.81	3.772 × 10⁻⁶	13	17.19	2.205 × 10⁻³	13	16.18	9.268 × 10⁻⁴

Table 4. Comparisons of PA-AP, ADM, IADM-CG and IADM-BB for noiseless matrix completion.

(n, r)	p	p/dof	sr	PA-AP			ADM			IADM-CG			IADM-BB
(n, r)	p	p/dof	sr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr
(1024, 5)	209235	10.29	0.2	80	17.52	2.218 × 10⁻⁵	81	18.74	6.133 × 10⁻⁴	52	58.41	3.266 × 10⁻³	50	112.50	1.801 × 10⁻³
(1024, 5)	314529	30.80	0.3	50	10.25	1.743 × 10⁻⁵	56	11.60	3.988 × 10⁻⁴	37	39.52	3.639 × 10⁻³	51	96.33	3.023 × 10⁻³
(1024, 5)	419547	41.06	0.4	37	7.57	1.036 × 10⁻⁵	40	8.31	3.023 × 10⁻⁴	36	36.77	2.439 × 10⁻³	37	67.10	2.655 × 10⁻³
(1024, 5)	525213	51.33	0.5	28	6.44	8.484 × 10⁻⁶	31	7.08	2.600 × 10⁻⁴	31	33.22	1.102 × 10⁻³	29	57.78	1.414 × 10⁻³
(1024, 5)	628736	61.59	0.6	22	4.89	6.311 × 10⁻⁶	26	6.01	2.063 × 10⁻⁴	21	22.91	2.330 × 10⁻³	32	55.61	1.020 × 10⁻³
(1024, 5)	733429	71.86	0.7	17	4.20	4.239 × 10⁻⁶	21	5.42	1.738 × 10⁻⁴	40	40.23	1.361 × 10⁻⁴	53	49.85	1.743 × 10⁻⁴
(1024, 5)	838513	82.12	0.8	13	3.17	3.840 × 10⁻⁶	16	3.95	1.593 × 10⁻⁴	41	29.95	4.171 × 10⁻⁴	30	41.92	1.031 × 10⁻⁴
(1024, 5)	943801	92.39	0.9	13	2.58	2.782 × 10⁻⁶	12	2.58	1.386 × 10⁻⁴	11	12.16	8.341 × 10⁻⁴	12	19.25	6.523 × 10⁻⁴
(1024, 10)	209469	10.29	0.2	77	15.94	2.341 × 10⁻⁵	71	14.79	6.273 × 10⁻⁴	51	50.90	2.646 × 10⁻³	53	109.16	2.210 × 10⁻³
(1024, 10)	315457	15.44	0.3	55	11.95	1.848 × 10⁻⁵	52	11.63	4.264 × 10⁻⁴	40	42.72	2.328 × 10⁻³	39	79.82	2.258 × 10⁻³
(1024, 10)	419442	20.58	0.4	40	10.66	1.005 × 10⁻⁵	39	10.79	3.031 × 10⁻⁴	32	33.68	1.268 × 10⁻³	34	68.76	8.858 × 10⁻³
(1024, 10)	524145	25.73	0.5	30	8.97	8.688 × 10⁻⁵	31	9.49	2.525 × 10⁻⁴	32	35.26	1.137 × 10⁻³	29	55.39	1.536 × 10⁻³
(1024, 10)	629555	30.87	0.6	23	5.26	7.085 × 10⁻⁶	24	5.56	1.972 × 10⁻⁴	39	37.55	4.376 × 10⁻⁴	32	51.02	7.468 × 10⁻³
(1024, 10)	733285	36.02	0.7	18	4.98	6.149 × 10⁻⁶	19	5.13	1.817 × 10⁻⁴	27	28.35	6.503 × 10⁻⁴	30	48.98	3.295 × 10⁻⁴
(1024, 10)	838650	41.16	0.8	14	4.53	2.647 × 10⁻⁶	16	5.28	1.557 × 10⁻⁴	31	33.23	5.794 × 10⁻⁴	17	30.96	6.046 × 10⁻⁴
(1024, 10)	943738	46.31	0.9	12	3.36	2.130 × 10⁻⁶	12	3.62	1.370 × 10⁻⁴	34	34.15	8.934 × 10⁻⁴	23	40.62	3.884 × 10⁻⁴

Table 5. Comparisons of PA-AP, ADM, IADM-CG and IADM-BB for noisy matrix completion (

δ = 10^{- 2})

.

Table 5. Comparisons of PA-AP, ADM, IADM-CG and IADM-BB for noisy matrix completion (

δ = 10^{- 2})

.

(n, r)	p	p/dof	sr	PA-AP			ADM			IADM-CG			IADM-BB
(n, r)	p	p/dof	sr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr	Iter	Time	RelErr
(1024, 5)	210151	20.53	0.2	61	16.96	4.345 × 10⁻³	60	13.61	4.354 × 10⁻³	40	95.66	9.546 × 10⁻³	45	237.45	5.234 × 10⁻³
(1024, 5)	314332	30.80	0.3	43	10.68	4.413 × 10⁻³	43	10.88	4.448 × 10⁻³	27	64.23	1.215 × 10⁻²	31	164.52	5.319 × 10⁻³
(1024, 5)	418708	41.06	0.4	31	8.95	4.416 × 10⁻³	33	9.50	4.419 × 10⁻³	25	59.75	6.410 × 10⁻³	24	123.38	5.318 × 10⁻³
(1024, 5)	524429	51.33	0.5	24	7.65	4.387 × 10⁻³	25	7.74	4.402 × 10⁻³	21	49.00	4.920 × 10⁻³	19	97.81	4.791 × 10⁻³
(1024, 5)	628736	61.59	0.6	19	5.25	4.452 × 10⁻³	19	4.93	4.452 × 10⁻³	19	40.69	5.035 × 10⁻³	14	73.53	5.177 × 10⁻³
(1024, 5)	734131	71.86	0.7	15	4.47	4.335 × 10⁻³	17	4.56	4.337 × 10⁻³	17	39.24	4.791 × 10⁻³	21	76.87	4.524 × 10⁻³
(1024, 5)	838476	82.12	0.8	12	3.64	4.444 × 10⁻³	14	3.97	4.446 × 10⁻³	19	39.52	1.184 × 10⁻³	15	54.21	4.479 × 10⁻³
(1024, 5)	944170	92.39	0.9	10	3.16	4.486 × 10⁻³	11	2.94	4.557 × 10⁻³>	10	21.29	4.591 × 10⁻³	10	36.21	4.531 × 10⁻³
(1024, 10)	210118	10.29	0.2	77	20.56	3.017 × 10⁻³	71	19.10	3.081 × 10⁻³	53	62.98	3.894 × 10⁻³	48	123.01	4.089 × 10⁻³
(1024, 10)	314614	15.44	0.3	54	15.21	3.089 × 10⁻³	52	15.27	3.119 × 10⁻³	40	44.45	3.842 × 10⁻³	41	85.27	3.816 × 10⁻³
(1024, 10)	420191	20.58	0.4	40	11.53	3.060 × 10⁻³	39	11.01	3.075 × 10⁻³	32	36.46	3.306 × 10⁻³	34	72.74	3.178 × 10⁻³
(1024, 10)	523405	25.73	0.5	30	7.59	3.087 × 10⁻³	31	7.70	3.097 × 10⁻³	52	52.59	3.087 × 10⁻³	28	54.28	3.647 × 10⁻³
(1024, 10)	628935	30.87	0.6	23	6.04	3.061 × 10⁻³	24	6.76	3.068 × 10⁻³	44	45.74	3.061 × 10⁻³	39	61.59	3.113 × 10⁻³
(1024, 10)	734096	36.02	0.7	18	5.17	3.090 × 10⁻³	19	4.97	3.095 × 10⁻³	32	33.06	3.134 × 10⁻³	47	72.07	3.116 × 10⁻³
(1024, 10)	838509	41.16	0.8	14	4.26	3.121 × 10⁻³	16	4.65	3.125 × 10⁻³	31	31.85	3.175 × 10⁻³	38	60.78	3.183 × 10⁻³
(1024, 10)	944068	46.31	0.9	12	3.76	3.093 × 10⁻³	12	3.54	3.096 × 10⁻³	34	34.64	3.216 × 10⁻³	33	45.15	3.095 × 10⁻³

Table 6. Comparisons of PA-AP, ADM, IADM-CG and IADM-BB for low-rank image recovery.

Name	sr	PA-AP			ADM			IADM-CG			IADM-BB
Name	sr	Time	RelErr	PSNR	Time	RelErr	PSNR	Time	RelErr	PSNR	Time	RelErr	PSNR
Lena	0.2	58.66	1.730 × 10⁻⁴	26.17	127.86	1.751 × 10⁻⁴	26.61	148.71	1.625 × 10⁻⁴	26.78	161.59	1.633 × 10⁻⁴	26.78
	0.4	46.09	3.984 × 10⁻⁵	84.67	94.65	4.578 × 10⁻⁵	85.52	94.86	1.191 × 10⁻⁴	76.06	110.74	1.244 × 10⁻⁴	75.58
	0.6	7.52	2.071 × 10⁻⁵	93.42	55.92	2.717 × 10⁻⁵	92.60	54.78	8.376 × 10⁻⁵	81.65	65.75	9.504 × 10⁻⁵	80.40
	0.8	3.41	1.025 × 10⁻⁵	101.59	37.32	1.771 × 10⁻⁵	99.04	33.46	4.374 × 10⁻⁵	89.62	39.09	6.070 × 10⁻⁵	86.65
Pirate	0.2	54.23	1.561 × 10⁻⁴	26.03	121.73	1.745 × 10⁻⁴	26.25	140.40	1.643 × 10⁻⁴	26.41	154.24	1.651 × 10⁻⁴	26.41
	0.4	38.58	3.961 × 10⁻⁵	85.42	92.33	4.905 × 10⁻⁵	86.24	94.57	1.140 × 10⁻⁴	76.98	112.77	8.735 × 10⁻⁵	79.81
	0.6	8.15	1.313 × 10⁻⁵	97.97	53.13	2.839 × 10⁻⁵	92.92	54.81	4.376 × 10⁻⁵	86.89	66.42	6.227 × 10⁻⁵	84.26
	0.8	3.45	8.451 × 10⁻⁶	104.65	33.83	1.967 × 10⁻⁵	99.05	29.41	6.704 × 10⁻⁵	91.70	36.86	1.001 × 10⁻⁵	88.64
Cameraman	0.2	50.51	1.717 × 10⁻⁴	24.10	124.54	1.965 × 10⁻⁴	24.32	145.18	2.159 × 10⁻⁴	24.43	159.92	2.167 × 10⁻⁴	24.43
	0.4	46.81	5.290 × 10⁻⁵	80.18	102.31	5.570 × 10⁻⁵	81.30	106.05	1.182 × 10⁻⁵	73.72	115.48	1.313 × 10⁻⁴	73.00
	0.6	10.03	2.389 × 10⁻⁵	90.92	59.87	3.090 × 10⁻⁵	90.36	58.43	6.099 × 10⁻⁵	83.07	71.95	5.343 × 10⁻⁵	84.40
	0.8	4.18	1.613 × 10⁻⁵	96.46	37.86	2.036 × 10⁻⁵	96.03	35.31	3.360 × 10⁻⁵	90.22	45.53	2.106 × 10⁻⁵	93.04

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, W.; Zheng, W.; Yu, G. A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization. Symmetry 2019, 11, 1277. https://doi.org/10.3390/sym11101277

AMA Style

Hu W, Zheng W, Yu G. A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization. Symmetry. 2019; 11(10):1277. https://doi.org/10.3390/sym11101277

Chicago/Turabian Style

Hu, Wenyu, Weidong Zheng, and Gaohang Yu. 2019. "A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization" Symmetry 11, no. 10: 1277. https://doi.org/10.3390/sym11101277

APA Style

Hu, W., Zheng, W., & Yu, G. (2019). A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization. Symmetry, 11(10), 1277. https://doi.org/10.3390/sym11101277

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization

Abstract

1. Introduction

2. Preliminaries

3. Proximity Algorithm with Adaptive Penalty

3.1. Proximity Algorithm

3.2. Adaptive Penalty

3.3. Convergence

4. Numerical Experiments

4.1. Nuclear Norm Minimization Problem

4.2. Matrix Completion

4.3. Low-Rank Image Recovery

5. Conclusions and Future Work

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI