-
Subdifferentially polynomially bounded functions and Gaussian smoothing-based zeroth-order optimization
Authors:
Ming Lei,
Ting Kei Pong,
Shuqin Sun,
Man-Chung Yue
Abstract:
We introduce the class of subdifferentially polynomially bounded (SPB) functions, which is a rich class of locally Lipschitz functions that encompasses all Lipschitz functions, all gradient- or Hessian-Lipschitz functions, and even some non-smooth locally Lipschitz functions. We show that SPB functions are compatible with Gaussian smoothing (GS), in the sense that the GS of any SPB function is wel…
▽ More
We introduce the class of subdifferentially polynomially bounded (SPB) functions, which is a rich class of locally Lipschitz functions that encompasses all Lipschitz functions, all gradient- or Hessian-Lipschitz functions, and even some non-smooth locally Lipschitz functions. We show that SPB functions are compatible with Gaussian smoothing (GS), in the sense that the GS of any SPB function is well-defined and satisfies a descent lemma akin to gradient-Lipschitz functions, with the Lipschitz constant replaced by a polynomial function. Leveraging this descent lemma, we propose GS-based zeroth-order optimization algorithms with an adaptive stepsize strategy for constrained minimization of SPB functions, and analyze their iteration complexity. An important instrument in our analysis, which could be of independent interest, is the quantification of Goldstein stationarity via the GS gradient.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Tight error bounds for log-determinant cones without constraint qualifications
Authors:
Ying Lin,
Scott B. Lindstrom,
Bruno F. Lourenço,
Ting Kei Pong
Abstract:
In this paper, without requiring any constraint qualifications, we establish tight error bounds for the log-determinant cone, which is the closure of the hypograph of the perspective function of the log-determinant function. This error bound is obtained using the recently developed framework based on one-step facial residual functions.
In this paper, without requiring any constraint qualifications, we establish tight error bounds for the log-determinant cone, which is the closure of the hypograph of the perspective function of the log-determinant function. This error bound is obtained using the recently developed framework based on one-step facial residual functions.
△ Less
Submitted 10 April, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Kurdyka-Łojasiewicz exponent via Hadamard parametrization
Authors:
Wenqing Ouyang,
Yuncheng Liu,
Ting Kei Pong,
Hao Wang
Abstract:
We consider a class of $\ell_1$-regularized optimization problems and the associated smooth "over-parameterized" optimization problems built upon the Hadamard parametrization, or equivalently, the Hadamard difference parametrization (HDP). We characterize the set of second-order stationary points of the HDP-based model and show that they correspond to some stationary points of the corresponding…
▽ More
We consider a class of $\ell_1$-regularized optimization problems and the associated smooth "over-parameterized" optimization problems built upon the Hadamard parametrization, or equivalently, the Hadamard difference parametrization (HDP). We characterize the set of second-order stationary points of the HDP-based model and show that they correspond to some stationary points of the corresponding $\ell_1$-regularized model. More importantly, we show that the Kurdyka-Lojasiewicz (KL) exponent of the HDP-based model at a second-order stationary point can be inferred from that of the corresponding $\ell_1$-regularized model under suitable assumptions. Our assumptions are general enough to cover a wide variety of loss functions commonly used in $\ell_1$-regularized models, such as the least squares loss function and the logistic loss function. Since the KL exponents of many $\ell_1$-regularized models are explicitly known in the literature, our results allow us to leverage these known exponents to deduce the KL exponents at second-order stationary points of the corresponding HDP-based models, which were previously unknown. Finally, we demonstrate how these explicit KL exponents at second-order stationary points can be applied to deducing the explicit local convergence rate of a standard gradient descent method for minimizing the HDP-based model.
△ Less
Submitted 17 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
An extended sequential quadratic method with extrapolation
Authors:
Yongle Zhang,
Ting Kei Pong,
Shiqi Xu
Abstract:
We revisit and adapt the extended sequential quadratic method (ESQM) in [3] for solving a class of difference-of-convex optimization problems whose constraints are defined as the intersection of level sets of Lipschitz differentiable functions and a simple compact convex set. Particularly, for this class of problems, we develop a variant of ESQM, called ESQM with extrapolation (ESQM$_e$), which in…
▽ More
We revisit and adapt the extended sequential quadratic method (ESQM) in [3] for solving a class of difference-of-convex optimization problems whose constraints are defined as the intersection of level sets of Lipschitz differentiable functions and a simple compact convex set. Particularly, for this class of problems, we develop a variant of ESQM, called ESQM with extrapolation (ESQM$_e$), which incorporates Nesterov's extrapolation techniques for empirical acceleration. Under standard constraint qualifications, we show that the sequence generated by ESQM$_e$ clusters at a critical point if the extrapolation parameters are uniformly bounded above by a certain threshold. Convergence of the whole sequence and the convergence rate are established by assuming Kurdyka-Lojasiewicz (KL) property of a suitable potential function and imposing additional differentiability assumptions on the objective and constraint functions. In addition, when the objective and constraint functions are all convex, we show that linear convergence can be established if a certain exact penalty function is known to be a KL function with exponent $\frac12$; we also discuss how the KL exponent of such an exact penalty function can be deduced from that of the original extended objective (i.e., sum of the objective and the indicator function of the constraint set). Finally, we perform numerical experiments to demonstrate the empirical acceleration of ESQM$_{e}$ over a basic version of ESQM, and illustrate its effectiveness by comparing with the natural competing algorithm SCP$_{ls}$ from [35].
△ Less
Submitted 24 December, 2023;
originally announced December 2023.
-
A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees
Authors:
Chuan He,
Zhaosong Lu,
Ting Kei Pong
Abstract:
In this paper we consider finding a second-order stationary point (SOSP) of nonconvex equality constrained optimization when a nearly feasible point is known. In particular, we first propose a new Newton-CG method for finding an approximate SOSP of unconstrained optimization and show that it enjoys a substantially better complexity than the Newton-CG method [56]. We then propose a Newton-CG based…
▽ More
In this paper we consider finding a second-order stationary point (SOSP) of nonconvex equality constrained optimization when a nearly feasible point is known. In particular, we first propose a new Newton-CG method for finding an approximate SOSP of unconstrained optimization and show that it enjoys a substantially better complexity than the Newton-CG method [56]. We then propose a Newton-CG based augmented Lagrangian (AL) method for finding an approximate SOSP of nonconvex equality constrained optimization, in which the proposed Newton-CG method is used as a subproblem solver. We show that under a generalized linear independence constraint qualification (GLICQ), our AL method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-7/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$ for finding an $(ε,\sqrtε)$-SOSP of nonconvex equality constrained optimization with high probability, which are significantly better than the ones achieved by the proximal AL method [60]. Besides, we show that it has a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ when the GLICQ does not hold. To the best of our knowledge, all the complexity results obtained in this paper are new for finding an approximate SOSP of nonconvex equality constrained optimization with high probability. Preliminary numerical results also demonstrate the superiority of our proposed methods over the ones in [56,60].
△ Less
Submitted 8 January, 2023;
originally announced January 2023.
-
Convergence rate analysis of a Dykstra-type projection algorithm
Authors:
Xiaozhou Wang,
Ting Kei Pong
Abstract:
Given closed convex sets $C_i$, $i=1,\ldots,\ell$, and some nonzero linear maps $A_i$, $i = 1,\ldots,\ell$, of suitable dimensions, the multi-set split feasibility problem aims at finding a point in $\bigcap_{i=1}^\ell A_i^{-1}C_i$ based on computing projections onto $C_i$ and multiplications by $A_i$ and $A_i^T$. In this paper, we consider the associated best approximation problem, i.e., the prob…
▽ More
Given closed convex sets $C_i$, $i=1,\ldots,\ell$, and some nonzero linear maps $A_i$, $i = 1,\ldots,\ell$, of suitable dimensions, the multi-set split feasibility problem aims at finding a point in $\bigcap_{i=1}^\ell A_i^{-1}C_i$ based on computing projections onto $C_i$ and multiplications by $A_i$ and $A_i^T$. In this paper, we consider the associated best approximation problem, i.e., the problem of computing projections onto $\bigcap_{i=1}^\ell A_i^{-1}C_i$; we refer to this problem as the best approximation problem in multi-set split feasibility settings (BA-MSF). We adapt the Dykstra's projection algorithm, which is classical for solving the BA-MSF in the special case when all $A_i = I$, to solve the general BA-MSF. Our Dykstra-type projection algorithm is derived by applying (proximal) coordinate gradient descent to the Lagrange dual problem, and it only requires computing projections onto $C_i$ and multiplications by $A_i$ and $A_i^T$ in each iteration. Under a standard relative interior condition and a genericity assumption on the point we need to project, we show that the dual objective satisfies the Kurdyka-Lojasiewicz property with an explicitly computable exponent on a neighborhood of the (typically unbounded) dual solution set when each $C_i$ is $C^{1,α}$-cone reducible for some $α\in (0,1]$: this class of sets covers the class of $C^2$-cone reducible sets, which include all polyhedrons, second-order cone, and the cone of positive semidefinite matrices as special cases. Using this, explicit convergence rate (linear or sublinear) of the sequence generated by the Dykstra-type projection algorithm is derived. Concrete examples are constructed to illustrate the necessity of some of our assumptions.
△ Less
Submitted 2 September, 2023; v1 submitted 8 January, 2023;
originally announced January 2023.
-
Generalized power cones: optimal error bounds and automorphisms
Authors:
Ying Lin,
Scott B. Lindstrom,
Bruno F. Lourenço,
Ting Kei Pong
Abstract:
Error bounds are a requisite for trusting or distrusting solutions in an informed way. Until recently, provable error bounds in the absence of constraint qualifications were unattainable for many classes of cones that do not admit projections with known succinct expressions. We build such error bounds for the generalized power cones, using the recently developed framework of one-step facial residu…
▽ More
Error bounds are a requisite for trusting or distrusting solutions in an informed way. Until recently, provable error bounds in the absence of constraint qualifications were unattainable for many classes of cones that do not admit projections with known succinct expressions. We build such error bounds for the generalized power cones, using the recently developed framework of one-step facial residual functions. We also show that our error bounds are tight in the sense of that framework. Besides their utility for understanding solution reliability, the error bounds we discover have additional applications to the algebraic structure of the underlying cone, which we describe. In particular we use the error bounds to compute the dimension of the automorphism group for the generalized power cones, and to identify a set of generalized power cones that are self-dual, irreducible, nonhomogeneous, and perfect
△ Less
Submitted 22 October, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Doubly majorized algorithm for sparsity-inducing optimization problems with regularizer-compatible constraints
Authors:
Tianxiang Liu,
Ting Kei Pong,
Akiko Takeda
Abstract:
We consider a class of sparsity-inducing optimization problems whose constraint set is regularizer-compatible, in the sense that, the constraint set becomes easy-to-project-onto after a coordinate transformation induced by the sparsity-inducing regularizer. Our model is general enough to cover, as special cases, the ordered LASSO model and its variants with some commonly used nonconvex sparsity-in…
▽ More
We consider a class of sparsity-inducing optimization problems whose constraint set is regularizer-compatible, in the sense that, the constraint set becomes easy-to-project-onto after a coordinate transformation induced by the sparsity-inducing regularizer. Our model is general enough to cover, as special cases, the ordered LASSO model and its variants with some commonly used nonconvex sparsity-inducing regularizers. The presence of both the sparsity-inducing regularizer and the constraint set poses challenges on the design of efficient algorithms. In this paper, by exploiting absolute-value symmetry and other properties in the sparsity-inducing regularizer, we propose a new algorithm, called the Doubly Majorized Algorithm (DMA), for this class of problems. The DMA makes use of projections onto the constraint set after the coordinate transformation in each iteration, and hence can be performed efficiently. Without invoking any commonly used constraint qualification conditions such as those based on horizon subdifferentials, we show that any accumulation point of the sequence generated by DMA is a so-called $ψ_{\rm opt}$-stationary point, a new notion of stationarity we define as inspired by the notion of $L$-stationarity. We also show that any global minimizer of our model has to be a $ψ_{\rm opt}$-stationary point, again without imposing any constraint qualification conditions. Finally, we illustrate numerically the performance of DMA on solving variants of ordered LASSO with nonconvex regularizers.
△ Less
Submitted 8 March, 2023; v1 submitted 1 July, 2022;
originally announced July 2022.
-
Doubly iteratively reweighted algorithm for constrained compressed sensing models
Authors:
Shuqin Sun,
Ting Kei Pong
Abstract:
We propose a new algorithmic framework for constrained compressed sensing models that admit nonconvex sparsity-inducing regularizers including the log-penalty function as objectives, and nonconvex loss functions such as the Cauchy loss function and the Tukey biweight loss function in the constraint. Our framework employs iteratively reweighted $\ell_1$ and $\ell_2$ schemes to construct subproblems…
▽ More
We propose a new algorithmic framework for constrained compressed sensing models that admit nonconvex sparsity-inducing regularizers including the log-penalty function as objectives, and nonconvex loss functions such as the Cauchy loss function and the Tukey biweight loss function in the constraint. Our framework employs iteratively reweighted $\ell_1$ and $\ell_2$ schemes to construct subproblems that can be efficiently solved by well-developed solvers for basis pursuit denoising such as SPGL1 [6]. We propose a new termination criterion for the subproblem solvers that allows them to return an infeasible solution, with a suitably constructed feasible point satisfying a descent condition. The feasible point construction step is the key for establishing the well-definedness of our proposed algorithm, and we also prove that any accumulation point of this sequence of feasible points is a stationary point of the constrained compressed sensing model, under suitable assumptions. Finally, we compare numerically our algorithm (with subproblems solved by SPGL1 or the alternating direction method of multipliers) against the SCP$_{\rm ls}$ in [41] on solving constrained compressed sensing models with the log-penalty function as the objective and the Cauchy loss function in the constraint, for badly-scaled measurement matrices. Our computational results show that our approaches return solutions with better recovery errors, and are always faster.
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
Frank-Wolfe-type methods for a class of nonconvex inequality-constrained problems
Authors:
Liaoyuan Zeng,
Yongle Zhang,
Guoyin Li,
Ting Kei Pong,
Xiaozhou Wang
Abstract:
The Frank-Wolfe (FW) method, which implements efficient linear oracles that minimize linear approximations of the objective function over a fixed compact convex set, has recently received much attention in the optimization and machine learning literature. In this paper, we propose a new FW-type method for minimizing a smooth function over a compact set defined as the level set of a single differen…
▽ More
The Frank-Wolfe (FW) method, which implements efficient linear oracles that minimize linear approximations of the objective function over a fixed compact convex set, has recently received much attention in the optimization and machine learning literature. In this paper, we propose a new FW-type method for minimizing a smooth function over a compact set defined as the level set of a single difference-of-convex function, based on new generalized linear-optimization oracles (LO). We show that these LOs can be computed efficiently with closed-form solutions in some important optimization models that arise in compressed sensing and machine learning. In addition, under a mild strict feasibility condition, we establish the subsequential convergence of our nonconvex FW-type method. Since the feasible region of our generalized LO typically changes from iteration to iteration, our convergence analysis is completely different from those existing works in the literature on FW-type methods that deal with fixed feasible regions among subproblems. Finally, motivated by the away steps for accelerating FW-type methods for convex problems, we further design an away-step oracle to supplement our nonconvex FW-type method, and establish subsequential convergence of this variant. Numerical results on the matrix completion problem with standard datasets are presented to demonstrate the efficiency of the proposed FW-type method and its away-step variant.
△ Less
Submitted 18 January, 2024; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Optimal error bounds in the absence of constraint qualifications with applications to the $p$-cones and beyond
Authors:
Scott B. Lindstrom,
Bruno F. Lourenço,
Ting Kei Pong
Abstract:
We prove tight Hölderian error bounds for all $p$-cones. Surprisingly, the exponents differ in several ways from those that have been previously conjectured; moreover, they illuminate $p$-cones as a curious example of a class of objects that possess properties in 3 dimensions that they do not in 4 or more. Using our error bounds, we analyse least squares problems with $p$-norm regularization, wher…
▽ More
We prove tight Hölderian error bounds for all $p$-cones. Surprisingly, the exponents differ in several ways from those that have been previously conjectured; moreover, they illuminate $p$-cones as a curious example of a class of objects that possess properties in 3 dimensions that they do not in 4 or more. Using our error bounds, we analyse least squares problems with $p$-norm regularization, where our results enable us to compute the corresponding KL exponents for previously inaccessible values of $p$. Another application is a (relatively) simple proof that most $p$-cones are neither self-dual nor homogeneous. Our error bounds are obtained under the framework of facial residual functions, and we expand it by establishing for general cones an optimality criterion under which the resulting error bound must be tight.
△ Less
Submitted 27 March, 2024; v1 submitted 23 September, 2021;
originally announced September 2021.
-
$ρ$-regularization subproblems: Strong duality and an eigensolver-based algorithm
Authors:
Liaoyuan Zeng,
Ting Kei Pong
Abstract:
Trust-region (TR) type method, based on a quadratic model such as the trust-region subproblem (TRS) and $ p $-regularization subproblem ($p$RS), is arguably one of the most successful methods for unconstrained minimization. In this paper, we study a general regularized subproblem (named $ ρ$RS), which covers TRS and $p$RS as special cases. We derive a strong duality theorem for $ ρ$RS, and also it…
▽ More
Trust-region (TR) type method, based on a quadratic model such as the trust-region subproblem (TRS) and $ p $-regularization subproblem ($p$RS), is arguably one of the most successful methods for unconstrained minimization. In this paper, we study a general regularized subproblem (named $ ρ$RS), which covers TRS and $p$RS as special cases. We derive a strong duality theorem for $ ρ$RS, and also its necessary and sufficient optimality condition under general assumptions on the regularization term. We then define the Rendl-Wolkowicz (RW) dual problem of $ ρ$RS, which is a maximization problem whose objective function is concave, and differentiable except possibly at two points. It is worth pointing out that our definition is based on an alternative derivation of the RW-dual problem for TRS. Then we propose an eigensolver-based algorithm for solving the RW-dual problem of $ ρ$RS. The algorithm is carried out by finding the smallest eigenvalue and its unit eigenvector of a certain matrix in each iteration. Finally, we present numerical results on randomly generated $p$RS's, and on a new class of regularized problem that combines TRS and $p$RS, to illustrate our algorithm.
△ Less
Submitted 4 September, 2021;
originally announced September 2021.
-
Retraction-based first-order feasible methods for difference-of-convex programs with smooth inequality and simple geometric constraints
Authors:
Yongle Zhang,
Guoyin Li,
Ting Kei Pong,
Shiqi Xu
Abstract:
In this paper, we propose first-order feasible methods for difference-of-convex (DC) programs with smooth inequality and simple geometric constraints. Our strategy for maintaining feasibility of the iterates is based on a "retraction" idea adapted from the literature of manifold optimization. When the constraints are convex, we establish the global subsequential convergence of the sequence generat…
▽ More
In this paper, we propose first-order feasible methods for difference-of-convex (DC) programs with smooth inequality and simple geometric constraints. Our strategy for maintaining feasibility of the iterates is based on a "retraction" idea adapted from the literature of manifold optimization. When the constraints are convex, we establish the global subsequential convergence of the sequence generated by our algorithm under strict feasibility condition, and analyze its convergence rate when the objective is in addition convex according to the Kurdyka-Lojasiewicz (KL) exponent of the extended objective (i.e., sum of the objective and the indicator function of the constraint set). We also show that the extended objective of a large class of Euclidean norm (and more generally, group LASSO penalty) regularized convex optimization problems is a KL function with exponent $\frac12$; consequently, our algorithm is locally linearly convergent when applied to these problems. We then extend our method to solve DC programs with a single specially structured nonconvex constraint. Finally, we discuss how our algorithms can be applied to solve two concrete optimization problems, namely, group-structured compressed sensing problems with Gaussian measurement noise and compressed sensing problems with Cauchy measurement noise, and illustrate the empirical performance of our algorithms.
△ Less
Submitted 2 December, 2022; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Error bounds, facial residual functions and applications to the exponential cone
Authors:
Scott B. Lindstrom,
Bruno F. Lourenço,
Ting Kei Pong
Abstract:
We construct a general framework for deriving error bounds for conic feasibility problems. In particular, our approach allows one to work with cones that fail to be amenable or even to have computable projections, two previously challenging barriers. For the purpose, we first show how error bounds may be constructed using objects called one-step facial residual functions. Then, we develop several…
▽ More
We construct a general framework for deriving error bounds for conic feasibility problems. In particular, our approach allows one to work with cones that fail to be amenable or even to have computable projections, two previously challenging barriers. For the purpose, we first show how error bounds may be constructed using objects called one-step facial residual functions. Then, we develop several tools to compute these facial residual functions even in the absence of closed form expressions for the projections onto the cones. We demonstrate the use and power of our results by computing tight error bounds for the exponential cone feasibility problem. Interestingly, we discover a natural example for which the tightest error bound is related to the Boltzmann-Shannon entropy. We were also able to produce an example of sets for which a Hölderian error bound holds but the supremum of the set of admissible exponents is not itself an admissible exponent.
△ Less
Submitted 9 March, 2022; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Analysis and algorithms for some compressed sensing models based on L1/L2 minimization
Authors:
Liaoyuan Zeng,
Peiran Yu,
Ting Kei Pong
Abstract:
Recently, in a series of papers [32,38,39,41], the ratio of $\ell_1$ and $\ell_2$ norms was proposed as a sparsity inducing function for noiseless compressed sensing. In this paper, we further study properties of such model in the noiseless setting, and propose an algorithm for minimizing $\ell_1$/$\ell_2$ subject to noise in the measurements. Specifically, we show that the extended objective func…
▽ More
Recently, in a series of papers [32,38,39,41], the ratio of $\ell_1$ and $\ell_2$ norms was proposed as a sparsity inducing function for noiseless compressed sensing. In this paper, we further study properties of such model in the noiseless setting, and propose an algorithm for minimizing $\ell_1$/$\ell_2$ subject to noise in the measurements. Specifically, we show that the extended objective function (the sum of the objective and the indicator function of the constraint set) of the model in [32] satisfies the Kurdyka-Lojasiewicz (KL) property with exponent 1/2; this allows us to establish linear convergence of the algorithm proposed in [39, Eq. 11] under mild assumptions. We next extend the $\ell_1$/$\ell_2$ model to handle compressed sensing problems with noise. We establish the solution existence for some of these models under the spherical section property [37,44], and extend the algorithm in [39, Eq. 11] by incorporating moving-balls-approximation techniques [4] for solving these problems. We prove the subsequential convergence of our algorithm under mild conditions, and establish global convergence of the whole sequence generated by our algorithm by imposing additional KL and differentiability assumptions on a specially constructed potential function. Finally, we perform numerical experiments on robust compressed sensing and basis pursuit denoising with residual error measured by $ \ell_2 $ norm or Lorentzian norm via solving the corresponding $\ell_1$/$\ell_2$ models by our algorithm. Our numerical simulations show that our algorithm is able to recover the original sparse vectors with reasonable accuracy.
△ Less
Submitted 11 February, 2021; v1 submitted 24 July, 2020;
originally announced July 2020.
-
Convergence rate analysis of a sequential convex programming method with line search for a class of constrained difference-of-convex optimization problems
Authors:
Peiran Yu,
Ting Kei Pong,
Zhaosong Lu
Abstract:
In this paper, we study the sequential convex programming method with monotone line search (SCP$_{ls}$) in [46] for a class of difference-of-convex (DC) optimization problems with multiple smooth inequality constraints. The SCP$_{ls}$ is a representative variant of moving-ball-approximation-type algorithms [6,10,13,54] for constrained optimization problems. We analyze the convergence rate of the s…
▽ More
In this paper, we study the sequential convex programming method with monotone line search (SCP$_{ls}$) in [46] for a class of difference-of-convex (DC) optimization problems with multiple smooth inequality constraints. The SCP$_{ls}$ is a representative variant of moving-ball-approximation-type algorithms [6,10,13,54] for constrained optimization problems. We analyze the convergence rate of the sequence generated by SCP$_{ls}$ in both nonconvex and convex settings by imposing suitable Kurdyka-Lojasiewicz (KL) assumptions. Specifically, in the nonconvex settings, we assume that a special potential function related to the objective and the constraints is a KL function, while in the convex settings we impose KL assumptions directly on the extended objective function (i.e., sum of the objective and the indicator function of the constraint set). A relationship between these two different KL assumptions is established in the convex settings under additional differentiability assumptions. We also discuss how to deduce the KL exponent of the extended objective function from its Lagrangian in the convex settings, under additional assumptions on the constraint functions. Thanks to this result, the extended objectives of some constrained optimization models such as minimizing $\ell_1$ subject to logistic/Poisson loss are found to be KL functions with exponent $\frac12$ under mild assumptions. To illustrate how our results can be applied, we consider SCP$_{ls}$ for minimizing $\ell_{1-2}$ [60] subject to residual error measured by $\ell_2$ norm/Lorentzian norm [21]. We first discuss how the various conditions required in our analysis can be verified, and then perform numerical experiments to illustrate the convergence behaviors of SCP$_{ls}$.
△ Less
Submitted 11 May, 2021; v1 submitted 20 January, 2020;
originally announced January 2020.
-
A hybrid penalty method for a class of optimization problems with multiple rank constraints
Authors:
Tianxiang Liu,
Ivan Markovsky,
Ting Kei Pong,
Akiko Takeda
Abstract:
In this paper, we consider the problem of minimizing a smooth objective over multiple rank constraints on Hankel-structured matrices. This kind of problems arises in system identification, system theory and signal processing, where the rank constraints are typically "hard constraints". To solve these problems, we propose a hybrid penalty method that combines a penalty method with a post-processing…
▽ More
In this paper, we consider the problem of minimizing a smooth objective over multiple rank constraints on Hankel-structured matrices. This kind of problems arises in system identification, system theory and signal processing, where the rank constraints are typically "hard constraints". To solve these problems, we propose a hybrid penalty method that combines a penalty method with a post-processing scheme. Specifically, we solve the penalty subproblems until the penalty parameter reaches a given threshold, and then switch to a local alternating "pseudo-projection'' method to further reduce constraint violation. Pseudo-projection is a generalization of the concept of projection. We show that a pseudo-projection onto a {\em single} low-rank Hankel-structured matrix constraint can be computed efficiently by existing softwares such as SLRA (Markovsky and Usevich, 2014), under mild assumptions. We also demonstrate how the penalty subproblems in the hybrid penalty method can be solved by pseudo-projection-based optimization methods, and then present some convergence results for our hybrid penalty method. Finally, the efficiency of our method is illustrated by numerical examples.
△ Less
Submitted 25 June, 2019;
originally announced June 2019.
-
A difference-of-convex approach for split feasibility with applications to matrix factorizations and outlier detection
Authors:
Chen Chen,
Ting Kei Pong,
Lulin Tan,
Liaoyuan Zeng
Abstract:
The split feasibility problem is to find an element in the intersection of a closed set $C$ and the linear preimage of another closed set $D$, assuming the projections onto $C$ and $D$ are easy to compute. This class of problems arises naturally in many contemporary applications such as compressed sensing. While the sets $C$ and $D$ are typically assumed to be convex in the literature, in this pap…
▽ More
The split feasibility problem is to find an element in the intersection of a closed set $C$ and the linear preimage of another closed set $D$, assuming the projections onto $C$ and $D$ are easy to compute. This class of problems arises naturally in many contemporary applications such as compressed sensing. While the sets $C$ and $D$ are typically assumed to be convex in the literature, in this paper, we allow both sets to be possibly nonconvex. We observe that, in this setting, the split feasibility problem can be formulated as an optimization problem with a difference-of-convex objective so that standard majorization-minimization type algorithms can be applied. Here we focus on the nonmonotone proximal gradient algorithm with majorization studied in [15, Appendix A]. We show that, when this algorithm is applied to a split feasibility problem, the sequence generated clusters at a stationary point of the problem under mild assumptions. We also study local convergence property of the sequence under suitable assumptions on the closed sets involved. Finally, we perform numerical experiments to illustrate the efficiency of our approach on solving split feasibility problems that arise in completely positive matrix factorization, (uniformly) sparse matrix factorization, and outlier detection.
△ Less
Submitted 4 November, 2020; v1 submitted 4 March, 2019;
originally announced March 2019.
-
Kurdyka-Łojasiewicz exponent via inf-projection
Authors:
Peiran Yu,
Guoyin Li,
Ting Kei Pong
Abstract:
Kurdyka-Lojasiewicz (KL) exponent plays an important role in estimating the convergence rate of many contemporary first-order methods. In particular, a KL exponent of $\frac12$ for a suitable potential function is related to local linear convergence. Nevertheless, KL exponent is in general extremely hard to estimate. In this paper, we show under mild assumptions that KL exponent is preserved via i…
▽ More
Kurdyka-Lojasiewicz (KL) exponent plays an important role in estimating the convergence rate of many contemporary first-order methods. In particular, a KL exponent of $\frac12$ for a suitable potential function is related to local linear convergence. Nevertheless, KL exponent is in general extremely hard to estimate. In this paper, we show under mild assumptions that KL exponent is preserved via inf-projection. Inf-projection is a fundamental operation that is ubiquitous when reformulating optimization problems via the lift-and-project approach. By studying its operation on KL exponent, we show that the KL exponent is $\frac12$ for several important convex optimization models, including some semidefinite-programming-representable functions and some functions that involve $C^2$-cone reducible structures, under conditions such as strict complementarity. Our results are applicable to concrete optimization models such as group fused Lasso and overlapping group Lasso. In addition, for nonconvex models, we show that the KL exponent of many difference-of-convex functions can be derived from that of their natural majorant functions, and the KL exponent of the Bregman envelope of a function is the same as that of the function itself. Finally, we estimate the KL exponent of the sum of the least squares function and the indicator function of the set of matrices of rank at most $k$.
△ Less
Submitted 13 January, 2021; v1 submitted 10 February, 2019;
originally announced February 2019.
-
Polar Convolution
Authors:
Michael P. Friedlander,
Ives Macêdo,
Ting Kei Pong
Abstract:
The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for convex optimization. This paper develops the theory for an analogous convolution operation, called the polar envelope, specialized to gauge functions. Many important properties of the Moreau envelope and the proximal map are m…
▽ More
The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for convex optimization. This paper develops the theory for an analogous convolution operation, called the polar envelope, specialized to gauge functions. Many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, which play important roles in duality and in the construction of algorithms for gauge optimization. A suite of tools with which to build algorithms for this family of optimization problems is thus established.
△ Less
Submitted 3 February, 2019; v1 submitted 21 August, 2018;
originally announced August 2018.
-
Inner approximating the completely positive cone via the cone of scaled diagonally dominant matrices
Authors:
João Gouveia,
Ting Kei Pong,
Mina Saee
Abstract:
Motivated by the expressive power of completely positive programming to encode hard optimization problems, many approximation schemes for the completely positive cone have been proposed and successfully used. Most schemes are based on outer approximations, with the only inner approximations available being linear programming based methods proposed by Bundfuss and Dür and also Yıldırım, and a semid…
▽ More
Motivated by the expressive power of completely positive programming to encode hard optimization problems, many approximation schemes for the completely positive cone have been proposed and successfully used. Most schemes are based on outer approximations, with the only inner approximations available being linear programming based methods proposed by Bundfuss and Dür and also Yıldırım, and a semidefinite programming based method proposed by Lasserre. In this paper, we propose the use of the cone of nonnegative scaled diagonally dominant matrices as a natural inner approximation to the completely positive cone. Using projections of this cone we derive new graph-based second-order cone approximation schemes for completely positive programming, leading to both uniform and problem-dependent hierarchies. This offers a compromise between the expressive power of semidefinite programming and the speed of linear programming based approaches. Numerical results on random problems, standard quadratic programs and the stable set problem are presented to illustrate the effectiveness of our approach.
△ Less
Submitted 4 October, 2019; v1 submitted 1 July, 2018;
originally announced July 2018.
-
A subgradient-based approach for finding the maximum feasible subsystem with respect to a set
Authors:
Minglu Ye,
Ting Kei Pong
Abstract:
We propose a subgradient-based method for finding the maximum feasible subsystem in a collection of closed sets with respect to a given closed set $C$ (MFS$_C$). In this method, we reformulate the MFS$_C$ problem as an $\ell_0$ optimization problem and construct a sequence of continuous optimization problems to approximate it. The objective of each approximation problem is the sum of the compositi…
▽ More
We propose a subgradient-based method for finding the maximum feasible subsystem in a collection of closed sets with respect to a given closed set $C$ (MFS$_C$). In this method, we reformulate the MFS$_C$ problem as an $\ell_0$ optimization problem and construct a sequence of continuous optimization problems to approximate it. The objective of each approximation problem is the sum of the composition of a nonnegative nondecreasing continuously differentiable concave function with the squared distance function to a closed set. Although this objective function is nonsmooth in general, a subgradient can be obtained in terms of the projections onto the closed sets. Based on this observation, we adapt a subgradient projection method to solve these approximation problems. Unlike classical subgradient methods, the convergence (clustering to stationary points) of our subgradient method is guaranteed with a {\em nondiminishing stepsize} under mild assumptions. This allows us to further study the sequential convergence of the subgradient method under suitable Kurdyka-Łojasiewicz assumptions. Finally, we illustrate our algorithm numerically for solving the MFS$_C$ problems on a collection of halfspaces and a collection of unions of halfspaces, respectively, with respect to the set of $s$-sparse vectors.
△ Less
Submitted 8 May, 2018;
originally announced May 2018.
-
A refined convergence analysis of pDCA$_e$ with applications to simultaneous sparse recovery and outlier detection
Authors:
Tianxiang Liu,
Ting Kei Pong,
Akiko Takeda
Abstract:
We consider the problem of minimizing a difference-of-convex (DC) function, which can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous possibly nonsmooth concave function. We refine the convergence analysis in [38] for the proximal DC algorithm with extrapolation (pDCA$_e$) and show that the whole sequence generated by the…
▽ More
We consider the problem of minimizing a difference-of-convex (DC) function, which can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous possibly nonsmooth concave function. We refine the convergence analysis in [38] for the proximal DC algorithm with extrapolation (pDCA$_e$) and show that the whole sequence generated by the algorithm is convergent when the objective is level-bounded, {\em without} imposing differentiability assumptions in the concave part. Our analysis is based on a new potential function and we assume such a function is a Kurdyka-Łojasiewicz (KL) function. We also establish a relationship between our KL assumption and the one used in [38]. Finally, we demonstrate how the pDCA$_e$ can be applied to a class of simultaneous sparse recovery and outlier detection problems arising from robust compressed sensing in signal processing and least trimmed squares regression in statistics. Specifically, we show that the objectives of these problems can be written as level-bounded DC functions whose concave parts are {\em typically nonsmooth}. Moreover, for a large class of loss functions and regularizers, the KL exponent of the corresponding potential function are shown to be 1/2, which implies that the pDCA$_e$ is locally linearly convergent when applied to these problems. Our numerical experiments show that the pDCA$_e$ usually outperforms the proximal DC algorithm with nonmonotone linesearch [24, Appendix A] in both CPU time and solution quality for this particular application.
△ Less
Submitted 19 April, 2018;
originally announced April 2018.
-
Iteratively reweighted $\ell_1$ algorithms with extrapolation
Authors:
Peiran Yu,
Ting Kei Pong
Abstract:
Iteratively reweighted $\ell_1$ algorithm is a popular algorithm for solving a large class of optimization problems whose objective is the sum of a Lipschitz differentiable loss function and a possibly nonconvex sparsity inducing regularizer. In this paper, motivated by the success of extrapolation techniques in accelerating first-order methods, we study how widely used extrapolation techniques su…
▽ More
Iteratively reweighted $\ell_1$ algorithm is a popular algorithm for solving a large class of optimization problems whose objective is the sum of a Lipschitz differentiable loss function and a possibly nonconvex sparsity inducing regularizer. In this paper, motivated by the success of extrapolation techniques in accelerating first-order methods, we study how widely used extrapolation techniques such as those in [4,5,22,28] can be incorporated to possibly accelerate the iteratively reweighted $\ell_1$ algorithm. We consider three versions of such algorithms. For each version, we exhibit an explicitly checkable condition on the extrapolation parameters so that the sequence generated provably clusters at a stationary point of the optimization problem. We also investigate global convergence under additional Kurdyka-$Ł$ojasiewicz assumptions on certain potential functions. Our numerical experiments show that our algorithms usually outperform the general iterative shrinkage and thresholding algorithm in [21] and an adaptation of the iteratively reweighted $\ell_1$ algorithm in [23, Algorithm 7] with nonmonotone line-search for solving random instances of log penalty regularized least squares problems in terms of both CPU time and solution quality.
△ Less
Submitted 18 November, 2017; v1 submitted 21 October, 2017;
originally announced October 2017.
-
A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems
Authors:
Tianxiang Liu,
Ting Kei Pong,
Akiko Takeda
Abstract:
We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. This kind of problems arises naturally in various applications when different regularizers are introduced for…
▽ More
We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. This kind of problems arises naturally in various applications when different regularizers are introduced for inducing simultaneous structures in the solutions. Solving these problems, however, can be challenging because of the coupled nonsmooth functions: the corresponding proximal mapping can be hard to compute so that standard first-order methods such as the proximal gradient algorithm cannot be applied efficiently. In this paper, we propose a successive difference-of-convex approximation method for solving this kind of problems. In this algorithm, we approximate the nonsmooth functions by their Moreau envelopes in each iteration. Making use of the simple observation that Moreau envelopes of nonnegative proper closed functions are continuous {\em difference-of-convex} functions, we can then approximately minimize the approximation function by first-order methods with suitable majorization techniques. These first-order methods can be implemented efficiently thanks to the fact that the proximal mapping of {\em each} nonsmooth function is easy to compute. Under suitable assumptions, we prove that the sequence generated by our method is bounded and any accumulation point is a stationary point of the objective. We also discuss how our method can be applied to concrete applications such as nonconvex fused regularized optimization problems and simultaneously structured matrix optimization problems, and illustrate the performance numerically for these two specific applications.
△ Less
Submitted 26 May, 2018; v1 submitted 16 October, 2017;
originally announced October 2017.
-
A Non-monotone Alternating Updating Method for A Class of Matrix Factorization Problems
Authors:
Lei Yang,
Ting Kei Pong,
Xiaojun Chen
Abstract:
In this paper we consider a general matrix factorization model which covers a large class of existing models with many applications in areas such as machine learning and imaging sciences. To solve this possibly nonconvex, nonsmooth and non-Lipschitz problem, we develop a non-monotone alternating updating method based on a potential function. Our method essentially updates two blocks of variables i…
▽ More
In this paper we consider a general matrix factorization model which covers a large class of existing models with many applications in areas such as machine learning and imaging sciences. To solve this possibly nonconvex, nonsmooth and non-Lipschitz problem, we develop a non-monotone alternating updating method based on a potential function. Our method essentially updates two blocks of variables in turn by inexactly minimizing this potential function, and updates another auxiliary block of variables using an explicit formula. The special structure of our potential function allows us to take advantage of efficient computational strategies for non-negative matrix factorization to perform the alternating minimization over the two blocks of variables. A suitable line search criterion is also incorporated to improve the numerical performance. Under some mild conditions, we show that the line search criterion is well defined, and establish that the sequence generated is bounded and any cluster point of the sequence is a stationary point. Finally, we conduct some numerical experiments using real datasets to compare our method with some existing efficient methods for non-negative matrix factorization and matrix completion. The numerical results show that our method can outperform these methods for these specific applications.
△ Less
Submitted 14 May, 2018; v1 submitted 18 May, 2017;
originally announced May 2017.
-
A proximal difference-of-convex algorithm with extrapolation
Authors:
Bo Wen,
Xiaojun Chen,
Ting Kei Pong
Abstract:
We consider a class of difference-of-convex (DC) optimization problems whose objective is level-bounded and is the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous concave function. While this kind of problems can be solved by the classical difference-of-convex algorithm (DCA) [26], the difficulty of the subproblems of this algorithm depends…
▽ More
We consider a class of difference-of-convex (DC) optimization problems whose objective is level-bounded and is the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous concave function. While this kind of problems can be solved by the classical difference-of-convex algorithm (DCA) [26], the difficulty of the subproblems of this algorithm depends heavily on the choice of DC decomposition. Simpler subproblems can be obtained by using a specific DC decomposition described in [27]. This decomposition has been proposed in numerous work such as [18], and we refer to the resulting DCA as the proximal DCA. Although the subproblems are simpler, the proximal DCA is the same as the proximal gradient algorithm when the concave part of the objective is void, and hence is potentially slow in practice. In this paper, motivated by the extrapolation techniques for accelerating the proximal gradient algorithm in the convex settings, we consider a proximal difference-of-convex algorithm with extrapolation to possibly accelerate the proximal DCA. We show that any cluster point of the sequence generated by our algorithm is a stationary point of the DC optimization problem for a fairly general choice of extrapolation parameters: in particular, the parameters can be chosen as in FISTA with fixed restart [15]. In addition, by assuming the Kurdyka-Łojasiewicz property of the objective and the differentiability of the concave part, we establish global convergence of the sequence generated by our algorithm and analyze its convergence rate. Our numerical experiments on two difference-of-convex regularized least squares models show that our algorithm usually outperforms the proximal DCA and the general iterative shrinkage and thresholding algorithm proposed in [17].
△ Less
Submitted 21 June, 2017; v1 submitted 19 December, 2016;
originally announced December 2016.
-
Further properties of the forward-backward envelope with applications to difference-of-convex programming
Authors:
Tianxiang Liu,
Ting Kei Pong
Abstract:
In this paper, we further study the forward-backward envelope first introduced in [28] and [30] for problems whose objective is the sum of a proper closed convex function and a twice continuously differentiable possibly nonconvex function with Lipschitz continuous gradient. We derive sufficient conditions on the original problem for the corresponding forward-backward envelope to be a level-bounded…
▽ More
In this paper, we further study the forward-backward envelope first introduced in [28] and [30] for problems whose objective is the sum of a proper closed convex function and a twice continuously differentiable possibly nonconvex function with Lipschitz continuous gradient. We derive sufficient conditions on the original problem for the corresponding forward-backward envelope to be a level-bounded and Kurdyka-Łojasiewicz function with an exponent of $\frac12$; these results are important for the efficient minimization of the forward-backward envelope by classical optimization algorithms. In addition, we demonstrate how to minimize some difference-of-convex regularized least squares problems by minimizing a suitably constructed forward-backward envelope. Our preliminary numerical results on randomly generated instances of large-scale $\ell_{1-2}$ regularized least squares problems [37] illustrate that an implementation of this approach with a limited-memory BFGS scheme usually outperforms standard first-order methods such as the nonmonotone proximal gradient method in [35].
△ Less
Submitted 18 October, 2016; v1 submitted 30 April, 2016;
originally announced May 2016.
-
Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods
Authors:
Guoyin Li,
Ting Kei Pong
Abstract:
In this paper, we study the Kurdyka-Łojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo-Tseng error bound together with a mi…
▽ More
In this paper, we study the Kurdyka-Łojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo-Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is $\frac12$. The Luo-Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models for applications such as sparse recovery, their objective function's KL exponent is $\frac12$. This includes the least squares problem with smoothly clipped absolute deviation (SCAD) regularization or minimax concave penalty (MCP) regularization and the logistic regression problem with $\ell_1$ regularization. Since many existing local convergence rate analysis for first-order methods in the nonconvex scenario relies on the KL exponent, our results enable us to obtain explicit convergence rate for various first-order methods when they are applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establishing local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step-sizes for some specific models that arise in sparse recovery.
△ Less
Submitted 29 August, 2021; v1 submitted 9 February, 2016;
originally announced February 2016.
-
Linear Convergence of Proximal Gradient Algorithm with Extrapolation for a Class of Nonconvex Nonsmooth Minimization Problems
Authors:
Bo Wen,
Xiaojun Chen,
Ting Kei Pong
Abstract:
In this paper, we study the proximal gradient algorithm with extrapolation for minimizing the sum of a Lipschitz differentiable function and a proper closed convex function. Under the error bound condition used in [19] for analyzing the convergence of the proximal gradient algorithm, we show that there exists a threshold such that if the extrapolation coefficients are chosen below this threshold,…
▽ More
In this paper, we study the proximal gradient algorithm with extrapolation for minimizing the sum of a Lipschitz differentiable function and a proper closed convex function. Under the error bound condition used in [19] for analyzing the convergence of the proximal gradient algorithm, we show that there exists a threshold such that if the extrapolation coefficients are chosen below this threshold, then the sequence generated converges $R$-linearly to a stationary point of the problem. Moreover, the corresponding sequence of objective values is also $R$-linearly convergent. In addition, the threshold reduces to $1$ for convex problems and, as a consequence, we obtain the $R$-linear convergence of the sequence generated by FISTA with fixed restart. Finally, we present some numerical experiments to illustrate our results.
△ Less
Submitted 31 July, 2016; v1 submitted 31 December, 2015;
originally announced December 2015.
-
Peaceman-Rachford splitting for a class of nonconvex optimization problems
Authors:
Guoyin Li,
Tianxiang Liu,
Ting Kei Pong
Abstract:
We study the applicability of the Peaceman-Rachford (PR) splitting method for solving nonconvex optimization problems. When applied to minimizing the sum of a strongly convex Lipschitz differentiable function and a proper closed function, we show that if the strongly convex function has a large enough strong convexity modulus and the step-size parameter is chosen below a threshold that is computab…
▽ More
We study the applicability of the Peaceman-Rachford (PR) splitting method for solving nonconvex optimization problems. When applied to minimizing the sum of a strongly convex Lipschitz differentiable function and a proper closed function, we show that if the strongly convex function has a large enough strong convexity modulus and the step-size parameter is chosen below a threshold that is computable, then any cluster point of the sequence generated, if exists, will give a stationary point of the optimization problem. We also give sufficient conditions guaranteeing boundedness of the sequence generated. We then discuss one way to split the objective so that the proposed method can be suitably applied to solving optimization problems with a coercive objective that is the sum of a (not necessarily strongly) convex Lipschitz differentiable function and a proper closed function; this setting covers a large class of nonconvex feasibility problems and constrained least squares problems. Finally, we illustrate the proposed algorithm numerically.
△ Less
Submitted 9 January, 2017; v1 submitted 3 July, 2015;
originally announced July 2015.
-
Alternating Direction Method of Multipliers for A Class of Nonconvex and Nonsmooth Problems with Applications to Background/Foreground Extraction
Authors:
Lei Yang,
Ting Kei Pong,
Xiaojun Chen
Abstract:
In this paper, we study a general optimization model, which covers a large class of existing models for many applications in imaging sciences. To solve the resulting possibly nonconvex, nonsmooth and non-Lipschitz optimization problem, we adapt the alternating direction method of multipliers (ADMM) with a general dual step-size to solve a reformulation that contains three blocks of variables, and…
▽ More
In this paper, we study a general optimization model, which covers a large class of existing models for many applications in imaging sciences. To solve the resulting possibly nonconvex, nonsmooth and non-Lipschitz optimization problem, we adapt the alternating direction method of multipliers (ADMM) with a general dual step-size to solve a reformulation that contains three blocks of variables, and analyze its convergence. We show that for any dual step-size less than the golden ratio, there exists a computable threshold such that if the penalty parameter is chosen above such a threshold and the sequence thus generated by our ADMM is bounded, then the cluster point of the sequence gives a stationary point of the nonconvex optimization problem. We achieve this via a potential function specifically constructed for our ADMM. Moreover, we establish the global convergence of the whole sequence if, in addition, this special potential function is a Kurdyka-Łojasiewicz function. Furthermore, we present a simple strategy for initializing the algorithm to guarantee boundedness of the sequence. Finally, we perform numerical experiments comparing our ADMM with the proximal alternating linearized minimization (PALM) proposed in [5] on the background/foreground extraction problem with real data. The numerical results show that our ADMM with a nontrivial dual step-size is efficient.
△ Less
Submitted 29 September, 2016; v1 submitted 23 June, 2015;
originally announced June 2015.
-
Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems
Authors:
Guoyin Li,
Ting Kei Pong
Abstract:
We adapt the Douglas-Rachford (DR) splitting method to solve nonconvex feasibility problems by studying this method for a class of nonconvex optimization problem. While the convergence properties of the method for convex problems have been well studied, far less is known in the nonconvex setting. In this paper, for the direct adaptation of the method to minimize the sum of a proper closed function…
▽ More
We adapt the Douglas-Rachford (DR) splitting method to solve nonconvex feasibility problems by studying this method for a class of nonconvex optimization problem. While the convergence properties of the method for convex problems have been well studied, far less is known in the nonconvex setting. In this paper, for the direct adaptation of the method to minimize the sum of a proper closed function $g$ and a smooth function $f$ with a Lipschitz continuous gradient, we show that if the step-size parameter is smaller than a computable threshold and the sequence generated has a cluster point, then it gives a stationary point of the optimization problem. Convergence of the whole sequence and a local convergence rate are also established under the additional assumption that $f$ and $g$ are semi-algebraic. We also give simple sufficient conditions guaranteeing the boundedness of the sequence generated. We then apply our nonconvex DR splitting method to finding a point in the intersection of a closed convex set $C$ and a general closed set $D$ by minimizing the squared distance to $C$ subject to $D$. We show that if either set is bounded and the step-size parameter is smaller than a computable threshold, then the sequence generated from the DR splitting method is actually bounded. Consequently, the sequence generated will have cluster points that are stationary for an optimization problem, and the whole sequence is convergent under an additional assumption that $C$ and $D$ are semi-algebraic. We achieve these results based on a new merit function constructed particularly for the DR splitting method. Our preliminary numerical results indicate that our DR splitting method usually outperforms the alternating projection method in finding a sparse solution of a linear system, in terms of both the solution quality and the number of iterations taken.
△ Less
Submitted 5 November, 2015; v1 submitted 30 September, 2014;
originally announced September 2014.
-
Penalty methods for a class of non-Lipschitz optimization problems
Authors:
Xiaojun Chen,
Zhaosong Lu,
Ting Kei Pong
Abstract:
We consider a class of constrained optimization problems with a possibly nonconvex non-Lipschitz objective and a convex feasible set being the intersection of a polyhedron and a possibly degenerate ellipsoid. Such problems have a wide range of applications in data science, where the objective is used for inducing sparsity in the solutions while the constraint set models the noise tolerance and inc…
▽ More
We consider a class of constrained optimization problems with a possibly nonconvex non-Lipschitz objective and a convex feasible set being the intersection of a polyhedron and a possibly degenerate ellipsoid. Such problems have a wide range of applications in data science, where the objective is used for inducing sparsity in the solutions while the constraint set models the noise tolerance and incorporates other prior information for data fitting. To solve this class of constrained optimization problems, a common approach is the penalty method. However, there is little theory on exact penalization for problems with nonconvex and non-Lipschitz objective functions. In this paper, we study the existence of exact penalty parameters regarding local minimizers, stationary points and $ε$-minimizers under suitable assumptions. Moreover, we discuss a penalty method whose subproblems are solved via a nonmonotone proximal gradient method with a suitable update scheme for the penalty parameters, and prove the convergence of the algorithm to a KKT point of the constrained problem. Preliminary numerical results demonstrate the efficiency of the penalty method for finding sparse solutions of underdetermined linear systems.
△ Less
Submitted 6 April, 2016; v1 submitted 8 September, 2014;
originally announced September 2014.
-
Global convergence of splitting methods for nonconvex composite optimization
Authors:
Guoyin Li,
Ting Kei Pong
Abstract:
We consider the problem of minimizing the sum of a smooth function $h$ with a bounded Hessian, and a nonsmooth function. We assume that the latter function is a composition of a proper closed function $P$ and a surjective linear map $\cal M$, with the proximal mappings of $τP$, $τ> 0$, simple to compute. This problem is nonconvex in general and encompasses many important applications in engineerin…
▽ More
We consider the problem of minimizing the sum of a smooth function $h$ with a bounded Hessian, and a nonsmooth function. We assume that the latter function is a composition of a proper closed function $P$ and a surjective linear map $\cal M$, with the proximal mappings of $τP$, $τ> 0$, simple to compute. This problem is nonconvex in general and encompasses many important applications in engineering and machine learning. In this paper, we examined two types of splitting methods for solving this nonconvex optimization problem: alternating direction method of multipliers and proximal gradient algorithm. For the direct adaptation of the alternating direction method of multipliers, we show that, if the penalty parameter is chosen sufficiently large and the sequence generated has a cluster point, then it gives a stationary point of the nonconvex problem. We also establish convergence of the whole sequence under an additional assumption that the functions $h$ and $P$ are semi-algebraic. Furthermore, we give simple sufficient conditions to guarantee boundedness of the sequence generated. These conditions can be satisfied for a wide range of applications including the least squares problem with the $\ell_{1/2}$ regularization. Finally, when $\cal M$ is the identity so that the proximal gradient algorithm can be efficiently applied, we show that any cluster point is stationary under a slightly more flexible constant step-size rule than what is known in the literature for a nonconvex $h$.
△ Less
Submitted 3 November, 2015; v1 submitted 2 July, 2014;
originally announced July 2014.
-
Eigenvalue, Quadratic Programming, and Semidefinite Programming Bounds for a Cut Minimization Problem
Authors:
Ting Kei Pong,
Hao Sun,
Ningchuan Wang,
Henry Wolkowicz
Abstract:
We consider the problem of partitioning the node set of a graph into $k$ sets of given sizes in order to \emph{minimize the cut} obtained using (removing) the $k$-th set. If the resulting cut has value $0$, then we have obtained a vertex separator. This problem is closely related to the graph partitioning problem. In fact, the model we use is the same as that for the graph partitioning problem exc…
▽ More
We consider the problem of partitioning the node set of a graph into $k$ sets of given sizes in order to \emph{minimize the cut} obtained using (removing) the $k$-th set. If the resulting cut has value $0$, then we have obtained a vertex separator. This problem is closely related to the graph partitioning problem. In fact, the model we use is the same as that for the graph partitioning problem except for a different \emph{quadratic} objective function. We look at known and new bounds obtained from various relaxations for this NP-hard problem. This includes: the standard eigenvalue bound, projected eigenvalue bounds using both the adjacency matrix and the Laplacian, quadratic programming (QP) bounds based on recent successful QP bounds for the quadratic assignment problems, and semidefinite programming bounds. We include numerical tests for large and \emph{huge} problems that illustrate the efficiency of the bounds in terms of strength and time.
△ Less
Submitted 18 November, 2014; v1 submitted 20 January, 2014;
originally announced January 2014.
-
Gauge optimization and duality
Authors:
Michael P. Friedlander,
Ives Macedo,
Ting Kei Pong
Abstract:
Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987}, seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and sig…
▽ More
Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987}, seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
△ Less
Submitted 20 March, 2014; v1 submitted 9 October, 2013;
originally announced October 2013.
-
Convex relaxation for finding planted influential nodes in a social network
Authors:
Lisa Elkin,
Ting Kei Pong,
Stephen Vavasis
Abstract:
We consider the problem of maximizing influence in a social network. We focus on the case that the social network is a directed bipartite graph whose arcs join senders to receivers. We consider both the case of deterministic networks and probabilistic graphical models, that is, the so-called "cascade" model. The problem is to find the set of the $k$ most influential senders for a given integer…
▽ More
We consider the problem of maximizing influence in a social network. We focus on the case that the social network is a directed bipartite graph whose arcs join senders to receivers. We consider both the case of deterministic networks and probabilistic graphical models, that is, the so-called "cascade" model. The problem is to find the set of the $k$ most influential senders for a given integer $k$. Although this problem is NP-hard, there is a polynomial-time approximation algorithm due to Kempe, Kleinberg and Tardos. In this work we consider convex relaxation for the problem. We prove that convex optimization can recover the exact optimizer in the case that the network is constructed according to a generative model in which influential nodes are planted but then obscured with noise. We also demonstrate computationally that the convex relaxation can succeed on a more realistic generative model called the "forest fire" model.
△ Less
Submitted 15 July, 2013;
originally announced July 2013.
-
The proximal-proximal gradient algorithm
Authors:
Ting Kei Pong
Abstract:
We consider the problem of minimizing a convex objective which is the sum of a smooth part, with Lipschitz continuous gradient, and a nonsmooth part. Inspired by various applications, we focus on the case when the nonsmooth part is a composition of a proper closed convex function P and a nonzero affine map, with the proximal mappings of τP, τ> 0, easy to compute. In this case, a direct application…
▽ More
We consider the problem of minimizing a convex objective which is the sum of a smooth part, with Lipschitz continuous gradient, and a nonsmooth part. Inspired by various applications, we focus on the case when the nonsmooth part is a composition of a proper closed convex function P and a nonzero affine map, with the proximal mappings of τP, τ> 0, easy to compute. In this case, a direct application of the widely used proximal gradient algorithm does not necessarily lead to easy subproblems. In view of this, we propose a new algorithm, the proximal-proximal gradient algorithm, which admits easy subproblems. Our algorithm reduces to the proximal gradient algorithm if the affine map is just the identity map and the stepsizes are suitably chosen, and it is equivalent to applying a variant of the alternating minimization algorithm [35] to the dual problem. Moreover, it is closely related to inexact proximal gradient algorithms [29,33]. We show that the whole sequence generated from the algorithm converges to an optimal solution. We also establish an upper bound on iteration complexity. Our numerical experiments on the stochastic realization problem and the logistic fused lasso problem suggest that the algorithm performs reasonably well on large-scale instances.
△ Less
Submitted 27 August, 2013; v1 submitted 21 May, 2013;
originally announced May 2013.
-
Computing Optimal Experimental Designs via Interior Point Method
Authors:
Zhaosong Lu,
Ting Kei Pong
Abstract:
In this paper, we study optimal experimental design problems with a broad class of smooth convex optimality criteria, including the classical A-, D- and p th mean criterion. In particular, we propose an interior point (IP) method for them and establish its global convergence. Furthermore, by exploiting the structure of the Hessian matrix of the aforementioned optimality criteria, we derive an expl…
▽ More
In this paper, we study optimal experimental design problems with a broad class of smooth convex optimality criteria, including the classical A-, D- and p th mean criterion. In particular, we propose an interior point (IP) method for them and establish its global convergence. Furthermore, by exploiting the structure of the Hessian matrix of the aforementioned optimality criteria, we derive an explicit formula for computing its rank. Using this result, we then show that the Newton direction arising in the IP method can be computed efficiently via Sherman-Morrison-Woodbury formula when the size of the moment matrix is small relative to the sample size. Finally, we compare our IP method with the widely used multiplicative algorithm introduced by Silvey et al. [29]. The computational results show that the IP method generally outperforms the multiplicative algorithm both in speed and solution quality.
△ Less
Submitted 12 October, 2012; v1 submitted 9 September, 2010;
originally announced September 2010.