Abstract
In the development of first-order methods for smooth (resp., composite) convex optimization problems, where smooth functions with Lipschitz continuous gradients are minimized, the gradient (resp., gradient mapping) norm is a fundamental optimality measure. Under this measure, a fixed-iteration algorithm with the optimal iteration complexity for the smooth case is known, but determining the number of iterations needed to reach a desired accuracy requires prior knowledge of the distance from the initial point to the optimal solution set. In this paper, we propose an adaptive regularization approach that attains the nearly optimal iteration complexity without knowing the distance to the optimal solution set. To achieve even faster convergence adaptively, we then apply this approach to construct a first-order method that is adaptive to the Hölderian error bound condition (or, equivalently, the Łojasiewicz gradient property), which covers moderately wide classes of applications. The proposed method attains a nearly optimal iteration complexity with respect to the gradient mapping norm.
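As a rough illustration of the optimality measure discussed above, the following Python sketch runs a standard accelerated proximal gradient (FISTA-type) loop on a composite problem and stops once the gradient mapping norm \(\Vert G_L(x)\Vert \) falls below a tolerance. It is a minimal sketch under assumed data (a least-squares-plus-\(\ell _1\) instance with made-up names); it is not the adaptive regularization method proposed in the paper.

```python
# Illustrative sketch only: a standard FISTA loop using the gradient-mapping
# norm ||G_L(x)|| as the stopping measure discussed in the abstract.
# The test problem and all names are assumptions for this example; this is
# NOT the adaptive regularization method proposed in the paper.
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def gradient_mapping(x, grad_f, prox_h, L):
    """G_L(x) = L * (x - prox_{h/L}(x - grad f(x)/L)); zero iff x is optimal."""
    return L * (x - prox_h(x - grad_f(x) / L, 1.0 / L))

def fista(grad_f, prox_h, L, x0, tol=1e-6, max_iter=5000):
    """Accelerated proximal gradient method, stopped via ||G_L(x)|| <= tol."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(max_iter):
        x_next = prox_h(y - grad_f(y) / L, 1.0 / L)        # proximal gradient step
        if np.linalg.norm(gradient_mapping(x_next, grad_f, prox_h, L)) <= tol:
            return x_next
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))  # Nesterov momentum
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
    L = np.linalg.norm(A, 2) ** 2                          # Lipschitz constant of grad f
    grad_f = lambda x: A.T @ (A @ x - b)                   # f(x) = 0.5*||Ax - b||^2
    prox_h = lambda v, s: soft_threshold(v, s * lam)       # h(x) = lam*||x||_1
    x_hat = fista(grad_f, prox_h, L, np.zeros(100))
    print(np.linalg.norm(gradient_mapping(x_hat, grad_f, prox_h, L)))
```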
Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Notes
Suppose that \(\varphi \) is an L-smooth convex function satisfying (4) for some exponent \(\rho \in [1,2[\). For \(\rho =1\), Lemma 2.2 implies \(\left\| \nabla {\varphi }(x)\right\| \ge \kappa \) for \(x \not \in \mathop {\text {Argmin}}\varphi \). If \(\rho \in ]1,2[\), on the other hand, Lemmas 2.1 and 2.2 imply \({\frac{1}{2L}\left\| \nabla {\varphi }(x)\right\| ^2 \le \varphi (x)-\varphi ^* \le \kappa ^{-\frac{1}{\rho -1}}\left\| \nabla {\varphi }(x)\right\| ^{\frac{\rho }{\rho -1}}}\) for all x, which yields \(\left\| \nabla {\varphi }(x)\right\| \ge \text {const.}>0\) for all \(x \not \in \mathop {\text {Argmin}}\varphi \). In either case, this contradicts the continuity of \(\nabla {\varphi }\) at points in \(\mathop {\text {Argmin}}\varphi \).
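For concreteness, dividing the chain of inequalities above by \(\left\| \nabla {\varphi }(x)\right\| ^{\frac{\rho }{\rho -1}}>0\) (valid since \(x \not \in \mathop {\text {Argmin}}\varphi \)) makes the constant explicit; this rearrangement is ours and uses only the two displayed bounds:
$$\begin{aligned} \frac{1}{2L}\left\| \nabla {\varphi }(x)\right\| ^{\frac{\rho -2}{\rho -1}} \le \kappa ^{-\frac{1}{\rho -1}} \quad \Longrightarrow \quad \left\| \nabla {\varphi }(x)\right\| \ge (2L)^{-\frac{\rho -1}{2-\rho }}\,\kappa ^{\frac{1}{2-\rho }} > 0, \end{aligned}$$where the implication flips the inequality because the exponent \(\frac{\rho -2}{\rho -1}\) is negative for \(\rho \in ]1,2[\).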
By the strong convexity of \(\left\langle a,x\right\rangle +h(x)\) and \(\left\langle b,x\right\rangle +h(x)\), we have
$$\begin{aligned} \frac{\mu }{2}\left\| x_a^*-x_b^*\right\| ^2 \le [\left\langle a,x_b^*\right\rangle +h(x_b^*)] - [\left\langle a,x_a^*\right\rangle +h(x_a^*)] ~~\text {and}~~ \frac{\mu }{2}\left\| x_a^*-x_b^*\right\| ^2 \le [\left\langle b,x_a^*\right\rangle +h(x_a^*)]-[\left\langle b,x_b^*\right\rangle +h(x_b^*)], \end{aligned}$$respectively. Adding them implies \( \mu \left\| x_a^*-x_b^*\right\| ^2 \le \left\langle a-b,x_b^*-x_a^*\right\rangle \le \left\| a-b\right\| \left\| x_b^*-x_a^*\right\| . \)
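Dividing the final inequality by \(\left\| x_a^*-x_b^*\right\| \) (assuming it is nonzero; otherwise the bound is trivial) makes the consequence explicit; this restatement is ours:
$$\begin{aligned} \left\| x_a^*-x_b^*\right\| \le \frac{1}{\mu }\left\| a-b\right\| , \end{aligned}$$i.e., the minimizer map \(a \mapsto x_a^*\) is \(\frac{1}{\mu }\)-Lipschitz continuous.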
In fact, since the derivative \(\log (1+x)\) of the function \(h(x):=(1+x)\log (1+x)-x\) is increasing and vanishes at \(x=0\), we have \(\min _{x>-1}h(x)=h(0)=0\).
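For completeness, the derivative computation behind this claim is the elementary step
$$\begin{aligned} h'(x)=\log (1+x)+(1+x)\cdot \frac{1}{1+x}-1=\log (1+x), \end{aligned}$$which is negative on \(]-1,0[\) and positive on \(]0,\infty [\), so \(h\) attains its minimum at \(x=0\).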
This assumption on \(L_f/L_{\min }\) means that we know a good estimate \(L_{\min }\) close to \(L_f\) in advance.
Although (29) is asserted in the case \({x^{(t)}_+} \not \in X^*\), it trivially holds if \({x^{(t)}_+} \in X^*\) unless \(\rho = 1\).
Acknowledgements
The authors are grateful to an anonymous referee for valuable comments that improved this paper. This work was partially supported by the Grant-in-Aid for Young Scientists (B) (17K12645) and the Grant-in-Aid for Scientific Research (C) (18K11178) from the Japan Society for the Promotion of Science.
Additional information
Communicated by Lionel Thibault.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ito, M., Fukuda, M. Nearly Optimal First-Order Methods for Convex Optimization under Gradient Norm Measure: an Adaptive Regularization Approach. J Optim Theory Appl 188, 770–804 (2021). https://doi.org/10.1007/s10957-020-01806-7
Keywords
- Smooth/composite convex optimization
- Accelerated proximal gradient methods
- Hölderian error bound
- Adaptive methods