Nearly Optimal First-Order Methods for Convex Optimization under Gradient Norm Measure: an Adaptive Regularization Approach

Journal of Optimization Theory and Applications

Abstract

In the development of first-order methods for smooth (resp., composite) convex optimization problems, where smooth functions with Lipschitz continuous gradients are minimized, the gradient (resp., gradient mapping) norm is a fundamental optimality measure. Under this measure, a fixed-iteration algorithm with the optimal iteration complexity for the smooth case is known, but determining the number of iterations needed to reach a desired accuracy requires prior knowledge of the distance from the initial point to the optimal solution set. In this paper, we report an adaptive regularization approach that attains the nearly optimal iteration complexity without knowing the distance to the optimal solution set. To obtain even faster convergence adaptively, we then apply this approach to construct a first-order method that is adaptive to the Hölderian error bound condition (or, equivalently, the Łojasiewicz gradient property), which covers moderately wide classes of applications. The proposed method attains a nearly optimal iteration complexity with respect to the gradient mapping norm.
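
To make the optimality measure concrete, here is a minimal, self-contained sketch (not the algorithm developed in this paper) of how the gradient mapping norm can be computed and used as a stopping criterion inside a standard accelerated proximal gradient (FISTA-type) loop. For a composite problem \(\min _x f(x)+h(x)\) with L-smooth f, the gradient mapping is \(G_L(x)=L\,(x-\text {prox}_{h/L}(x-\nabla f(x)/L))\), which reduces to \(\nabla f(x)\) when \(h\equiv 0\). The least-squares-plus-\(\ell _1\) test problem, the function names, and the constants below are illustrative assumptions only.

import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1 (illustrative choice of h).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def gradient_mapping_norm(x, grad_f, prox_h, L):
    # ||G_L(x)|| with G_L(x) = L * (x - prox_{h/L}(x - grad f(x)/L)).
    return L * np.linalg.norm(x - prox_h(x - grad_f(x) / L, 1.0 / L))

def accelerated_proximal_gradient(grad_f, prox_h, L, x0, eps=1e-6, max_iter=10000):
    # FISTA-type loop that stops once the gradient mapping norm is at most eps.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(max_iter):
        x_next = prox_h(y - grad_f(y) / L, 1.0 / L)         # proximal gradient step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))  # momentum parameter
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)    # extrapolation
        x, t = x_next, t_next
        if gradient_mapping_norm(x, grad_f, prox_h, L) <= eps:
            break
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
    L = np.linalg.norm(A, 2) ** 2                         # Lipschitz constant of grad f
    grad_f = lambda x: A.T @ (A @ x - b)                  # f(x) = 0.5 * ||A x - b||^2
    prox_h = lambda z, tau: soft_threshold(z, lam * tau)
    x_hat = accelerated_proximal_gradient(grad_f, prox_h, L, np.zeros(100))
    print("gradient mapping norm:", gradient_mapping_norm(x_hat, grad_f, prox_h, L))

In the smooth case (\(h\equiv 0\)) the same loop simply monitors \(\left\| \nabla f(x)\right\| \). The contribution of the paper, by contrast, is an adaptive regularization scheme wrapped around such a method so that the iteration complexity with respect to this measure becomes nearly optimal without prior knowledge of the distance to the optimal solution set.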

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

  1. Suppose that \(\varphi \) is an L-smooth convex function satisfying (4) for some exponent \(\rho \in [1,2[\). For \(\rho =1\), Lemma 2.2 implies \(\left\| \nabla {\varphi }(x)\right\| \ge \kappa \) for \(x \not \in \mathop {\text {Argmin}}\varphi \). If, on the other hand, \(\rho \in ]1,2[\), Lemmas 2.1 and 2.2 imply \({\frac{1}{2L}\left\| \nabla {\varphi }(x)\right\| ^2 \le \varphi (x)-\varphi ^* \le \kappa ^{-\frac{1}{\rho -1}}\left\| \nabla {\varphi }(x)\right\| ^{\frac{\rho }{\rho -1}}}\) for all x, which yields \(\left\| \nabla {\varphi }(x)\right\| \ge \text {const.}\) for all \(x \not \in \text {Argmin }\varphi \) (the exponent step is spelled out after these notes). This contradicts the continuity of \(\nabla {\varphi }\) at points in \(\text {Argmin }\varphi \).

  2. By the strong convexity of \(\left\langle a,x\right\rangle +h(x)\) and \(\left\langle b,x\right\rangle +h(x)\), we have

    $$\begin{aligned} \frac{\mu }{2}\left\| x_a^*-x_b^*\right\| ^2 \le [\left\langle a,x_b^*\right\rangle +h(x_b^*)] - [\left\langle a,x_a^*\right\rangle +h(x_a^*)] ~~\text {and}~~ \frac{\mu }{2}\left\| x_a^*-x_b^*\right\| ^2 \le [\left\langle b,x_a^*\right\rangle +h(x_a^*)]-[\left\langle b,x_b^*\right\rangle +h(x_b^*)], \end{aligned}$$

    respectively. Adding them implies \( \mu \left\| x_a^*-x_b^*\right\| ^2 \le \left\langle a-b,x_b^*-x_a^*\right\rangle \le \left\| a-b\right\| \left\| x_b^*-x_a^*\right\| . \)

  3. In fact, since the derivative \(\log (1+x)\) of the function \(h(x):=(1+x)\log (1+x)-x\) is increasing and vanishes at \(x=0\), we have \(\min _{x>-1}h(x)=h(0)=0\).

  4. This assumption on \(L_f/L_{\min }\) means that we know a good estimate \(L_{\min }\) close to \(L_f\) in advance.

  5. Although (29) is asserted in the case \({x^{(t)}_+} \not \in X^*\), it trivially holds if \({x^{(t)}_+} \in X^*\) unless \(\rho = 1\).
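
The exponent step in Note 1 can be spelled out as follows: for \(\rho \in ]1,2[\) and \(x \not \in \text {Argmin }\varphi \), write \(g:=\left\| \nabla {\varphi }(x)\right\| >0\). Combining the two inequalities of Note 1 gives

$$\begin{aligned} \frac{1}{2L}g^2 \le \kappa ^{-\frac{1}{\rho -1}} g^{\frac{\rho }{\rho -1}} \quad \Longrightarrow \quad g^{\frac{\rho -2}{\rho -1}} \le 2L\,\kappa ^{-\frac{1}{\rho -1}}, \end{aligned}$$

and since the exponent \(\frac{\rho -2}{\rho -1}\) is negative for \(\rho \in ]1,2[\), the map \(s\mapsto s^{\frac{\rho -2}{\rho -1}}\) is decreasing on \(]0,\infty [\), hence

$$\begin{aligned} \left\| \nabla {\varphi }(x)\right\| \ge \left( 2L\,\kappa ^{-\frac{1}{\rho -1}}\right) ^{\frac{\rho -1}{\rho -2}} > 0 \quad \text {for all } x \not \in \text {Argmin }\varphi . \end{aligned}$$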

Acknowledgements

The authors are grateful to an anonymous referee for valuable comments that improved this paper. This work was partially supported by the Grant-in-Aid for Young Scientists (B) (17K12645) and the Grant-in-Aid for Scientific Research (C) (18K11178) from the Japan Society for the Promotion of Science.

Author information

Corresponding author

Correspondence to Masaru Ito.

Additional information

Communicated by Lionel Thibault.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ito, M., Fukuda, M. Nearly Optimal First-Order Methods for Convex Optimization under Gradient Norm Measure: an Adaptive Regularization Approach. J Optim Theory Appl 188, 770–804 (2021). https://doi.org/10.1007/s10957-020-01806-7

  • DOI: https://doi.org/10.1007/s10957-020-01806-7
