
Revisiting the ODE Method for Recursive Algorithms: Fast Convergence Using Quasi Stochastic Approximation

Journal of Systems Science and Complexity

Abstract

Several decades ago, Profs. Sean Meyn and Lei Guo were postdoctoral fellows at ANU, where they shared an interest in recursive algorithms. It seems fitting to celebrate Lei Guo's 60th birthday with a review of the ODE Method and its recent evolution, with a focus on the following themes:

  • The method has been regarded as a technique for algorithm analysis. It is argued that this viewpoint is backwards: The original stochastic approximation method was surely motivated by an ODE, and tools for analysis came much later (based on establishing robustness of Euler approximations). The paper presents a brief survey of recent research in machine learning that shows the power of algorithm design in continuous time, followed by careful approximation to obtain a practical recursive algorithm (a minimal sketch of this ODE-to-recursion step follows this list).

  • While these methods are usually presented in a stochastic setting, this is not a prerequisite. In fact, recent theory shows that rates of convergence can be dramatically accelerated by applying techniques inspired by quasi Monte-Carlo (see the second sketch below).

  • Subject to conditions, the optimal rate of convergence can be obtained by applying the averaging technique of Polyak and Ruppert. The conditions are not universal, but theory suggests alternatives to achieve acceleration (the third sketch below illustrates the averaging step).

  • The theory is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning (the final sketch below gives a gradient-free example).
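
The sketches that follow are illustrative only: the objectives, constants, and step-size rules are invented for this summary and are not taken from the paper. The first shows the design-then-discretize viewpoint on a hypothetical scalar quadratic objective: a gradient-flow ODE is approximated by an Euler scheme with vanishing step sizes, which is exactly the form of a recursive algorithm (here noise-free).

```python
# Continuous-time design: gradient flow  d/dt theta_t = -grad_f(theta_t),
# for a hypothetical quadratic objective f(theta) = (theta - 1)^2 / 2
# with minimizer theta* = 1 (a stand-in for the problem of interest).
def grad_f(theta):
    return theta - 1.0

# The ODE method read forwards: an Euler approximation with vanishing
# step sizes alpha_n = 1/n turns the ODE into a practical recursion.
theta = -2.0
for n in range(1, 10_001):
    alpha = 1.0 / n
    theta -= alpha * grad_f(theta)

print(theta)  # converges to the ODE equilibrium theta* = 1
```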
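Next, a toy contrast between standard stochastic approximation and a quasi-stochastic counterpart, on the simplest possible problem (estimating the mean of Uniform(0, 1)). The Weyl sequence used here is one standard low-discrepancy choice, not necessarily the probing signal analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # number of iterations

# Both recursions average their inputs:
#   theta_{n+1} = theta_n + (1/n) * (xi_n - theta_n).
theta_sa, theta_qsa = 0.0, 0.0
phi = (np.sqrt(5.0) - 1.0) / 2.0  # irrational rotation for the Weyl sequence
for n in range(1, N + 1):
    alpha = 1.0 / n
    theta_sa += alpha * (rng.uniform() - theta_sa)       # i.i.d. Monte-Carlo input
    theta_qsa += alpha * ((n * phi) % 1.0 - theta_qsa)   # deterministic low-discrepancy input

# Errors relative to the true mean 0.5: the Monte-Carlo recursion converges
# at rate O(1/sqrt(n)); the quasi-stochastic one at roughly O(1/n).
print(abs(theta_sa - 0.5), abs(theta_qsa - 0.5))
```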
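Third, Polyak-Ruppert averaging, sketched on an invented scalar noisy root-finding problem: the raw iterate uses a slowly vanishing gain alpha_n = n^(-rho) with rho in (1/2, 1), and the estimate returned is the running average of the iterates, which filters out most of the noise.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, theta_bar = 0.0, 0.0
rho = 0.7  # gain exponent; averaging theory asks for rho in (1/2, 1)

for n in range(1, 100_001):
    alpha = n ** (-rho)
    # Noisy observation of f(theta) = 1 - theta, whose root is theta* = 1.
    theta += alpha * (1.0 - theta + rng.normal())
    # Polyak-Ruppert estimate: running average of the raw iterates.
    theta_bar += (theta - theta_bar) / n

print(theta, theta_bar)  # the averaged iterate is far less noisy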
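Finally, gradient-free optimization driven by a deterministic sinusoidal probing signal, in the spirit of extremum-seeking control (one of the application areas surveyed); the objective and all constants below are hypothetical. The probe both perturbs the input and demodulates the output, so averaging over the probe's period leaves a gradient estimate and the mean flow is gradient descent.

```python
import math

# Hypothetical smooth objective with minimizer theta* = 2.
def f(theta):
    return (theta - 2.0) ** 2

theta, eps, omega, dt, t = 0.0, 0.5, 100.0, 1e-3, 0.0
for _ in range(1_000_000):
    a = 1.0 / (1.0 + t)              # vanishing gain
    probe = math.sin(omega * t)      # deterministic probing signal
    # Demodulated measurement: averages to grad f(theta) over one period.
    grad_est = (2.0 / eps) * probe * f(theta + eps * probe)
    theta -= a * grad_est * dt
    t += dt

print(theta)  # close to the minimizer theta* = 2
```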


References

  1. Chen H F and Guo L, Identification and Stochastic Adaptive Control, Birkhäuser, Boston, 1991.

  2. Chen H F and Guo L, Convergence rate of least-squares identification and adaptive control for stochastic systems, International Journal of Control, 1986, 44(5): 1459–1476.

  3. Chen H F and Guo L, Identification and Stochastic Adaptive Control, Springer Science & Business Media, Berlin, 2012.

  4. Guo L, Estimating time-varying parameters by the Kalman filter based algorithm: Stability and convergence, IEEE Transactions on Automatic Control, 1990, 35: 141–147.

  5. Meyn S P and Caines P E, A new approach to stochastic adaptive control, IEEE Transactions on Automatic Control, 1987, AC-32: 220–226.

  6. Guo L and Meyn S P, Adaptive control for time-varying systems: A combination of martingale and Markov chain techniques, Int. J. Adaptive Control and Signal Processing, 1989, 3: 1–14.

  7. Xie L L and Guo L, How much uncertainty can be dealt with by feedback?, IEEE Transactions on Automatic Control, 2000, 45: 2203–2217.

  8. Guo L, Stability of recursive stochastic tracking algorithms, SIAM J. Control Optim., 1994, 32: 1195–1225.

  9. Guo L and Ljung L, Performance analysis of general tracking algorithms, IEEE Transactions on Automatic Control, 1995, 40: 1388–1402.

  10. Robbins H and Monro S, A stochastic approximation method, Annals of Mathematical Statistics, 1951, 22: 400–407.

  11. Jaakkola T, Jordan M, and Singh S, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 1994, 6: 1185–1201.

  12. Tsitsiklis J, Asynchronous stochastic approximation and Q-learning, Machine Learning, 1994, 16: 185–202.

  13. Meyn S, Control Systems and Reinforcement Learning, Cambridge University Press, Cambridge, 2022 (to appear). Pre-publication draft online: https://meyn.ece.ufl.edu/2021/08/01/control-systems-and-reinforcement-learning/.

  14. Sutton R and Barto A, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press, Cambridge, MA, 2018. On-line edition at http://www.cs.ualberta.ca/sutton/book/the-book.html.

  15. Ljung L, Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control, 1977, 22(4): 551–575.

  16. Fradkov A and Polyak B T, Adaptive and robust control in the USSR, IFAC-PapersOnLine, 2020, 53(2): 1373–1378.

  17. Tsypkin Y Z and Nikolic Z J, Adaptation and Learning in Automatic Systems, Academic Press, New York, 1971.

  18. Borkar V S, Stochastic Approximation: A Dynamical Systems Viewpoint, 2nd Edition, Hindustan Book Agency, Delhi, India and Cambridge, UK, 2020.

  19. Su W, Boyd S, and Candès E, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Proc. Advances in Neural Information Processing Systems, 2014, 2510–2518.

  20. Attouch H, Goudou X, and Redont P, The heavy ball with friction method, I. The continuous dynamical system: Global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system, Communications in Contemporary Mathematics, 2000, 2(1): 1–34.

  21. Krichene W and Bartlett P L, Acceleration and averaging in stochastic descent dynamics, Proc. Advances in Neural Information Processing Systems, 2017, 30: 6796–6806.

  22. Raginsky M and Bouvrie J, Continuous-time stochastic mirror descent on a network: Variance reduction, consensus, convergence, Proc. of the Conf. on Dec. and Control, 2012, 6793–6800.

  23. Shi B, Du S S, Su W, et al., Acceleration via symplectic discretization of high-resolution differential equations, Proc. Advances in Neural Information Processing Systems (Eds. by Wallach H, Larochelle H, Beygelzimer A, et al.), 2019, 5744–5752.

  24. Wibisono A, Wilson A C, and Jordan M I, A variational perspective on accelerated methods in optimization, Proc. of the National Academy of Sciences, 2016, 113: E7351–E7358.

  25. Zhang J, Mokhtari A, Sra S, et al., Direct Runge-Kutta discretization achieves acceleration, Proc. of the Intl. Conference on Neural Information Processing Systems, 2018, 3904–3913.

  26. Devraj A M, Bušić A, and Meyn S, On matrix momentum stochastic approximation and applications to Q-learning, Allerton Conference on Communication, Control, and Computing, 2019, 749–756.

  27. Lapeyre B, Pagès G, and Sab K, Sequences with low discrepancy: Generalisation and application to Robbins-Monro algorithm, Statistics, 1990, 21(2): 251–272.

  28. Laruelle S and Pagès G, Stochastic approximation with averaging innovation applied to finance, Monte Carlo Methods and Applications, 2012, 18(1): 1–51.

  29. Chen S, Devraj A, Bernstein A, et al., Accelerating optimization and reinforcement learning with quasi stochastic approximation, Proc. of the American Control Conf., 2021, 1965–1972.

  30. Bernstein A, Chen Y, Colombino M, et al., Optimal rate of convergence for quasi-stochastic approximation, arXiv:1903.07228, 2019.

  31. Bernstein A, Chen Y, Colombino M, et al., Quasi-stochastic approximation and off-policy reinforcement learning, Proc. of the Conf. on Dec. and Control, 2019, 5244–5251.

  32. Chen S, Devraj A M, Bušić A, et al., Explicit mean-square error bounds for Monte-Carlo and linear stochastic approximation, Proc. of AISTATS (Eds. by Chiappa S and Calandra R), 2020, 108: 4173–4183.

  33. Borkar V S and Meyn S P, The ODE method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim., 2000, 38(2): 447–469.

  34. Ramaswamy A and Bhatnagar S, Stability of stochastic approximations with 'controlled Markov' noise and temporal difference learning, IEEE Transactions on Automatic Control, 2019, 64: 2614–2620.

  35. Polyak B T, A new method of stochastic approximation type, Avtomatika i Telemekhanika (in Russian), 1990, 98–107; translated in Automat. Remote Control, 1991, 51.

  36. Polyak B T and Juditsky A B, Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., 1992, 30(4): 838–855.

  37. Ruppert D, Efficient estimators from a slowly convergent Robbins-Monro process, Technical Report No. 781, Cornell University, School of Operations Research and Industrial Engineering, Ithaca, NY, 1988.

  38. Kaledin M, Moulines E, Naumov A, et al., Finite time analysis of linear two-timescale stochastic approximation with Markovian noise, arXiv:2002.01268, 2020.

  39. Ariyur K B and Krstić M, Real Time Optimization by Extremum Seeking Control, John Wiley & Sons, Inc., New York, NY, 2003.

  40. Liu S and Krstić M, Introduction to extremum seeking, Stochastic Averaging and Stochastic Extremum Seeking, Communications and Control Engineering, Springer, London, 2012.

  41. Tan Y, Moase W H, Manzie C, et al., Extremum seeking from 1922 to 2010, Proc. of the 29th Chinese Control Conference, IEEE, 2010, 14–26.

  42. Kiefer J and Wolfowitz J, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist., 1952, 23(3): 462–466.

  43. Koronacki J, Random-seeking methods for the stochastic unconstrained optimization, International Journal of Control, 1975, 21(3): 517–527.

  44. Spall J C, A stochastic approximation technique for generating maximum likelihood parameter estimates, Proc. of the American Control Conf., IEEE, 1987, 1161–1167.

  45. Spall J C, A one-measurement form of simultaneous perturbation stochastic approximation, Automatica, 1997, 33(1): 109–112.

  46. Spall J C, Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, IEEE Transactions on Automatic Control, 1992, 37(3): 332–341.

  47. Chen H, Duncan T, and Pasik-Duncan B, A Kiefer-Wolfowitz algorithm with randomized differences, IEEE Transactions on Automatic Control, 1999, 44(3): 442–453.

  48. Nesterov Y and Spokoiny V, Random gradient-free minimization of convex functions, Foundations of Computational Mathematics, 2017, 17(2): 527–566.

  49. Bhatnagar S, Fu M C, Marcus S I, et al., Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, ACM Transactions on Modeling and Computer Simulation (TOMACS), 2003, 13(2): 180–209.

  50. Mandl P, Estimation and control in Markov chains, Advances in Applied Probability, 1974, 6(1): 40–60.

  51. Borkar V and Varaiya P, Adaptive control of Markov chains, I: Finite parameter set, IEEE Transactions on Automatic Control, 1979, 24(6): 953–957.

  52. Borkar V and Varaiya P, Identification and adaptive control of Markov chains, SIAM J. Control Optim., 1982, 20(4): 470–489.

  53. Borkar V S, Identification and adaptive control of Markov chains, PhD thesis, University of California, Berkeley, 1980.

  54. Williams R J, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992, 8(3–4): 229–256.

  55. Mania H, Guy A, and Recht B, Simple random search provides a competitive approach to reinforcement learning, Proc. Advances in Neural Information Processing Systems, 2018, 1800–1809.

  56. Asmussen S and Glynn P W, Stochastic Simulation: Algorithms and Analysis, Volume 57 of Stochastic Modelling and Applied Probability, Springer-Verlag, New York, 2007.

  57. Métivier M and Priouret P, Théorèmes de convergence presque sûre pour une classe d'algorithmes stochastiques à pas décroissants [Almost-sure convergence theorems for a class of stochastic algorithms with decreasing step sizes], Prob. Theory Related Fields, 1987, 74: 403–428.

  58. Spall J C, Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, John Wiley & Sons, New York, 2003.


Author information


Correspondence to Shuhang Chen, Adithya Devraj, Andrey Bernstein, or Sean Meyn.

Additional information

This research was supported by ARO W911NF1810334 and by NSF under EPCN 1935389. The work of A. Bernstein was supported in part by the National Renewable Energy Laboratory (NREL), operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract DE-AC36-08GO28308, and in part by the Laboratory Directed Research and Development (LDRD) Program at NREL. The views expressed in the paper do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the paper for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.


Cite this article

Chen, S., Devraj, A., Bernstein, A. et al. Revisiting the ODE Method for Recursive Algorithms: Fast Convergence Using Quasi Stochastic Approximation. J Syst Sci Complex 34, 1681–1702 (2021). https://doi.org/10.1007/s11424-021-1251-5
