Abstract
Several decades ago, Profs. Sean Meyn and Lei Guo were postdoctoral fellows at ANU, where they shared an interest in recursive algorithms. It seems fitting to celebrate Lei Guo's 60th birthday with a review of the ODE Method and its recent evolution, with a focus on the following themes (each illustrated with a minimal sketch following the list):
- The method has been regarded as a technique for algorithm analysis. It is argued that this viewpoint is backwards: The original stochastic approximation method was surely motivated by an ODE, and tools for analysis came much later (based on establishing robustness of Euler approximations). The paper presents a brief survey of recent research in machine learning that shows the power of algorithm design in continuous time, followed by careful approximation to obtain a practical recursive algorithm.
- While these methods are usually presented in a stochastic setting, this is not a prerequisite. In fact, recent theory shows that rates of convergence can be dramatically accelerated by applying techniques inspired by quasi Monte-Carlo.
- Subject to conditions, the optimal rate of convergence can be obtained by applying the averaging technique of Polyak and Ruppert. The conditions are not universal, but theory suggests alternatives to achieve acceleration.
- The theory is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning.
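To ground the first theme, here is a minimal sketch (ours, not reproduced from the paper) of design in continuous time followed by discretization: choose a mean-field ODE d/dt theta = fbar(theta) whose equilibrium is the desired solution, then apply an Euler step with decreasing step size, substituting noisy observations for fbar; the result is exactly the Robbins-Monro recursion [22]. The mean field fbar and all constants below are illustrative assumptions.

```python
import numpy as np

# Design ODE (illustrative): gradient flow for f(theta) = (theta - 1)^2 / 2,
# with mean field fbar(theta) = -(theta - 1), so the ODE d/dt theta = fbar(theta)
# has the desired solution theta* = 1 as its unique equilibrium.
def fbar(theta):
    return -(theta - 1.0)

rng = np.random.default_rng(0)
theta = 5.0                              # initial condition
for n in range(1, 10_000):
    a_n = 1.0 / n                        # decreasing step size (Euler "time step")
    sample = fbar(theta) + rng.normal()  # only a noisy sample of fbar is observed
    theta += a_n * sample                # Euler step = Robbins-Monro recursion
print(theta)                             # close to the equilibrium theta* = 1
```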
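The second theme, quasi-stochastic approximation, keeps the recursion and swaps the i.i.d. noise for a deterministic probing signal whose time average vanishes quickly; this is the quasi Monte-Carlo idea. Only one line changes relative to the sketch above; the frequency 0.618 is an arbitrary illustrative choice, and the rate claimed in the final comment holds only subject to conditions of the kind surveyed in the paper.

```python
import numpy as np

def fbar(theta):   # same illustrative mean field as in the previous sketch
    return -(theta - 1.0)

theta = 5.0
for n in range(1, 10_000):
    a_n = 1.0 / n
    # Deterministic "noise": zero time average, with quickly averaging partial sums.
    xi_n = np.sqrt(2.0) * np.sin(2.0 * np.pi * 0.618 * n)
    theta += a_n * (fbar(theta) + xi_n)  # identical recursion; only the noise model changed
print(theta)  # error can decay as O(1/n), versus the Monte-Carlo rate O(1/sqrt(n))
```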
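For the third theme, Polyak-Ruppert averaging [47-49] runs the recursion with a step size that decays more slowly than 1/n and reports the running average of the iterates; under conditions, the average attains the optimal asymptotic covariance. A sketch on the same toy model, with the exponent 0.8 an arbitrary choice in (1/2, 1):

```python
import numpy as np

def fbar(theta):   # same illustrative mean field
    return -(theta - 1.0)

rng = np.random.default_rng(0)
theta, theta_avg = 5.0, 0.0
for n in range(1, 100_000):
    a_n = n ** -0.8                        # slower-than-1/n step for the raw iterate
    theta += a_n * (fbar(theta) + rng.normal())
    theta_avg += (theta - theta_avg) / n   # running average of theta_1, ..., theta_n
print(theta_avg)  # the averaged estimate is the one with optimal asymptotic variance
```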
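Finally, the applications: in gradient-free optimization only values of the objective are observable, and a gradient estimate is formed by probing with sinusoids of distinct frequencies, in the spirit of extremum seeking [51-53] and one-measurement SPSA [57]; policy-gradient reinforcement learning fits the same value-only template, with the loss replaced by the negative reward of the policy with parameter theta. Everything below (objective, frequencies, gains, probing amplitude) is an illustrative assumption; the iterates approach a neighborhood of the minimizer whose size is governed by eps.

```python
import numpy as np

def loss(theta):   # illustrative objective: only its values can be measured
    return 0.5 * np.sum((theta - 1.0) ** 2)

theta = np.array([2.0, 0.0])
omega = np.array([1.0, np.sqrt(2.0)])   # one probing frequency per coordinate
eps = 0.1                               # probing amplitude
for n in range(1, 200_000):
    a_n = 1.0 / (n + 100)               # damped early gains for stability
    xi = np.sqrt(2.0) * np.sin(omega * n)      # deterministic probing vector
    g_hat = xi * loss(theta + eps * xi) / eps  # gradient estimate from one measurement
    theta = theta - a_n * g_hat
print(theta)   # near the minimizer [1, 1], up to a bias controlled by eps
```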
References
Chen H F and Guo L, Identification and Stochastic Adaptive Control, Birkhäuser, Boston, 1991.
Chen H F and Guo L, Convergence rate of least-squares identification and adaptive control for stochastic systems, Intl. Journal of Control, 1986, 44(5): 1459–1476.
Chen H F and Guo L, Identification and Stochastic Adaptive Control, Springer Science & Business Media, Berlin, 2012.
Guo L, Estimating time-varying parameters by the Kalman filter based algorithm: Stability and convergence, IEEE Trans. Automat. Control, 1990, 35: 141–147.
Meyn S P and Caines P E, A new approach to stochastic adaptive control, IEEE Trans. Automat. Control, 1987, AC-32: 220–226.
Guo L and Meyn S P, Adaptive control for time-varying systems: A combination of martingale and Markov chain techniques, Int. J. Adaptive Control and Signal Processing, 1989, 3: 1–14.
Xie L L and Guo L, How much uncertainty can be dealt with by feedback? IEEE Transactions on Automatic Control, 2000, 45: 2203–2217.
Guo L, Stability of recursive stochastic tracking algorithms, SIAM J. Control Optim., 1994, 32: 1195–1225.
Guo L and Ljung L, Performance analysis of general tracking algorithms, IEEE Transactions on Automatic Control, 1995, 40: 1388–1402.
Robbins H and Monro S, A stochastic approximation method, Annals of Mathematical Statistics, 1951, 22: 400–407.
Jaakkola T, Jordan M, and Singh S, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 1994, 6: 1185–1201.
Tsitsiklis J, Asynchronous stochastic approximation and Q-learning, Machine Learning, 1994, 16: 185–202.
Meyn S, Control Systems and Reinforcement Learning, Cambridge University Press, Cambridge, 2022 (to appear). Pre-publication draft online https://meyn.ece.ufl.edu/2021/08/01/control-systems-and-reinforcement-learning/.
Sutton R and Barto A, Reinforcement Learning: An Introduction, MIT Press, 2nd Edition, On-line edition at http://www.cs.ualberta.ca/sutton/book/the-book.html, Cambridge, MA, 2018.
Ljung L, Analysis of recursive stochastic algorithms, IEEE Trans. on Automatic Control, 1977, 22(4): 551–575.
Fradkov A and Polyak B T, Adaptive and robust control in the USSR, IFAC — PapersOnLine, 2020, 53(2): 1373–1378.
Tsypkin Y Z and Nikolic Z J, Adaptation and Learning in Automatic Systems, Academic Press, New York, 1971.
Borkar V S, Stochastic Approximation: A Dynamical Systems Viewpoint, 2nd Edition, Hindustan Book Agency, Delhi, India and Cambridge, UK, 2020.
Su W, Boyd S, and Candès E, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Proc. Advances in Neural Information Processing Systems, 2014, 2510–2518.
Attouch H, Goudou X, and Redont P, The heavy ball with friction method, I. The continuous dynamical system: Global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system, Communications in Contemporary Mathematics, 2000, 2(1): 1–34.
Krichene W and Bartlett P L, Acceleration and averaging in stochastic descent dynamics, Proc. Advances in Neural Information Processing Systems, 2017, 30: 6796–6806.
Raginsky M and Bouvrie J, Continuous-time stochastic mirror descent on a network: Variance reduction, consensus, convergence, Proc. of the Conf. on Dec. and Control, 2012, 6793–6800.
Shi B, Du S S, Su W, et al., Acceleration via symplectic discretization of high-resolution differential equations, Eds. by Wallach H, Larochelle H, Beygelzimer A, et al., Proc. Advances in Neural Information Processing Systems, 2019, 5744–5752.
Wibisono A, Wilson A C, and Jordan M I, A variational perspective on accelerated methods in optimization, Proc. of the National Academy of Sciences, 2016, 113: E7351–E7358.
Zhang J, Mokhtari A, Sra S, et al., Direct Runge-Kutta discretization achieves acceleration, Proc. of the Intl. Conference on Neural Information Processing Systems, 2018, 3904–3913.
Devraj A M, Bušić A, and Meyn S, On matrix momentum stochastic approximation and applications to Q-learning, Allerton Conference on Communication, Control, and Computing, 2019, 749–756.
Lapeyre B, Pagès G, and Sab K, Sequences with low discrepancy: Generalisation and application to Robbins-Monro algorithm, Statistics, 1990, 21(2): 251–272.
Laruelle S and Pagès G, Stochastic approximation with averaging innovation applied to finance, Monte Carlo Methods and Applications, 2012, 18(1): 1–51.
Chen S, Devraj A, Bernstein A, et al., Accelerating optimization and reinforcement learning with quasi stochastic approximation, Proc. of the American Control Conf., 2021, 1965–1972.
Bernstein A, Chen Y, Colombino M, et al., Optimal rate of convergence for quasi-stochastic approximation, arXiv: 1903.07228, 2019.
Bernstein A, Chen Y, Colombino M, et al., Quasi-stochastic approximation and off-policy reinforcement learning, Proc. of the Conf. on Dec. and Control, 2019, 5244–5251.
Chen S, Devraj A M, Bušić A, et al., Explicit mean-square error bounds for Monte-Carlo and linear stochastic approximation, Eds. by Chiappa S and Calandra R, Proc. of AISTATS, 2020, 108: 4173–4183.
Borkar V S and Meyn S P, The ODE method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim., 2000, 38(2): 447–469.
Ramaswamy A and Bhatnagar S, Stability of stochastic approximations with ‘controlled Markov’ noise and temporal difference learning, IEEE Transactions on Automatic Control, 2019, 64: 2614–2620.
Polyak B T, A new method of stochastic approximation type, Avtomatika i Telemekhanika (in Russian), 1990, 98–107; translated in Automat. Remote Control, 1991, 51.
Polyak B T and Juditsky A B, Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., 1992, 30(4): 838–855.
Ruppert D, Efficient estimators from a slowly convergent Robbins-Monro process, Technical Report No. 781, Cornell University, School of Operations Research and Industrial Engineering, Ithaca, NY, 1988.
Kaledin M, Moulines E, Naumov A, et al., Finite time analysis of linear two-timescale stochastic approximation with Markovian noise, arXiv: 2002.01268, 2020.
Ariyur K B and Krstić M, Real-Time Optimization by Extremum-Seeking Control, John Wiley & Sons, Inc., New York, NY, 2003.
Liu S and Krstić M, Introduction to extremum seeking, Stochastic Averaging and Stochastic Extremum Seeking, Communications and Control Engineering, Springer, London, 2012.
Tan Y, Moase W H, Manzie C, et al., Extremum seeking from 1922 to 2010, Proc. of the 29th Chinese Control Conference, IEEE, 2010, 14–26.
Kiefer J and Wolfowitz J, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist., 1952, 23(3): 462–466.
Koronacki J, Random-seeking methods for the stochastic unconstrained optimization, International Journal of Control, 1975, 21(3): 517–527.
Spall J C, A stochastic approximation technique for generating maximum likelihood parameter estimates, Proc. of the American Control Conf., IEEE, 1987, 1161–1167.
Spall J C, A one-measurement form of simultaneous perturbation stochastic approximation, Automatica, 1997, 33(1): 109–112.
Spall J C, Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, IEEE Transactions on Automatic Control, 1992, 37(3): 332–341.
Chen H, Duncan T, and Pasik-Duncan B, A Kiefer-Wolfowitz algorithm with randomized differences, IEEE Transactions on Automatic Control, 1999, 44(3): 442–453.
Nesterov Y and Spokoiny V, Random gradient-free minimization of convex functions, Foundations of Computational Mathematics, 2017, 17(2): 527–566.
Bhatnagar S, Fu M C, Marcus S I, et al., Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, ACM Transactions on Modeling and Computer Simulation (TOMACS), 2003, 13(2): 180–209.
Mandl P, Estimation and control in Markov chains, Advances in Applied Probability, 1974, 6(1): 40–60.
Borkar V and Varaiya P, Adaptive control of Markov chains, I: Finite parameter set, IEEE Trans. Automat. Control, 1979, 24(6): 953–957.
Borkar V and Varaiya P, Identification and adaptive control of Markov chains, SIAM J. Control Optim., 1982, 20(4): 470–489.
Borkar V S, Identification and adaptive control of Markov chains, PhD Thesis, University of California, Berkeley, 1980.
Williams R J, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992, 8(3–4): 229–256.
Mania H, Guy A, and Recht B, Simple random search provides a competitive approach to reinforcement learning, Proc. Advances in Neural Information Processing Systems, 2018, 1800–1809.
Asmussen S and Glynn P W, Stochastic Simulation: Algorithms and Analysis, volume 57 of Stochastic Modelling and Applied Probability, Springer-Verlag, New York, 2007.
Métivier M and Priouret P, Théorèmes de convergence presque sûre pour une classe d'algorithmes stochastiques à pas décroissants [Almost sure convergence theorems for a class of stochastic algorithms with decreasing step sizes], Prob. Theory Related Fields, 1987, 74: 403–428.
Spall J C, Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, John Wiley & Sons, New York, 2003.
Additional information
This research was supported by ARO W911NF1810334, and NSF under EPCN 1935389. The work of A. Bernstein was supported in part by the National Renewable Energy Laboratory (NREL), operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract DE-AC36-08GO28308 and in part by the Laboratory Directed Research and Development (LDRD) Program at NREL. The views expressed in the paper do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the paper for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.
Cite this article
Chen, S., Devraj, A., Bernstein, A. et al. Revisiting the ODE Method for Recursive Algorithms: Fast Convergence Using Quasi Stochastic Approximation. J Syst Sci Complex 34, 1681–1702 (2021). https://doi.org/10.1007/s11424-021-1251-5