Abstract
In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the Λ-strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view of a large family of algorithms, we establish a connection between potential-based analyses in learning and their counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
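As a concrete illustration of the potential-based framework, the sketch below implements the exponentially weighted forecaster (Hedge) as the special case obtained from the exponential potential Φ(R) = (1/η) ln Σ_i exp(η R_i) applied to the cumulative regret vector R: the weight given to expert i is the i-th component of the (normalized) gradient of Φ. This is a minimal sketch, not the paper's code; the class name ExpPotentialForecaster, the learning-rate parameter eta, and the update interface are illustrative choices.

```python
import numpy as np

# Minimal sketch of a potential-based forecaster with the exponential
# potential Phi(R) = (1/eta) * log(sum_i exp(eta * R_i)), where R_i is the
# cumulative regret against expert i.  The weight of expert i is the i-th
# component of grad Phi(R), normalized to a distribution; since the learner's
# own cumulative loss cancels in the normalization, this reduces to the
# familiar Hedge / exponentially weighted average forecaster.

class ExpPotentialForecaster:
    def __init__(self, n_experts: int, eta: float = 0.5):
        self.eta = eta
        self.regret = np.zeros(n_experts)  # cumulative regret vector R

    def weights(self) -> np.ndarray:
        # Normalized gradient of the exponential potential.
        # Subtracting the max is a standard numerical-stability trick.
        g = np.exp(self.eta * (self.regret - self.regret.max()))
        return g / g.sum()

    def update(self, expert_losses: np.ndarray) -> None:
        # The learner suffers the weighted-average loss; component i of the
        # regret vector grows by (learner's loss - loss of expert i).
        p = self.weights()
        learner_loss = float(p @ expert_losses)
        self.regret += learner_loss - expert_losses


# Tiny usage example with two experts and [0, 1]-valued losses.
if __name__ == "__main__":
    f = ExpPotentialForecaster(n_experts=2, eta=1.0)
    for losses in ([0.0, 1.0], [0.0, 1.0], [1.0, 0.0]):
        f.update(np.array(losses))
    print(f.weights())  # more mass on the expert with smaller cumulative loss
```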
References
Auer, P., Cesa-Bianchi, N., & Gentile, C. (2002). Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64:1, 48–75.
Blackwell, D. (1956). An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6, 1–8.
Block, H. (1962). The perceptron: A model for brain functioning. Reviews of Modern Physics, 34, 123–135.
Bregman, L. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7, 200–217.
Cesa-Bianchi, N. (1999). Analysis of two gradient-based algorithms for on-line regression. Journal of Computer and System Sciences, 59:3, 392–411.
Cesa-Bianchi, N., Freund, Y., Helmbold, D., Haussler, D., Schapire, R., & Warmuth, M. (1997). How to use expert advice. Journal of the ACM, 44:3, 427–485.
Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2002). A second-order perceptron algorithm. In Proceedings of the 15th Annual Conference on Computational Learning Theory. Lecture Notes in Artificial Intelligence, Springer, 2002.
Cohen, W., & Singer, Y. (1999). Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems, 17:2, 141–173.
Cover, T., & Thomas, J. (1991). Elements of Information Theory. John Wiley and Sons.
Foster, D., & Vohra, R. (1997). Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21, 40–55.
Foster, D., & Vohra, R. (1999). Regret in the on-line decision problem. Games and Economic Behavior, 29:1/2, 7–35.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119–139.
Freund, Y., & Schapire, R. (1999a). Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29, 79–103.
Freund, Y., & Schapire, R. (1999b). Large margin classification using the perceptron algorithm. Machine Learning, 37:3, 277–296.
Freund, Y., Schapire, R., Singer, Y., & Warmuth, M. (1997). Using and combining predictors that specialize. In Proceedings of the 29th Annual ACM Symposium on the Theory of Computing (pp. 334–343). ACM Press.
Fudenberg, D., & Levine, D. (1995). Universal consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19, 1065–1090.
Fudenberg, D., & Levine, D. (1999). Conditional universal consistency. Games and Economic Behavior, 29, 7–35.
Gentile, C. (2001). The robustness of the p-norm algorithms. An extended abstract (co-authored with N. Littlestone) appeared in the Proceedings of the 12th Annual Conference on Computational Learning Theory (pp. 1–11). ACM Press, 1999.
Gentile, C., & Warmuth, M. (1999). Linear hinge loss and average margin. In Advances in Neural Information Processing Systems 10. MIT Press.
Grove, A., Littlestone, N., & Schuurmans, D. (1997). General convergence results for linear discriminant updates. In Proceedings of the 10th Annual Conference on Computational Learning Theory (pp. 171–183). ACM Press.
Grove, A., Littlestone, N., & Schuurmans, D. (2001). General convergence results for linear discriminant updates. Machine Learning, 43:3, 173–210.
Hannan, J. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3, 97–139.
Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68, 1127–1150.
Hart, S., & Mas-Colell, A. (2001). A general class of adaptive strategies. Journal of Economic Theory, 98:1, 26–54.
Haussler, D., Kivinen, J., & Warmuth, M. (1998). Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44, 1906–1925.
Kivinen, J., & Warmuth, M. (1999). Averaging expert predictions. In Proceedings of the Fourth European Conference on Computational Learning Theory (pp. 153–167). Lecture Notes in Artificial Intelligence, vol. 1572. Springer.
Kivinen, J., & Warmuth, M. (2001). Relative loss bounds for multidimensional regression problems. Machine Learning, 45:3, 301–329.
Lehrer, E. (2001). A wide range no-regret theorem. Games and Economic Behavior, in press.
Littlestone, N. (1989). Mistake bounds and logarithmic linear-threshold learning algorithms. Ph.D. Thesis, University of California at Santa Cruz.
Littlestone, N., & Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.
Novikoff, A. (1962). On convergence proofs of perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, vol. XII. pp. 615–622.
Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan Books.
Schapire, R. (2001). Drifting games. Machine Learning, 43:5, 265–291.
Seneta, E. (1981). Non-Negative Matrices and Markov Chains. Springer.
Vovk, V. (1990). Aggregating strategies. In Proceedings of the 3rd Annual Workshop on Computational Learning Theory (pp. 372–383).
Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences, 56:2, 153–173.
Vovk, V. (2001). Competitive on-line statistics. International Statistical Review, 69, 213–248.
Warmuth, M., & Jagota, A. (1997). Continuous and discrete-time nonlinear gradient descent: Relative loss bounds and convergence. In Electronic Proceedings of the 5th International Symposium on Artificial Intelligence and Mathematics.
Cite this article
Cesa-Bianchi, N., Lugosi, G. Potential-Based Algorithms in On-Line Prediction and Game Theory. Machine Learning 51, 239–261 (2003). https://doi.org/10.1023/A:1022901500417