Abstract
Shifting bounds for on-line classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and on-line learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm that achieves the best known shifting bounds while using an unlimited budget. Second, we show that by equipping the Perceptron algorithm with the simplest possible eviction policy, one that discards a random support vector each time a new one arrives, we achieve a shifting bound close to the one obtained with no budget restriction. More importantly, we show that our randomized algorithm strikes the optimal trade-off \(U = \Theta(\sqrt{B})\) between the budget \(B\) and the norm \(U\) of the largest classifier in the comparison sequence. Experiments comparing several linear-threshold algorithms on chronologically ordered textual datasets support our theoretical findings: they show the extent to which randomized budget algorithms are more robust than deterministic ones when learning shifting target data streams.
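The random-eviction policy described in the abstract is simple enough to sketch in a few lines. The following Python sketch (with hypothetical names, assuming a Gaussian kernel and the standard mistake-driven kernel Perceptron update) illustrates the idea: on each prediction mistake the new example is stored as a support vector, and once the budget is full a uniformly random support vector is discarded. This is an illustration of the technique, not the authors' exact algorithm or its analysis.

```python
import math
import random

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel; any positive semi-definite kernel would do.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

class RandomizedBudgetPerceptron:
    """Kernel Perceptron keeping at most `budget` support vectors,
    evicting a uniformly random one when the budget is exceeded."""

    def __init__(self, budget, kernel=rbf_kernel):
        self.budget = budget
        self.kernel = kernel
        self.supports = []  # list of (example, label) pairs

    def predict(self, x):
        # Sign of the kernel expansion over the current support set.
        score = sum(y * self.kernel(s, x) for s, y in self.supports)
        return 1 if score >= 0 else -1

    def update(self, x, y):
        # Mistake-driven update: store the example only when misclassified.
        if self.predict(x) != y:
            if len(self.supports) >= self.budget:
                # Random eviction keeps memory usage bounded by `budget`.
                self.supports.pop(random.randrange(len(self.supports)))
            self.supports.append((x, y))
```

Because eviction is uniform over the support set, no bookkeeping (timestamps, margins, weights) is needed, which is what makes the policy the "simplest possible" one while still, per the paper's analysis, matching the optimal budget/norm trade-off up to constants.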
Additional information
Editors: Hans Ulrich Simon, Gabor Lugosi and Avrim Blum
An extended abstract appeared in the Proceedings of the 19th Annual Conference on Learning Theory, Springer, 2006. The work of all authors was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778.
Cite this article
Cavallanti, G., Cesa-Bianchi, N. & Gentile, C. Tracking the best hyperplane with a simple budget Perceptron. Mach Learn 69, 143–167 (2007). https://doi.org/10.1007/s10994-007-5003-0