Abstract
One of the attractive properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that they are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic bounds on the number of support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.
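The trade-off described in the abstract can be illustrated with a small numerical sketch (not taken from the paper; the grid-search approach and the specific losses are our own choices for illustration). For a conditional class probability η = P(Y = 1 | X = x), the pointwise conditional risk of a loss φ is C(f) = η·φ(f) + (1 − η)·φ(−f). The hinge loss is minimized at f* = 1 for every η > 1/2, so f* carries no information about η; the logistic loss is minimized at f* = log(η / (1 − η)), from which η can be recovered exactly:

```python
# Minimal sketch: grid-search the pointwise conditional-risk minimizer
# for the hinge loss and the logistic loss.
import numpy as np

def conditional_risk_minimizer(phi, eta, grid):
    """Return the f on the grid minimizing eta*phi(f) + (1-eta)*phi(-f)."""
    risk = eta * phi(grid) + (1.0 - eta) * phi(-grid)
    return grid[np.argmin(risk)]

hinge = lambda t: np.maximum(0.0, 1.0 - t)       # phi(t) = max(0, 1 - t)
logistic = lambda t: np.log1p(np.exp(-t))        # phi(t) = log(1 + e^{-t})

grid = np.linspace(-5.0, 5.0, 100001)            # step size 1e-4

for eta in (0.6, 0.7, 0.9):
    f_hinge = conditional_risk_minimizer(hinge, eta, grid)
    f_log = conditional_risk_minimizer(logistic, eta, grid)
    eta_recovered = 1.0 / (1.0 + np.exp(-f_log))  # invert the logistic link
    print(f"eta={eta}: hinge f*={f_hinge:.3f}, "
          f"logistic f*={f_log:.3f}, recovered eta={eta_recovered:.3f}")
```

The hinge minimizer "saturates" at ±1, which is the source of both sparseness (examples with margin beyond 1 contribute nothing to the solution) and the inability to estimate η; the logistic minimizer varies smoothly with η, so probabilities are recoverable but sparseness is lost.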
© 2004 Springer-Verlag Berlin Heidelberg
Bartlett, P.L., Tewari, A. (2004). Sparseness Versus Estimating Conditional Probabilities: Some Asymptotic Results. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1