Abstract
One of the attractive properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that they are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic bounds on the number of support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.
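The trade-off described in the abstract can be illustrated with a small numerical sketch (not taken from the paper; the grid-search approach and the specific losses are our own choices for illustration). For a conditional class probability η = P(Y = 1 | X = x), the pointwise conditional risk of a loss φ is C(f) = η·φ(f) + (1 − η)·φ(−f). The hinge loss is minimized at f* = 1 for every η > 1/2, so f* carries no information about η; the logistic loss is minimized at f* = log(η / (1 − η)), from which η can be recovered exactly:

```python
# Minimal sketch: grid-search the pointwise conditional-risk minimizer
# for the hinge loss and the logistic loss.
import numpy as np

def conditional_risk_minimizer(phi, eta, grid):
    """Return the f on the grid minimizing eta*phi(f) + (1-eta)*phi(-f)."""
    risk = eta * phi(grid) + (1.0 - eta) * phi(-grid)
    return grid[np.argmin(risk)]

hinge = lambda t: np.maximum(0.0, 1.0 - t)       # phi(t) = max(0, 1 - t)
logistic = lambda t: np.log1p(np.exp(-t))        # phi(t) = log(1 + e^{-t})

grid = np.linspace(-5.0, 5.0, 100001)            # step size 1e-4

for eta in (0.6, 0.7, 0.9):
    f_hinge = conditional_risk_minimizer(hinge, eta, grid)
    f_log = conditional_risk_minimizer(logistic, eta, grid)
    eta_recovered = 1.0 / (1.0 + np.exp(-f_log))  # invert the logistic link
    print(f"eta={eta}: hinge f*={f_hinge:.3f}, "
          f"logistic f*={f_log:.3f}, recovered eta={eta_recovered:.3f}")
```

The hinge minimizer "saturates" at ±1, which is the source of both sparseness (examples with margin beyond 1 contribute nothing to the solution) and the inability to estimate η; the logistic minimizer varies smoothly with η, so probabilities are recoverable but sparseness is lost.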
© 2004 Springer-Verlag Berlin Heidelberg
Bartlett, P.L., Tewari, A. (2004). Sparseness Versus Estimating Conditional Probabilities: Some Asymptotic Results. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1