Abstract
We introduce efficient margin-based algorithms for selective sampling and filtering in binary classification tasks. Experiments on real-world textual data reveal that our algorithms perform significantly better than popular and similarly efficient competitors. Using the so-called Mammen-Tsybakov low noise condition to parametrize the instance distribution, and assuming linear label noise, we show bounds on the convergence rate to the Bayes risk of a weaker adaptive variant of our selective sampler. Our analysis reveals that, excluding logarithmic factors, the average risk of this adaptive sampler converges to the Bayes risk at rate \(N^{-(1+\alpha)(2+\alpha)/(2(3+\alpha))}\), where N denotes the number of queried labels and \(\alpha > 0\) is the exponent in the low noise condition. For all \(\alpha>\sqrt{3}-1\approx0.73\) this convergence rate is asymptotically faster than the rate \(N^{-(1+\alpha)/(2+\alpha)}\) achieved by the fully supervised version of the base selective sampler, which queries all labels. Moreover, for \(\alpha\to\infty\) (the hard margin condition) the gap between the semi- and fully-supervised rates becomes exponential.
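The threshold \(\alpha > \sqrt{3}-1\) quoted in the abstract can be verified directly by comparing the two exponents; the following short derivation makes the step explicit:

\[
\frac{(1+\alpha)(2+\alpha)}{2(3+\alpha)} \;>\; \frac{1+\alpha}{2+\alpha}
\;\Longleftrightarrow\; (2+\alpha)^2 > 2(3+\alpha)
\;\Longleftrightarrow\; \alpha^2 + 2\alpha - 2 > 0
\;\Longleftrightarrow\; \alpha > \sqrt{3}-1 .
\]

To fix intuition for how a margin-based selective sampler of this kind operates, the Python sketch below implements a generic regularized least-squares predictor that queries a label only when the observed margin is small. It is a minimal illustration under assumed parameters (the class name, the fixed `threshold`, and the toy data loop are ours), not the algorithm analyzed in the paper; in particular, the paper's sampler uses an adaptive, data-dependent query condition rather than a constant threshold.

```python
import numpy as np

class MarginSelectiveSampler:
    """Sketch of a margin-based selective sampler: a regularized
    least-squares linear predictor that asks for a label only when
    the prediction margin falls below a threshold."""

    def __init__(self, d, threshold=0.2, reg=1.0):
        self.A_inv = np.eye(d) / reg   # inverse of the regularized correlation matrix
        self.b = np.zeros(d)           # running sum of y * x over queried rounds
        self.w = np.zeros(d)           # current linear hypothesis
        self.threshold = threshold     # constant query threshold (an assumption of this sketch)
        self.queries = 0

    def margin(self, x):
        return float(self.w @ x)

    def should_query(self, x):
        # Query the label only in the uncertain, small-margin region.
        return abs(self.margin(x)) <= self.threshold

    def update(self, x, y):
        # Rank-one Sherman-Morrison update of the RLS solution.
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + float(x @ Ax))
        self.b += y * x
        self.w = self.A_inv @ self.b
        self.queries += 1

# Toy run under the linear label noise model mentioned in the abstract:
# instances x on the unit sphere, P(y = +1 | x) = (1 + u.x) / 2.
rng = np.random.default_rng(0)
d, T = 10, 5000
u = rng.standard_normal(d); u /= np.linalg.norm(u)
sampler = MarginSelectiveSampler(d)
mistakes = 0
for _ in range(T):
    x = rng.standard_normal(d); x /= np.linalg.norm(x)
    y = 1.0 if rng.random() < 0.5 * (1.0 + u @ x) else -1.0
    if (1.0 if sampler.margin(x) >= 0 else -1.0) != y:
        mistakes += 1
    if sampler.should_query(x):
        sampler.update(x, y)
print(f"error rate: {mistakes / T:.3f}, queried labels: {sampler.queries}/{T}")
```

In runs of this kind the sampler typically queries only a fraction of the stream while its error rate approaches that of the Bayes-optimal predictor sign(u·x); this trade-off between queried labels and excess risk is exactly what the rates in the abstract quantify.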
References
Angluin, D. (2004). Queries revisited. Theoretical Computer Science, 313(2), 175–194.
Azoury, K. S., & Warmuth, M. K. (2001). Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3), 211–246.
Balcan, M. F., Beygelzimer, A., & Langford, J. (2006). Agnostic active learning. In Proceedings of the 23rd international conference on machine learning (ICML) (pp. 65–72). Omnipress.
Balcan, M. F., Broder, A., & Zhang, T. (2007). Margin-based active learning. In Proceedings of the 20th annual conference on learning theory (COLT) (pp. 35–50). Berlin: Springer.
Bartlett, P. L., Jordan, M. I., & McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138–156.
Blanchard, G., Bousquet, O., & Zwald, L. (2007). Statistical properties of kernel principal component analysis. Machine Learning, 66(2–3), 259–294.
Boucheron, S., Bousquet, O., & Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM Probability and Statistics, 9, 323–375.
Braun, M. L. (2006). Accurate error bounds for the eigenvalues of the kernel matrix. Journal of Machine Learning Research, 7, 2303–2328.
Campbell, C., Cristianini, N., & Smola, A. (2000). Query learning with large margin classifiers. In Proceedings of the 17th international conference on machine learning (ICML) (pp. 111–118). San Mateo: Morgan Kaufmann.
Castro, R., & Nowak, R. D. (2008). Minimax bounds for active learning. IEEE Transactions on Information Theory, 54(5), 2339–2353.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.
Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2005). A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3), 640–668.
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006a). Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7, 31–54.
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006b). Worst-case analysis of selective sampling for linear-threshold algorithms. Journal of Machine Learning Research, 7, 1205–1230.
Cohn, D., Atlas, L., & Ladner, R. (1990). Training connectionist networks with queries and selective sampling. In Advances in neural information processing systems (NIPS), 1989. New York: MIT Press.
Cohn, D., Atlas, L., & Ladner, R. (1994). Improving generalization with active learning. Machine Learning, 15(2), 201–221.
Dasgupta, S., Kalai, A. T., & Monteleoni, C. (2005). Analysis of Perceptron-based active learning. In Proceedings of the 18th conference on learning theory (COLT 2005) (pp. 249–263). Berlin: Springer.
Dasgupta, S., Hsu, D., & Monteleoni, C. (2008). A general agnostic active learning algorithm. In Advances in neural information processing systems (NIPS) (Vol. 21, pp. 353–360). New York: MIT Press.
Freund, Y., Seung, S., Shamir, E., & Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28(2/3), 133–168.
Gilad-Bachrach, R., Navot, A., & Tishby, N. (2005). Query by committee made real. In Advances in neural information processing systems (NIPS) (Vol. 19). New York: MIT Press.
Hanneke, S. (2007). A bound on the label complexity of agnostic active learning. In Proceedings of the 24th international conference on machine learning (ICML) (pp. 353–360). Omnipress.
Hanneke, S. (2009). Adaptive rates of convergence in active learning. In Proceedings of the 22nd conference on learning theory (COLT 2009). Omnipress.
Helmbold, D., Littlestone, N., & Long, P. (2000). Apple tasting. Information and Computation, 161(2), 85–139.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, & A. Smola (Eds.), Advances in kernel methods: support vector learning. New York: MIT Press.
Kääriäinen, M. (2006). Active learning in the non-realizable case. In Proceedings of the 17th international conference on algorithmic learning theory (ALT 2006) (pp. 63–77). Berlin: Springer.
Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM-SIGIR conference on research and development in information retrieval (pp. 3–12). Berlin: Springer.
Monteleoni, C., & Kääriäinen, M. (2007). Practical online active learning for classification. In IEEE computer society conference on computer vision and pattern recognition (pp. 249–263). New York: IEEE Computer Society.
Muslea, I., Minton, S., & Knoblock, C. A. (2000). Selective sampling with redundant views. In Proceedings of the national conference on artificial intelligence (AAAI 2000) (pp. 621–626). New York: MIT Press.
NIST (2004). trec.nist.gov/data/reuters/reuters.html.
Sculley, D. (2008). Advances in online learning-based spam filtering. PhD thesis, Department of Computer Science, Tufts University.
Shawe-Taylor, J., Williams, C. K. I., Cristianini, N., & Kandola, J. (2005). On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Transactions on Information Theory, 51(7), 2510–2522.
Steinwart, I., & Scovel, C. (2007). Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35, 575–607.
Strehl, A. L., & Littman, M. L. (2008). Online linear regression and its application to model-based reinforcement learning. In Advances in neural information processing systems (NIPS) (Vol. 21, pp. 1417–1424). New York: MIT Press.
Tong, S., & Koller, D. (2000). Support vector machine active learning with applications to text classification. In Proceedings of the 17th international conference on machine learning (ICML) (pp. 999–1006). San Mateo: Morgan Kaufmann.
Tsybakov, A. (2004). Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1), 135–166.
Vovk, V. (2001). Competitive on-line statistics. International Statistical Review, 69, 213–248.
Ying, Y., & Zhou, D. X. (2006). Online regularized classification algorithms. IEEE Transactions on Information Theory, 52, 4775–4788.
Additional information
Editor: Avrim Blum.
Preliminary versions of this paper appeared in the proceedings of NIPS 2002 (Margin-based algorithms for information filtering), COLT 2003 (Learning probabilistic linear-threshold classifiers via selective sampling), and NIPS 2008 (Linear classification and selective sampling under low noise conditions).
The authors gratefully acknowledge partial support by the PASCAL2 Network of Excellence under EC grant no. 216886. This publication only reflects the authors’ views.
Cite this article
Cavallanti, G., Cesa-Bianchi, N. & Gentile, C. Learning noisy linear classifiers via adaptive and selective sampling. Mach Learn 83, 71–102 (2011). https://doi.org/10.1007/s10994-010-5191-x