Abstract
We introduce efficient margin-based algorithms for selective sampling and filtering in binary classification tasks. Experiments on real-world textual data reveal that our algorithms perform significantly better than popular and similarly efficient competitors. Using the so-called Mammen-Tsybakov low noise condition to parametrize the instance distribution, and assuming linear label noise, we show bounds on the convergence rate to the Bayes risk of a weaker adaptive variant of our selective sampler. Our analysis reveals that, excluding logarithmic factors, the average risk of this adaptive sampler converges to the Bayes risk at rate \(N^{-(1+\alpha)(2+\alpha)/(2(3+\alpha))}\), where N denotes the number of queried labels and \(\alpha > 0\) is the exponent in the low noise condition. For all \(\alpha>\sqrt{3}-1\approx0.73\) this convergence rate is asymptotically faster than the rate \(N^{-(1+\alpha)/(2+\alpha)}\) achieved by the fully supervised version of the base selective sampler, which queries all labels. Moreover, for \(\alpha\to\infty\) (the hard margin condition) the gap between the semi- and fully-supervised rates becomes exponential.
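The threshold \(\alpha > \sqrt{3}-1\) quoted in the abstract can be verified directly by comparing the two exponents; the following short derivation makes the step explicit:

\[
\frac{(1+\alpha)(2+\alpha)}{2(3+\alpha)} \;>\; \frac{1+\alpha}{2+\alpha}
\;\Longleftrightarrow\; (2+\alpha)^2 > 2(3+\alpha)
\;\Longleftrightarrow\; \alpha^2 + 2\alpha - 2 > 0
\;\Longleftrightarrow\; \alpha > \sqrt{3}-1 .
\]

To fix intuition for how a margin-based selective sampler of this kind operates, the Python sketch below implements a generic regularized least-squares predictor that queries a label only when the observed margin is small. It is a minimal illustration under assumed parameters (the class name, the fixed `threshold`, and the toy data loop are ours), not the algorithm analyzed in the paper; in particular, the paper's sampler uses an adaptive, data-dependent query condition rather than a constant threshold.

```python
import numpy as np

class MarginSelectiveSampler:
    """Sketch of a margin-based selective sampler: a regularized
    least-squares linear predictor that asks for a label only when
    the prediction margin falls below a threshold."""

    def __init__(self, d, threshold=0.2, reg=1.0):
        self.A_inv = np.eye(d) / reg   # inverse of the regularized correlation matrix
        self.b = np.zeros(d)           # running sum of y * x over queried rounds
        self.w = np.zeros(d)           # current linear hypothesis
        self.threshold = threshold     # constant query threshold (an assumption of this sketch)
        self.queries = 0

    def margin(self, x):
        return float(self.w @ x)

    def should_query(self, x):
        # Query the label only in the uncertain, small-margin region.
        return abs(self.margin(x)) <= self.threshold

    def update(self, x, y):
        # Rank-one Sherman-Morrison update of the RLS solution.
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + float(x @ Ax))
        self.b += y * x
        self.w = self.A_inv @ self.b
        self.queries += 1

# Toy run under the linear label noise model mentioned in the abstract:
# instances x on the unit sphere, P(y = +1 | x) = (1 + u.x) / 2.
rng = np.random.default_rng(0)
d, T = 10, 5000
u = rng.standard_normal(d); u /= np.linalg.norm(u)
sampler = MarginSelectiveSampler(d)
mistakes = 0
for _ in range(T):
    x = rng.standard_normal(d); x /= np.linalg.norm(x)
    y = 1.0 if rng.random() < 0.5 * (1.0 + u @ x) else -1.0
    if (1.0 if sampler.margin(x) >= 0 else -1.0) != y:
        mistakes += 1
    if sampler.should_query(x):
        sampler.update(x, y)
print(f"error rate: {mistakes / T:.3f}, queried labels: {sampler.queries}/{T}")
```

In runs of this kind the sampler typically queries only a fraction of the stream while its error rate approaches that of the Bayes-optimal predictor sign(u·x); this trade-off between queried labels and excess risk is exactly what the rates in the abstract quantify.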
References
Angluin, D. (2004). Queries revisited. Theoretical Computer Science, 313(2), 175–194.
Azoury, K. S., & Warmuth, M. K. (2001). Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3), 211–246.
Balcan, M. F., Beygelzimer, A., & Langford, J. (2006). Agnostic active learning. In Proceedings of the 23rd international conference on machine learning (ICML) (pp. 65–72). Omnipress.
Balcan, M. F., Broder, A., & Zhang, T. (2007). Margin-based active learning. In Proceedings of the 20th annual conference on learning theory (COLT) (pp. 35–50). Berlin: Springer.
Bartlett, P. L., Jordan, M. I., & McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138–156.
Blanchard, G., Bousquet, O., & Zwald, L. (2007). Statistical properties of kernel principal component analysis. Machine Learning, 66(2–3), 259–294.
Boucheron, S., Bousquet, O., & Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM Probability and Statistics, 9, 323–375.
Braun, M. L. (2006). Accurate error bounds for the eigenvalues of the kernel matrix. Journal of Machine Learning Research, 7, 2303–2328.
Campbell, C., Cristianini, N., & Smola, A. (2000). Query learning with large margin classifiers. In Proceedings of the 17th international conference on machine learning (ICML) (pp. 111–118). San Mateo: Morgan Kaufmann.
Castro, R., & Nowak, R. D. (2008). Minimax bounds for active learning. IEEE Transactions on Information Theory, 54(5), 2339–2353.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.
Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2005). A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3), 640–668.
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006a). Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7, 31–54.
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006b). Worst-case analysis of selective sampling for linear-threshold algorithms. Journal of Machine Learning Research, 7, 1205–1230.
Cohn, D., Atlas, L., & Ladner, R. (1990). Training connectionist networks with queries and selective sampling. In Advances in neural information processing systems (NIPS), 1989. New York: MIT Press.
Cohn, D., Atlas, L., & Ladner, R. (1994). Improving generalization with active learning. Machine Learning, 15(2), 201–221.
Dasgupta, S., Kalai, A. T., & Monteleoni, C. (2005). Analysis of Perceptron-based active learning. In Proceedings of the 18th conference on learning theory (COLT 2005) (pp. 249–263). Berlin: Springer.
Dasgupta, S., Hsu, D., & Monteleoni, C. (2008). A general agnostic active learning algorithm. In Advances in neural information processing systems (NIPS) (Vol. 21, pp. 353–360). New York: MIT Press.
Freund, Y., Seung, S., Shamir, E., & Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28(2/3), 133–168.
Gilad-Bachrach, R., Navot, A., & Tishby, N. (2005). Query by committee made real. In Advances in neural information processing systems (NIPS) (Vol. 19). New York: MIT Press.
Hanneke, S. (2007). A bound on the label complexity of agnostic active learning. In Proceedings of the 24th international conference on machine learning (ICML) (pp. 353–360). Omnipress.
Hanneke, S. (2009). Adaptive rates of convergence in active learning. In Proceedings of the 22nd conference on learning theory (COLT 2009). Omnipress.
Helmbold, D., Littlestone, N., & Long, P. (2000). Apple tasting. Information and Computation, 161(2), 85–139.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, & A. Smola (Eds.), Advances in kernel methods: support vector learning. New York: MIT Press.
Kääriäinen, M. (2006). Active learning in the non-realizable case. In Proceedings of the 17th international conference on algorithmic learning theory (ALT 2006) (pp. 63–77). Berlin: Springer.
Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM-SIGIR conference on research and development in information retrieval (pp. 3–12). Berlin: Springer.
Monteleoni, C., & Kääriäinen, M. (2007). Practical online active learning for classification. In IEEE computer society conference on computer vision and pattern recognition (pp. 249–263). New York: IEEE Computer Society.
Muslea, I., Minton, S., & Knoblock, C. A. (2000). Selective sampling with redundant views. In Proceedings of the national conference on artificial intelligence (AAAI 2000) (pp. 621–626). New York: MIT Press.
NIST (2004). trec.nist.gov/data/reuters/reuters.html.
Sculley, D. (2008). Advances in online learning-based spam filtering. PhD thesis, Department of Computer Science, Tufts University.
Shawe-Taylor, J., Williams, C. K. I., Cristianini, N., & Kandola, J. (2005). On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Transactions on Information Theory, 51(7), 2510–2522.
Steinwart, I., & Scovel, C. (2007). Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35, 575–607.
Strehl, A. L., & Littman, M. L. (2008). Online linear regression and its application to model-based reinforcement learning. In Advances in neural information processing systems (NIPS) (Vol. 21, pp. 1417–1424). New York: MIT Press.
Tong, S., & Koller, D. (2000). Support vector machine active learning with applications to text classification. In Proceedings of the 17th international conference on machine learning (ICML) (pp. 999–1006). San Mateo: Morgan Kaufmann.
Tsybakov, A. (2004). Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1), 135–166.
Vovk, V. (2001). Competitive on-line statistics. International Statistical Review, 69, 213–248.
Ying, Y., & Zhou, D. X. (2006). Online regularized classification algorithms. IEEE Transactions on Information Theory, 52, 4775–4788.
Additional information
Editor: Avrim Blum.
Preliminary versions of this paper appeared in the proceedings of NIPS 2002 (Margin-based algorithms for information filtering), COLT 2003 (Learning probabilistic linear-threshold classifiers via selective sampling), and NIPS 2008 (Linear classification and selective sampling under low noise conditions).
The authors gratefully acknowledge partial support by the PASCAL2 Network of Excellence under EC grant no. 216886. This publication only reflects the authors’ views.
Cite this article
Cavallanti, G., Cesa-Bianchi, N. & Gentile, C. Learning noisy linear classifiers via adaptive and selective sampling. Mach Learn 83, 71–102 (2011). https://doi.org/10.1007/s10994-010-5191-x