Abstract
A common way of solving the multiclass categorization problem is to reformulate the problem into a set of binary classification problems. Discriminative binary classifiers such as Support Vector Machines (SVMs) directly optimize the decision boundary with respect to a certain cost function. In a pragmatic and computationally simple approach, Least Squares SVMs (LS-SVMs) are inferred by minimizing a related regression least squares cost function. The moderated outputs of the binary classifiers are obtained in a second step within the evidence framework. In this paper, Bayes' rule is repeatedly applied to infer the posterior multiclass probabilities, using the moderated outputs of the binary plug-in classifiers and the prior multiclass probabilities. This Bayesian decoding motivates the use of loss-function-based decoding instead of Hamming decoding. For SVMs and LS-SVMs with a linear kernel, experimental evidence suggests the use of one-versus-one coding. With a Radial Basis Function kernel, one-versus-one and error-correcting output codes yield the best performances, but simpler codings may still yield satisfactory results.
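The contrast between Hamming decoding and loss-function-based decoding can be illustrated with a minimal sketch. The code matrix and hinge loss below are assumptions chosen only for illustration (the paper develops a Bayesian decoding from moderated LS-SVM outputs; this sketch only shows why keeping the real-valued margins can change the decoded class relative to hard-thresholded Hamming decoding):

```python
import numpy as np

# Hypothetical output-code matrix for K = 3 classes and 3 binary classifiers.
# Rows = classes, columns = binary classifiers; entries in {-1, +1}.
CODE = np.array([
    [+1, +1, +1],
    [-1, +1, -1],
    [-1, -1, +1],
])

def hamming_decode(outputs):
    """Hamming decoding: hard-threshold the binary outputs, then pick the
    class whose codeword disagrees in the fewest positions."""
    signs = np.sign(outputs)
    distances = np.sum(CODE != signs, axis=1)
    return int(np.argmin(distances))

def loss_decode(outputs, loss=lambda z: np.maximum(0.0, 1.0 - z)):
    """Loss-based decoding: keep the real-valued margins and pick the class
    minimizing the summed loss of codeword * output (hinge loss assumed)."""
    totals = np.sum(loss(CODE * outputs), axis=1)
    return int(np.argmin(totals))

# Real-valued outputs of the three binary classifiers for one test point.
f = np.array([0.1, -0.9, 0.1])
print(hamming_decode(f))
print(loss_decode(f))
```

For this test point, Hamming decoding sees only the signs (+1, -1, +1), which leave classes 0 and 2 tied at distance 1, whereas loss-based decoding exploits the large confident margin of the second classifier (which matches class 2's codeword) and decodes class 2.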
van Gestel, T., Suykens, J.A.K., Lanckriet, G. et al. Multiclass LS-SVMs: Moderated Outputs and Coding-Decoding Schemes. Neural Processing Letters 15, 45–58 (2002). https://doi.org/10.1023/A:1013815310229