Abstract
The analysis of cancer gene expression is intrinsically a semi-supervised problem: one is interested not only in building a classifier for diagnosis but also in finding new sub-classes of cancer. We propose here a method for Mixture Discriminant Analysis (MDA) that can simultaneously detect sub-classes of cancer and perform classification. We evaluate the method on 10 gene expression data sets. MDA not only improved classification on some of these data sets but also detected some known and putative sub-classes of cancer.
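The abstract describes MDA only at a high level. The sketch below shows the general idea of mixture discriminant analysis. Each class k is modelled as a mixture of Gaussian sub-classes fitted by EM, and a sample x is assigned by Bayes' rule using P(k | x) ∝ P(k) Σ_r w_kr N(x; μ_kr, Σ_kr). The most responsible component inside the predicted class then serves as its putative sub-class. This is not the authors' semi-supervised implementation. Several choices here are assumptions made for illustration: the diagonal covariance per component (the classic MDA formulation of Hastie and Tibshirani shares one covariance across sub-classes), the fixed number of sub-classes per class, the farthest-first initialization, and all names (`fit_class_mixture`, `MixtureDiscriminant`, `predict_subclass`).

```python
import numpy as np


def _log_gauss_diag(X, mean, var):
    """Log-density of a diagonal-covariance Gaussian at each row of X."""
    return -0.5 * (np.sum(np.log(2.0 * np.pi * var))
                   + np.sum((X - mean) ** 2 / var, axis=1))


def _logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along one axis."""
    m = np.max(A, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(A - m), axis=axis))


def fit_class_mixture(X, n_sub, n_iter=100, reg=1e-3, seed=0):
    """Fit a diagonal Gaussian mixture with n_sub sub-classes to one class by EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Farthest-first initialization (an assumption): spread the initial means apart.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_sub):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (n_sub, 1))
    weights = np.full(n_sub, 1.0 / n_sub)
    for _ in range(n_iter):
        # E-step: responsibility of each sub-class for each sample.
        log_p = np.column_stack([np.log(weights[r]) + _log_gauss_diag(X, means[r], variances[r])
                                 for r in range(n_sub)])
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate mixing weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for r in range(n_sub):
            variances[r] = (resp[:, r] @ (X - means[r]) ** 2) / nk[r] + reg
    return weights, means, variances


class MixtureDiscriminant:
    """Bayes classifier whose class-conditional densities are Gaussian mixtures."""

    def __init__(self, n_sub=2, **em_kwargs):
        self.n_sub = n_sub
        self.em_kwargs = em_kwargs

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.models_ = [fit_class_mixture(X[y == c], self.n_sub, **self.em_kwargs)
                        for c in self.classes_]
        return self

    def _joint(self, X):
        """log P(k) + log w_kr + log N(x | mu_kr, Sigma_kr), shape (n, classes, n_sub)."""
        return np.stack([
            np.column_stack([np.log(p) + np.log(w[r]) + _log_gauss_diag(X, m[r], v[r])
                             for r in range(self.n_sub)])
            for p, (w, m, v) in zip(self.priors_, self.models_)], axis=1)

    def predict(self, X):
        # Sum the sub-class densities within each class, then pick the most probable class.
        return self.classes_[np.argmax(_logsumexp(self._joint(X), axis=2), axis=1)]

    def predict_subclass(self, X):
        """Predicted class plus the index of the most responsible sub-class within it."""
        joint = self._joint(X)
        k = np.argmax(_logsumexp(joint, axis=2), axis=1)
        r = np.argmax(joint[np.arange(len(X)), k], axis=1)
        return self.classes_[k], r


# Synthetic demo: class 0 hides two sub-classes, class 1 is a single blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-4.0, 0.0], 0.5, (30, 2)),
               rng.normal([4.0, 0.0], 0.5, (30, 2)),
               rng.normal([0.0, 6.0], 0.5, (40, 2))])
y = np.array([0] * 60 + [1] * 40)
model = MixtureDiscriminant(n_sub=2).fit(X, y)
print("training accuracy:", np.mean(model.predict(X) == y))
classes, subs = model.predict_subclass(X[:60])
print("sub-class labels for class 0:", subs)
```

The number of sub-classes per class is fixed in this sketch; a practical method would need to choose it, for example with a model-selection criterion, and that step is deliberately left out here.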
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ribeiro, C., de Assis T. de Carvalho, F., Costa, I.G. (2010). Semi-supervised Approach for Finding Cancer Sub-classes on Gene Expression Data. In: Ferreira, C.E., Miyano, S., Stadler, P.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2010. Lecture Notes in Computer Science, vol 6268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15060-9_3
DOI: https://doi.org/10.1007/978-3-642-15060-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15059-3
Online ISBN: 978-3-642-15060-9
eBook Packages: Computer Science, Computer Science (R0)