Abstract
In this paper, we propose a new optimization framework, the Support Feature Machine (SFM), for improving feature selection in medical data classification. SFM seeks the optimal group of features that exhibits strong separability between two classes, where separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified samples in the training set, i.e., those whose intra-class distances are smaller than their inter-class distances. This concept can be combined with a modified nearest-neighbor rule for unbalanced data. In addition, we present a variation of SFM that provides feature weights (prioritization). The proposed SFM framework and its extensions were tested on five real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with those of support vector machine (SVM) classification and Logical Analysis of Data (LAD), another optimization-based feature selection technique. SFM gives very good classification results while using far fewer features to make the decision than SVM and LAD, a significant advantage in diagnostic practice. The outcome of this study suggests that the SFM framework can serve as a quick decision-making tool in real clinical settings.
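The separability criterion described above can be illustrated with a minimal sketch. This is not the authors' optimization formulation (the paper solves an optimization model over feature subsets); it only evaluates, for a given candidate feature subset, how many training samples have a nearest same-class neighbor that is closer than their nearest other-class neighbor. The function name `separability_score` and the toy data are illustrative assumptions.

```python
import numpy as np

def separability_score(X, y, features):
    """Count training samples whose nearest intra-class distance
    (restricted to the chosen features) is smaller than their
    nearest inter-class distance.

    X        -- (n_samples, n_features) data matrix
    y        -- (n_samples,) binary class labels
    features -- indices of the candidate feature subset
    """
    Xf = X[:, features]
    correct = 0
    for i in range(len(y)):
        d = np.linalg.norm(Xf - Xf[i], axis=1)
        d[i] = np.inf  # exclude the sample itself
        intra = d[y == y[i]].min()  # nearest same-class neighbor
        inter = d[y != y[i]].min()  # nearest other-class neighbor
        if intra < inter:
            correct += 1
    return correct

# Toy example: two well-separated classes in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(separability_score(X, y, [0, 1]))  # all 4 samples satisfy the criterion
```

A feature-selection procedure in this spirit would search over subsets `features` to maximize this count, trading off against subset size; the paper casts that search as a mathematical optimization model rather than enumeration.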
This work is supported by the National Science Foundation under CAREER Grant No. 0546574.
Fan, YJ., Chaovalitwongse, W.A. Optimizing feature selection to improve medical diagnosis. Ann Oper Res 174, 169–183 (2010). https://doi.org/10.1007/s10479-008-0506-z