Abstract
In this paper, we propose multiobjective differential evolution (MODE)-based feature selection and ensemble learning approaches for entity extraction in biomedical texts. The first step of the algorithm addresses the problem of automatic feature selection in a machine learning framework, namely conditional random fields. The final Pareto optimal front, obtained as the output of the feature selection module, contains a set of solutions, each of which represents a particular feature representation. In the second step of our algorithm, we combine a subset of the resulting classifiers using a MODE-based ensemble technique. Our experiments on three benchmark datasets, namely GENIA, GENETAG and AIMed, yield F-measure values of 76.75, 94.15 and 91.91 %, respectively. Comparisons with existing systems show that our proposed algorithm achieves performance on par with the state of the art. These results also indicate that our method is general in nature and therefore performs well across datasets from several domains. The key contribution of this work is the development of MODE-based generalized feature selection and ensemble learning techniques for extracting entities from biomedical texts of several domains.
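As an illustration of the Pareto optimal front mentioned in the abstract, the following minimal sketch filters a set of candidate feature subsets down to the non-dominated ones. It assumes two objectives, recall and precision, both maximized; the abstract does not state the objective functions, so this choice, the candidate names and the scores are all hypothetical, and the sketch is not the authors' implementation.

```python
from typing import List, Tuple

# (label, recall, precision); both objectives are assumed to be maximized.
Solution = Tuple[str, float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure (harmonic mean); returns 0.0 when both inputs are zero."""
    total = recall + precision
    return 2 * recall * precision / total if total else 0.0


# Hypothetical candidate feature subsets with made-up scores.
candidates = [
    ("subset_a", 0.70, 0.80),
    ("subset_b", 0.75, 0.72),
    ("subset_c", 0.68, 0.79),  # dominated by subset_a
    ("subset_d", 0.74, 0.70),  # dominated by subset_b
]

front = pareto_front(candidates)
for label, recall, precision in front:
    print(label, round(f_measure(recall, precision), 4))
```

Each surviving entry corresponds to one feature representation, and hence to one candidate classifier that an ensemble step could then combine.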
Notes
A part of each training set is used as the development set.
Additional information
Communicated by E. Lughofer.
Cite this article
Sikdar, U.K., Ekbal, A. & Saha, S. MODE: multiobjective differential evolution for feature selection and classifier ensemble. Soft Comput 19, 3529–3549 (2015). https://doi.org/10.1007/s00500-014-1565-5