Abstract
In the present era, large amounts of new data are readily available from different sources for collection and storage. A major challenge is to label these data correctly for various machine learning applications. Active learning is a special case of machine learning that is widely used to address this problem by significantly reducing the amount of labeled data required. It selects the most informative samples from the unlabeled data, has them labeled by an oracle, and uses them to train the active learner incrementally. Several query sampling strategies exist for selecting these samples. A major drawback of active learning, however, is that it is very time-consuming. This work therefore proposes a new multicore-based algorithm that speeds up active learning by utilizing all of the computational resources available in the system. Experiments are performed on named entity recognition (NER), the task of labeling sequences of words in unstructured text by classifying them into predefined categories. The proposed algorithm is evaluated in terms of both performance and execution time on three NER corpora from distinct biomedical domains. The evaluation results show a considerable improvement in execution time for the proposed active learning algorithm compared with the existing active learning approach.
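The abstract describes pool-based active learning in which the query step (scoring every unlabeled sentence) is spread over multiple cores. The Python sketch below illustrates that general idea with a least confidence strategy; it is not the authors' algorithm. It assumes each unlabeled sentence is already represented by per-token label probabilities from a trained sequence model (e.g. a CRF), approximates the sequence confidence P(y*|x) by the product of per-token maxima, and splits the pool across worker processes with `concurrent.futures.ProcessPoolExecutor`. The names `least_confidence`, `select_queries`, and the toy pool are hypothetical.

```python
import heapq
import math
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional

# Assumed (hypothetical) input format: one sentence = a list of per-token
# probability distributions over the NER label set.
TokenProbs = list[list[float]]


def least_confidence(token_probs: TokenProbs) -> float:
    """Least confidence score 1 - P(y*|x).

    Assumption: P(y*|x) is approximated by the product of the per-token
    maximum probabilities (tokens treated as independent), computed in
    log space to avoid underflow on long sentences.
    """
    log_p = sum(math.log(max(dist)) for dist in token_probs)
    return 1.0 - math.exp(log_p)


def _score_chunk(chunk: list[tuple[int, TokenProbs]]) -> list[tuple[float, int]]:
    """Worker task: score one slice of the unlabeled pool on a single core."""
    return [(least_confidence(probs), idx) for idx, probs in chunk]


def _mp_context():
    # Prefer "fork" where the platform offers it so workers do not re-import
    # the main module; otherwise fall back to the default start method
    # (e.g. spawn on Windows, which requires the __main__ guard below).
    if "fork" in multiprocessing.get_all_start_methods():
        return multiprocessing.get_context("fork")
    return None


def select_queries(pool: list[TokenProbs], batch_size: int,
                   n_workers: Optional[int] = None) -> list[int]:
    """Return indices of the batch_size most uncertain sentences in pool."""
    items = list(enumerate(pool))
    if not items or batch_size <= 0:
        return []
    n_workers = n_workers or os.cpu_count() or 1
    # Round-robin split so every core receives a similar number of sentences.
    chunks = [items[i::n_workers] for i in range(n_workers)]
    chunks = [c for c in chunks if c]
    with ProcessPoolExecutor(max_workers=len(chunks),
                             mp_context=_mp_context()) as executor:
        scored = [s for part in executor.map(_score_chunk, chunks) for s in part]
    # Most uncertain first; ties broken in favour of the lower pool index.
    top = heapq.nlargest(batch_size, scored, key=lambda s: (s[0], -s[1]))
    return [idx for _, idx in top]


if __name__ == "__main__":
    toy_pool = [
        [[0.9, 0.1], [0.8, 0.2]],  # fairly confident sentence
        [[0.5, 0.5], [0.6, 0.4]],  # most uncertain sentence
        [[0.99, 0.01]],
        [[0.7, 0.3], [0.5, 0.5]],
    ]
    print(select_queries(toy_pool, batch_size=2, n_workers=2))  # [1, 3]
```

Only the scoring step is parallelized in this sketch, because each sentence can be scored independently; in a full loop the selected indices would be sent to the oracle, the labeled sentences added to the training set, and the model retrained before the next query round. For a toy pool like this one, process start-up costs more than the scoring itself, so a speedup would be expected only when per-sentence inference is expensive.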
About this article
Cite this article
Agrawal, A., Tripathi, S. & Vardhan, M. Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition. Computing 105, 979–997 (2023). https://doi.org/10.1007/s00607-021-01000-1
DOI: https://doi.org/10.1007/s00607-021-01000-1