Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition

Agrawal, Ankit; Tripathi, Sarsij; Vardhan, Manu

doi:10.1007/s00607-021-01000-1

Special Issue Article
Published: 28 August 2021

Computing, volume 105, pages 979–997 (2023)

Abstract

In the present era, large amounts of new data are readily available to collect and store from different sources. One of the main problems is labeling these new data correctly for various machine learning applications. Active learning is a special case of machine learning that is widely used to address this problem by significantly reducing the need for labeled data. It aims to select the most appropriate samples from the unlabeled data; these samples are labeled by the oracle and then passed to the active learner, which is trained incrementally. Several query sampling strategies exist for selecting the appropriate samples. One of the main drawbacks of the active learning approach is that it is very time-consuming. In this research work, a new multi-core-based algorithm is therefore proposed to speed up the active learning approach by utilizing the complete computational resources present in the system. The experiments have been performed on the problem of named entity recognition, which deals with labeling sequences of words in unstructured text by classifying them into pre-existing categories. The proposed algorithm is evaluated in terms of both performance and execution time over three named entity recognition corpora from distinct biomedical domains. The evaluation results show a considerable improvement in execution time for the proposed active learning algorithm compared with the existing active learning approach.
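
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea it names: least confidence query sampling with the scoring step spread over CPU cores. It assumes the learner already supplies, for each unlabeled sample, a list of probabilities over candidate labelings. The function names (`least_confidence`, `score_chunk`, `split`, `select_queries`) and the chunking scheme are hypothetical and are not taken from the paper.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def least_confidence(probs):
    """Uncertainty of one sample: 1 minus the probability of its most likely labeling."""
    return 1.0 - max(probs)


def score_chunk(chunk):
    """Score a list of (index, probs) pairs; this is the unit of work sent to one worker."""
    return [(idx, least_confidence(probs)) for idx, probs in chunk]


def split(items, n_parts):
    """Split items into at most n_parts contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def select_queries(pool_probs, batch_size, n_workers=None, executor=None):
    """Return the indices of the batch_size least confident samples in the pool.

    Assumption (not from the paper): pool_probs[i] holds the learner's
    probabilities for the candidate labelings of unlabeled sample i.
    """
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split(list(enumerate(pool_probs)), n_workers)
    if executor is not None:
        # Caller-supplied executor (any concurrent.futures executor works).
        results = list(executor.map(score_chunk, chunks))
    else:
        # Default: one worker process per core, so chunks are scored in parallel.
        with ProcessPoolExecutor(max_workers=n_workers) as own_pool:
            results = list(own_pool.map(score_chunk, chunks))
    scored = [pair for part in results for pair in part]
    # Most uncertain first; ties are broken by pool index for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [idx for idx, _ in scored[:batch_size]]
```

In an active learning loop, the returned indices would be sent to the oracle for labeling before the learner is retrained. On platforms that start worker processes by spawning, the default process-pool path should be called from under an `if __name__ == "__main__":` guard.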



Author information

Authors and Affiliations

National Institute of Technology Raipur, Raipur, Chhattisgarh, India
Ankit Agrawal & Manu Vardhan
Motilal Nehru National Institute of Technology Allahabad, Prayagraj, Uttar Pradesh, India
Sarsij Tripathi

Corresponding author

Correspondence to Ankit Agrawal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Agrawal, A., Tripathi, S. & Vardhan, M. Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition. Computing 105, 979–997 (2023). https://doi.org/10.1007/s00607-021-01000-1

Received: 12 May 2021
Accepted: 04 August 2021
Published: 28 August 2021
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00607-021-01000-1
