Abstract
The growing impact of climate change on coastal areas, which are active but fragile regions, calls for collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus of 2,491 sentences drawn from 410 scientific abstracts on coastal areas, built for the Automatic Term Extraction (ATE) and Classification (ATC) tasks. Inspired by the ARDI framework, which focuses on identifying Actors, Resources, Dynamics and Interactions, we use monolingual and multilingual transformer models to automatically extract domain terms and the distinct roles they play in the functioning of coastal systems. The evaluation yields consistent results, with an F1 score of approximately 80% for automatic term extraction and an F1 score of 70% for extracting terms together with their labels. These findings are promising and represent a first step towards a specialized Knowledge Base dedicated to coastal areas.
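The two F1 figures above measure different things: one scores term boundaries only, the other also requires the correct ARDI label. The following is a minimal sketch of that distinction, assuming exact-match scoring over labeled spans; the evaluation protocol actually used in the paper is not stated in this abstract. The span tuple layout, the label strings, the function names and the toy data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span layout: (sentence_id, start_token, end_token, label).
# Label strings are illustrative names for the four ARDI categories.
Span = Tuple[int, int, int, str]


def f1_score(gold: Set, pred: Set) -> float:
    """Exact-match F1 between a gold set and a predicted set."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float]:
    """Return (extraction-only F1, extraction-plus-label F1)."""
    # Extraction only: drop the label and compare span boundaries.
    gold_bounds = {span[:3] for span in gold}
    pred_bounds = {span[:3] for span in pred}
    extraction_f1 = f1_score(gold_bounds, pred_bounds)
    # Extraction plus classification: boundaries and label must both match.
    labeled_f1 = f1_score(gold, pred)
    return extraction_f1, labeled_f1


# Toy data invented for illustration; not taken from the corpus.
gold = {(0, 0, 1, "Actor"), (0, 4, 6, "Resource"), (1, 2, 3, "Dynamic")}
pred = {(0, 0, 1, "Actor"), (0, 4, 6, "Dynamic"), (1, 5, 6, "Interaction")}
ate_f1, atc_f1 = evaluate(gold, pred)
print(f"extraction F1 = {ate_f1:.3f}, extraction + label F1 = {atc_f1:.3f}")
```

Under this kind of scoring the labeled F1 can never exceed the extraction-only F1, since every labeled match is also a boundary match. That is consistent with the gap between the two reported figures.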
J. Delaunay and T. H. Tran—These authors contributed equally to this work.
Notes
1. Corpus and code are available at https://github.com/jdelaunay/coastal_area_term_extraction.
Acknowledgments
The work was supported by the TERMITRAD (2020-2019-8510010) project funded by the Nouvelle-Aquitaine Region, France, by the Slovenian Research and Innovation Agency core research program Knowledge Technologies (P2-0103) and the project Cross-lingual and Cross-domain Methods for Terminology Extraction and Alignment, a bilateral project funded by the program PROTEUS under the grant number BI-FR/23-24-PROTEUS006. We express our gratitude to Kenza HERMAN and Geraldine DUBOS for their invaluable assistance in the annotation process.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Delaunay, J. et al. (2024). CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature. In: Nöth, E., Horák, A., Sojka, P. (eds) Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science(), vol 15048. Springer, Cham. https://doi.org/10.1007/978-3-031-70563-2_8
DOI: https://doi.org/10.1007/978-3-031-70563-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70562-5
Online ISBN: 978-3-031-70563-2
eBook Packages: Computer Science (R0)