CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature

Delaunay, Julien; Tran, Hanh Thi Hong; González-Gallardo, Carlos-Emiliano; Bordea, Georgeta; Ducos, Mathilde; Sidere, Nicolas; Doucet, Antoine; Pollak, Senja; De Viron, Olivier

doi:10.1007/978-3-031-70563-2_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15048)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

The growing impact of climate change on coastal areas, which are particularly active but fragile regions, calls for collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus of 2,491 sentences drawn from 410 scientific abstracts on coastal areas, designed for the Automatic Term Extraction (ATE) and Automatic Term Classification (ATC) tasks. Inspired by the ARDI framework, which focuses on identifying Actors, Resources, Dynamics and Interactions, we use monolingual and multilingual transformer models to automatically extract domain terms together with their distinct roles in the functioning of coastal systems. The evaluation shows consistent results, with an F1 score of approximately 80% for automatic term extraction and of 70% for extracting terms together with their labels. These promising findings mark an initial step towards the development of a specialized Knowledge Base dedicated to coastal areas.

J. Delaunay and T. H. Tran contributed equally to this work.
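The abstract reports two scores: one for extracting terms (ATE) and a lower one for extracting terms together with their ARDI role (ATC). A common way to cast this with transformer models is BIO token classification. The sketch below assumes that formulation, which the page does not confirm, and shows how the two F1 scores can be computed from predicted tags and why they differ. The label names (Actor, Resource, Dynamic, Interaction), the helper names `decode_bio`, `f1` and `evaluate`, the example sentence, and the choice to score unique case-folded terms are all hypothetical; the paper's exact tag set and evaluation protocol may differ.

```python
from typing import List, Set, Tuple

# Hypothetical ARDI tag set in BIO form, e.g. "B-Actor", "I-Resource", "O".
# The actual CoastTerm label names and scheme may differ.


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert parallel token / BIO-tag lists into (term, label) pairs.

    A span starts at a B-<label> tag and continues over I-<label> tags with
    the same label. An I- tag that cannot continue the open span starts a new
    one (a lenient convention, assumed here).
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_label = ""
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any term: close the open span, if there is one.
            if current:
                spans.append((" ".join(current), current_label))
            current, current_label = [], ""
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "I" and current and label == current_label:
            current.append(token)  # extend the open span
        else:
            # B- tag, or an I- tag with a different label: start a new span.
            if current:
                spans.append((" ".join(current), current_label))
            current, current_label = [token], label
    if current:
        spans.append((" ".join(current), current_label))
    return spans


def f1(gold: Set, pred: Set) -> float:
    """Harmonic mean of precision and recall over two sets."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: List[Tuple[str, str]],
             pred: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (ATE F1, ATC F1) over unique, case-folded terms."""
    # ATE: only the term string has to match.
    ate = f1({t.lower() for t, _ in gold}, {t.lower() for t, _ in pred})
    # ATC: the term and its ARDI label must both match.
    atc = f1({(t.lower(), lab) for t, lab in gold},
             {(t.lower(), lab) for t, lab in pred})
    return ate, atc


if __name__ == "__main__":
    tokens = ["Sea", "level", "rise", "threatens", "salt", "marshes", "."]
    gold_tags = ["B-Dynamic", "I-Dynamic", "I-Dynamic", "O",
                 "B-Resource", "I-Resource", "O"]
    # The model finds both terms but assigns the wrong role to the second.
    pred_tags = ["B-Dynamic", "I-Dynamic", "I-Dynamic", "O",
                 "B-Actor", "I-Actor", "O"]
    ate, atc = evaluate(decode_bio(tokens, gold_tags),
                        decode_bio(tokens, pred_tags))
    print(f"ATE F1 = {ate:.2f}, ATC F1 = {atc:.2f}")  # ATE F1 = 1.00, ATC F1 = 0.50
```

In this toy case, a term found with the wrong ARDI role still counts as a hit for ATE but as a miss for ATC, which is one reason labelled scores tend to be lower than unlabelled ones.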


Notes

1. Corpus and code are available at https://github.com/jdelaunay/coastal_area_term_extraction.
2. https://nlp.stanford.edu/projects/glove/.
3. https://github.com/flairNLP/flair.
4. https://termframe.ff.uni-lj.si/.
5. https://www.elsevier.com/fr-fr/solutions/scopus.
6. https://agrovoc.fao.org/browse/agrovoc/en/.
7. https://www.eionet.europa.eu/gemet/en/about/.
8. https://finto.fi/afo/en/.
9. https://github.com/frmichel/taxref-ld.


Acknowledgments

This work was supported by the TERMITRAD (2020-2019-8510010) project funded by the Nouvelle-Aquitaine Region, France, by the Slovenian Research and Innovation Agency core research program Knowledge Technologies (P2-0103), and by the project Cross-lingual and Cross-domain Methods for Terminology Extraction and Alignment, a bilateral project funded by the program PROTEUS under grant number BI-FR/23-24-PROTEUS006. We express our gratitude to Kenza Herman and Geraldine Dubos for their invaluable assistance in the annotation process.

Author information

Authors and Affiliations

University of La Rochelle, L3i, La Rochelle, France
Julien Delaunay, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Mathilde Ducos, Nicolas Sidere & Antoine Doucet
University of La Rochelle, LIENSs, La Rochelle, France
Julien Delaunay & Olivier De Viron
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
Hanh Thi Hong Tran
Jožef Stefan Institute, Ljubljana, Slovenia
Hanh Thi Hong Tran & Senja Pollak


Corresponding authors

Correspondence to Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Mathilde Ducos, Nicolas Sidere, Antoine Doucet, Senja Pollak or Olivier De Viron.

Editor information

Editors and Affiliations

Friedrich-Alexander-Universität, Erlangen, Germany
Elmar Nöth
Masaryk University, Brno, Czech Republic
Aleš Horák
Masaryk University, Brno, Czech Republic
Petr Sojka


About this paper

Cite this paper

Delaunay, J. et al. (2024). CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature. In: Nöth, E., Horák, A., Sojka, P. (eds) Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science, vol. 15048. Springer, Cham. https://doi.org/10.1007/978-3-031-70563-2_8

DOI: https://doi.org/10.1007/978-3-031-70563-2_8
Published: 01 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70562-5
Online ISBN: 978-3-031-70563-2
eBook Packages: Computer Science, Computer Science (R0)
