CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature

  • Conference paper
Text, Speech, and Dialogue (TSD 2024)

Abstract

The growing impact of climate change on coastal areas, particularly active but fragile regions, necessitates collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus comprising 2,491 sentences from 410 scientific abstracts concerning coastal areas, for the Automatic Term Extraction (ATE) and Classification (ATC) tasks. Inspired by the ARDI framework, focused on the identification of Actors, Resources, Dynamics and Interactions, we automatically extract domain terms and their distinct roles in the functioning of coastal systems by leveraging monolingual and multilingual transformer models. The evaluation demonstrates consistent results, achieving an F1 score of approximately 80% for automated term extraction and F1 of 70% for extracting terms and their labels. These findings are promising and signify an initial step towards the development of a specialized Knowledge Base dedicated to coastal areas.
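The two F1 figures quoted above separate finding a term's boundaries from also assigning its label. The sketch below illustrates one common way to compute both, assuming BIO-tagged sentences and exact-match span scoring. The tag names (Actor, Resource, Dynamics) follow the ARDI categories named in the abstract, but the tag format, the function names and the scoring protocol are assumptions for illustration, not the authors' published evaluation code.

```python
from typing import List, Set, Tuple

# A labelled span: (start index, exclusive end index, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags such as B-Actor / I-Actor / O into labelled spans."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # Open a span on B-, or leniently on an I- with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def f1(gold: Set, pred: Set) -> float:
    """Exact-match F1 between two sets of items."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(gold_tags: List[List[str]], pred_tags: List[List[str]]):
    """Return (boundary-only F1, boundary-plus-label F1) over a corpus."""
    gold, pred = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold_tags, pred_tags)):
        gold |= {(sent_id, *s) for s in bio_to_spans(g)}
        pred |= {(sent_id, *s) for s in bio_to_spans(p)}
    # Dropping the label keeps only (sentence, start, end).
    extraction = f1({s[:3] for s in gold}, {s[:3] for s in pred})
    classification = f1(gold, pred)
    return extraction, classification


# Hypothetical one-sentence example: both terms found, one mislabelled.
gold = [["B-Actor", "I-Actor", "O", "B-Resource"]]
pred = [["B-Actor", "I-Actor", "O", "B-Dynamics"]]
extraction_f1, classification_f1 = evaluate(gold, pred)
print(extraction_f1, classification_f1)
```

In this toy case the boundaries of both terms are recovered, so the extraction score is perfect, while the mislabelled second term halves the labelled score. The gap between the two scores has the same shape as the one reported in the abstract.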

J. Delaunay and T. H. Tran—These authors contributed equally to this work.

Notes

  1. Corpus and code are available at https://github.com/jdelaunay/coastal_area_term_extraction.

  2. https://nlp.stanford.edu/projects/glove/.

  3. https://github.com/flairNLP/flair.

  4. https://termframe.ff.uni-lj.si/.

  5. https://www.elsevier.com/fr-fr/solutions/scopus.

  6. https://agrovoc.fao.org/browse/agrovoc/en/.

  7. https://www.eionet.europa.eu/gemet/en/about/.

  8. https://finto.fi/afo/en/.

  9. https://github.com/frmichel/taxref-ld.

Acknowledgments

The work was supported by the TERMITRAD (2020-2019-8510010) project funded by the Nouvelle-Aquitaine Region, France, by the Slovenian Research and Innovation Agency core research program Knowledge Technologies (P2-0103) and the project Cross-lingual and Cross-domain Methods for Terminology Extraction and Alignment, a bilateral project funded by the program PROTEUS under the grant number BI-FR/23-24-PROTEUS006. We express our gratitude to Kenza HERMAN and Geraldine DUBOS for their invaluable assistance in the annotation process.

Author information

Correspondence to Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Mathilde Ducos, Nicolas Sidere, Antoine Doucet, Senja Pollak or Olivier De Viron.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Delaunay, J. et al. (2024). CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature. In: Nöth, E., Horák, A., Sojka, P. (eds) Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science, vol 15048. Springer, Cham. https://doi.org/10.1007/978-3-031-70563-2_8

  • DOI: https://doi.org/10.1007/978-3-031-70563-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70562-5

  • Online ISBN: 978-3-031-70563-2

  • eBook Packages: Computer Science, Computer Science (R0)
