Exploring Linguistically-Lightweight Keyword Extraction Techniques for Indexing News Articles in a Multilingual Set-up

Jakub Piskorski; Nicolas Stefanovitch; Guillaume Jacquet; Aldo Podavini

Abstract

This paper presents a study of state-of-the-art unsupervised and linguistically unsophisticated keyword extraction algorithms covering statistical, graph-based, and embedding-based approaches, including, among others, Total Keyword Frequency, TF-IDF, RAKE, KPMiner, YAKE, KeyBERT, and variants of TextRank-based keyword extraction algorithms. The study was motivated by the need to select the most appropriate technique for extracting keywords to index news articles in a real-world, large-scale news analysis engine. The algorithms were evaluated on a corpus of circa 330 news articles in 7 languages. The best F1 scores, averaged over all languages, were obtained using a combination of the recently introduced YAKE algorithm and KPMiner (20.1%, 46.6%, and 47.2% for exact, partial, and fuzzy matching, respectively).

Anthology ID: 2021.hackashop-1.6
Volume: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Month: April
Year: 2021
Address: Online
Editors: Hannu Toivonen, Michele Boggia
Venue: Hackashop
Publisher: Association for Computational Linguistics
Pages: 35–44
URL: https://aclanthology.org/2021.hackashop-1.6
Cite (ACL): Jakub Piskorski, Nicolas Stefanovitch, Guillaume Jacquet, and Aldo Podavini. 2021. Exploring Linguistically-Lightweight Keyword Extraction Techniques for Indexing News Articles in a Multilingual Set-up. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, pages 35–44, Online. Association for Computational Linguistics.
Cite (Informal): Exploring Linguistically-Lightweight Keyword Extraction Techniques for Indexing News Articles in a Multilingual Set-up (Piskorski et al., Hackashop 2021)
PDF: https://aclanthology.org/2021.hackashop-1.6.pdf
