A global analysis of metrics used for measuring performance in natural language processing

Blagec, Kathrin; Dorffner, Georg; Moradi, Milad; Ott, Simon; Samwald, Matthias


arXiv:2204.11574 (cs)

[Submitted on 25 Apr 2022]

Abstract: Measuring the performance of natural language processing models is challenging. Traditionally used metrics, such as BLEU and ROUGE, originally devised for machine translation and summarization, have been shown to suffer from low correlation with human judgment and a lack of transferability to other tasks and languages. In the past 15 years, a wide range of alternative metrics have been proposed. However, it is unclear to what extent this has had an impact on NLP benchmarking efforts. Here we provide the first large-scale cross-sectional analysis of metrics used for measuring performance in natural language processing. We curated, mapped and systematized more than 3500 machine learning model performance results from the open repository 'Papers with Code' to enable a global and comprehensive analysis. Our results suggest that the large majority of natural language processing metrics currently used have properties that may result in an inadequate reflection of a model's performance. Furthermore, we found that ambiguities and inconsistencies in the reporting of metrics may lead to difficulties in interpreting and comparing model performances, impairing transparency and reproducibility in NLP research.

Comments:	"NLP Power" workshop at ACL 2022. This work is based on a previous arXiv submission: arXiv:2008.02577 [cs.AI]
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2204.11574 [cs.CL]
	(or arXiv:2204.11574v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.11574

Submission history

From: Matthias Samwald
[v1] Mon, 25 Apr 2022 11:41:50 UTC (721 KB)
