How well do Large Language Models perform in Arithmetic tasks?

Yuan, Zheng; Yuan, Hongyi; Tan, Chuanqi; Wang, Wei; Huang, Songfang

arXiv:2304.02015 (cs)

[Submitted on 16 Mar 2023]

Abstract: Large language models have shown emergent abilities, including chain-of-thought reasoning, that let them answer math word problems step by step. Solving a math word problem requires not only decomposing the problem via chain-of-thought but also calculating the arithmetic expression in each step correctly. To the best of our knowledge, no prior work focuses on evaluating the arithmetic ability of large language models. In this work, we propose an arithmetic dataset, MATH 401, to test the latest large language models, including GPT-4, ChatGPT, InstructGPT, Galactica, and LLaMA, on various arithmetic expressions, and we provide a detailed analysis of their arithmetic ability. MATH 401 and the evaluation code are released at this https URL.
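
To make the evaluation idea concrete, here is a minimal sketch of how a model's free-text replies to arithmetic expressions might be scored. It is not the authors' released evaluation code: the example expressions are hypothetical and are not items from MATH 401, and the function names, the last-number extraction rule, and the relative tolerance are all assumptions made for illustration.

```python
import math
import re

# Hypothetical (expression, reference answer) pairs, NOT taken from MATH 401.
EXAMPLES = [
    ("3 + 4 * 2", 11.0),
    ("(15 - 6) / 3", 3.0),
    ("2 ** 10", 1024.0),
]

# Matches integers and simple decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(text):
    """Return the last number found in a model reply, or None if there is none.

    Assumption: the final number in the reply is the model's answer.
    Commas are stripped so that "1,024" parses as 1024; this also merges
    digits in comma-separated lists, which a fuller parser would handle.
    """
    matches = NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def is_correct(response, reference, rel_tol=1e-3):
    """Compare the extracted answer with the reference under a relative tolerance."""
    predicted = extract_last_number(response)
    return predicted is not None and math.isclose(predicted, reference, rel_tol=rel_tol)


def accuracy(responses, references):
    """Fraction of replies whose extracted answer matches the reference."""
    hits = sum(is_correct(r, ref) for r, ref in zip(responses, references))
    return hits / len(references)


if __name__ == "__main__":
    # Stand-in replies; in practice these would come from the model under test.
    replies = ["3 + 4 * 2 = 11", "The answer is 3.", "2 ** 10 = 1000"]
    refs = [ref for _, ref in EXAMPLES]
    print(f"accuracy = {accuracy(replies, refs):.3f}")
```

Scoring by relative tolerance rather than exact string match is one possible design choice here: it accepts replies such as "3.0" for a reference of 3, at the cost of also accepting small rounding errors.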

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2304.02015 [cs.CL]
	(or arXiv:2304.02015v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.02015

Submission history

From: Zheng Yuan
[v1] Thu, 16 Mar 2023 09:28:15 UTC (7,113 KB)
