SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

Si, Shuzheng; Ma, Wentao; Gao, Haoyu; Wu, Yuchuan; Lin, Ting-En; Dai, Yinpei; Li, Hangyu; Yan, Rui; Huang, Fei; Li, Yongbin


arXiv:2305.13040 (cs)

[Submitted on 22 May 2023 (v1), last revised 12 Mar 2024 (this version, v5)]


Abstract: Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets have been proposed to address robustness issues such as ASR errors, they ignore the unique challenges of spoken conversation. To address these limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues, and 249 hours of audio from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs, e.g., ChatGPT. The results show that current models still have substantial room for improvement in spoken conversation: the most advanced dialogue state tracker achieves only 25.65% joint goal accuracy, and the SOTA end-to-end model correctly completes the user request in only 52.1% of dialogues. The dataset, code, and leaderboard are available: this https URL.
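
For readers unfamiliar with the joint goal accuracy figure quoted in the abstract, the following is a minimal sketch of how that metric is commonly computed in dialogue state tracking: a turn counts as correct only if the entire predicted dialogue state matches the gold state exactly. This is an illustration under stated assumptions, not the paper's evaluation script, which may apply additional value normalization. The function name, the slot names, and the toy data are all hypothetical.

```python
from typing import Dict, List

# A dialogue state maps slot names to values. Slot names here are hypothetical.
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state equals the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    # A single wrong or missing slot makes the whole turn incorrect.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Toy four-turn dialogue (invented data, for illustration only).
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
predicted_states = [
    {"hotel-area": "north"},                                              # exact match
    {"hotel-area": "north", "hotel-stars": "3"},                          # wrong value
    {"hotel-area": "north", "hotel-stars": "4"},                          # missing slot
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},  # exact match
]

print(joint_goal_accuracy(predicted_states, gold_states))  # 0.5
```

Because every slot must be right at once, the metric is strict: in the toy dialogue, two turns that are each off by a single slot halve the score.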

Comments:	NeurIPS 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.13040 [cs.CL]
	(or arXiv:2305.13040v5 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.13040

Submission history

From: Shuzheng Si
[v1] Mon, 22 May 2023 13:47:51 UTC (7,602 KB)
[v2] Wed, 7 Jun 2023 16:04:30 UTC (7,844 KB)
[v3] Mon, 24 Jul 2023 03:31:42 UTC (8,334 KB)
[v4] Tue, 24 Oct 2023 15:19:39 UTC (8,341 KB)
[v5] Tue, 12 Mar 2024 08:52:02 UTC (8,341 KB)
