Introducing Semantics into Speech Encoders

Derek Xu; Shuyan Dong; Changhan Wang; Suyoun Kim; Zhaojiang Lin; Bing Liu; Akshat Shrivastava; Shang-Wen Li; Liang-Hsuan Tseng; Guan-Ting Lin; Alexei Baevski; Hung-Yi Lee; Yizhou Sun; Wei Wang

doi:10.18653/v1/2023.acl-long.639

Abstract

Recent studies find that existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined systems that feed supervised automatic speech recognition (ASR) output into a large language model (LLM) achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic, unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve the spoken language understanding (SLU) performance of existing speech encoders by over 5% on intent classification (IC), with modest gains in named entity resolution (NER) and slot filling (SF), and improve spoken question answering (SQA) F1 score by over 2%. Our approach, which uses no ASR data, achieves performance similar to that of methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.

Anthology ID: 2023.acl-long.639
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 11413–11429
URL: https://aclanthology.org/2023.acl-long.639
DOI: 10.18653/v1/2023.acl-long.639
Cite (ACL): Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Bing Liu, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Guan-Ting Lin, Alexei Baevski, Hung-yi Lee, Yizhou Sun, and Wei Wang. 2023. Introducing Semantics into Speech Encoders. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11413–11429, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Introducing Semantics into Speech Encoders (Xu et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.639.pdf
Video: https://aclanthology.org/2023.acl-long.639.mp4