LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Wang, Peidong; Sun, Eric; Xue, Jian; Wu, Yu; Zhou, Long; Gaur, Yashesh; Liu, Shujie; Li, Jinyu

arXiv:2211.02809 (cs)

[Submitted on 5 Nov 2022 (v1), last revised 19 Oct 2023 (this version, v3)]

Abstract: Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure, so a single transducer model can perform both tasks. In real-world applications, such joint ASR and ST models may need to be streaming and to work without source language identification (i.e., be language-agnostic). In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers. Based on the transducer model structure, we propose four methods: a unified joint and prediction network for multilingual output, a clustered multilingual encoder, target language identification for the encoder, and connectionist temporal classification regularization. Experimental results show that LAMASSU not only drastically reduces the model size but also reaches the performance of monolingual ASR and bilingual ST models.

Comments:	INTERSPEECH 2023
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2211.02809 [cs.CL]
	(or arXiv:2211.02809v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2211.02809

Submission history

From: Peidong Wang
[v1] Sat, 5 Nov 2022 04:03:55 UTC (266 KB)
[v2] Mon, 29 May 2023 23:49:31 UTC (613 KB)
[v3] Thu, 19 Oct 2023 20:35:13 UTC (613 KB)
