Yifan Peng et al.: VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. CoRR abs/2410.17485 (2024)

dblp record: journals/corr/abs-2410-17485
DOI: 10.48550/ARXIV.2410.17485
Venue: CoRR abs/2410.17485
Year: 2024

Authors (10):
1. Yifan Peng
2. Krishna C. Puvvada
3. Zhehuai Chen
4. Piotr Zelasko
5. He Huang
6. Kunal Dhawan
7. Ke Hu
8. Shinji Watanabe 0001
9. Jagadeesh Balam
10. Boris Ginsburg

Provenance information for RDF data of dblp record 'journals/corr/abs-2410-17485': 2024-11-27T22:20:44+0100