Shinji Watanabe 0001
Person information
- affiliation: Carnegie Mellon University, Pittsburgh, PA, USA
- affiliation (former): Johns Hopkins University, Baltimore, MD, USA
- affiliation (2012 - 2017): Mitsubishi Electric Research Laboratories, Cambridge, MA, USA
- affiliation (2001 - 2011): NTT Communication Science Laboratories, Kyoto, Japan
- affiliation (PhD 2006): Waseda University, Tokyo, Japan
Other persons with the same name
- Shinji Watanabe 0002 — Kanagawa University, Department of Electrical Engineering, Yokohama, Japan
- Shinji Watanabe 0003 — Osaka Prefecture University, School of Knowledge and Information Systems, Sakai, Japan
- Shinji Watanabe 0004 — Renesas Electronics Corporation, Kawasaki, Japan
- Shinji Watanabe 0005 — Nintendo Co., Ltd., Kyoto, Japan
- Shinji Watanabe 0006 — Gifu National College of Technology, Motosu-gun, Gifu-ken, Japan
- Shinji Watanabe 0007 — University of Miyazaki, Miyazaki, Japan
2020 – today
- 2024
- [j60]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. IEEE ACM Trans. Audio Speech Lang. Process. 32: 325-351 (2024) - [j59]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1829-1844 (2024) - [j58]Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-Varying Controls for Music Generation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2692-2703 (2024) - [j57]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2884-2899 (2024) - [c417]Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Yuexian Zou, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. AAAI 2024: 23802-23804 - [c416]Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori S. Levin:
Wav2Gloss: Generating Interlinear Glossed Text from Speech. ACL (1) 2024: 568-582 - [c415]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. ACL (1) 2024: 10192-10209 - [c414]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. ACL (Findings) 2024: 11923-11938 - [c413]Yichen Lu, Jiaqi Song, Chao-Han Huck Yang, Shinji Watanabe:
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model. EMNLP (Industry Track) 2024: 440-451 - [c412]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. EMNLP 2024: 10205-10224 - [c411]Hang Chen, Shilong Wu, Chenxi Wang, Jun Du, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Jingdong Chen, Odette Scharenborg, Zhong-Qiu Wang, Bao-Cai Yin, Jia Pan:
Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge. ICASSP Workshops 2024: 123-124 - [c410]Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-Weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe:
Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation. ICASSP 2024: 316-320 - [c409]Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe:
Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor. ICASSP 2024: 446-450 - [c408]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. ICASSP Workshops 2024: 570-574 - [c407]Kwanghee Choi, Jee-Weon Jung, Shinji Watanabe:
Understanding Probe Behaviors Through Variational Bounds of Mutual Information. ICASSP 2024: 5655-5659 - [c406]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. ICASSP 2024: 7970-7974 - [c405]Salvador Medina, Sarah L. Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, Iain A. Matthews:
PhISANet: Phonetically Informed Speech Animation Network. ICASSP 2024: 8225-8229 - [c404]Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro:
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper. ICASSP 2024: 10471-10475 - [c403]Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe:
Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR. ICASSP 2024: 10641-10645 - [c402]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search. ICASSP 2024: 10896-10900 - [c401]Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu:
Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2024: 11156-11160 - [c400]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. ICASSP 2024: 11481-11485 - [c399]Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury:
Semi-Autoregressive Streaming ASR with Label Context. ICASSP 2024: 11681-11685 - [c398]Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model. ICASSP 2024: 11741-11745 - [c397]Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. ICASSP 2024: 11831-11835 - [c396]Samuele Cornell, Jee-Weon Jung, Shinji Watanabe, Stefano Squartini:
One Model to Rule Them All ? Towards End-to-End Joint Speaker Diarization and Speech Recognition. ICASSP 2024: 11856-11860 - [c395]Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe:
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. ICASSP 2024: 11941-11945 - [c394]Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur:
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. ICASSP 2024: 11971-11975 - [c393]Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur:
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora. ICASSP 2024: 12006-12010 - [c392]Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe:
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models. ICASSP 2024: 12071-12075 - [c391]Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee:
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech. ICASSP 2024: 12136-12140 - [c390]Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe, Joon Son Chung:
VoxMM: Rich Transcription of Conversations in the Wild. ICASSP 2024: 12551-12555 - [c389]William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. ICASSP 2024: 13066-13070 - [c388]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. ICASSP 2024: 13326-13330 - [c387]Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe:
Cross-Talk Reduction. IJCAI 2024: 5171-5180 - [c386]Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. ACM Multimedia 2024: 11279-11281 - [c385]Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. NAACL-HLT 2024: 2754-2774 - [i329]Jee-weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe:
AugSumm: towards generalizable speech summarization using synthetic labels from large language model. CoRR abs/2401.06806 (2024) - [i328]Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Improving ASR Contextual Biasing with Guided Attention. CoRR abs/2401.08835 (2024) - [i327]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. CoRR abs/2401.10449 (2024) - [i326]Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe:
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor. CoRR abs/2401.12473 (2024) - [i325]Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. CoRR abs/2401.14271 (2024) - [i324]Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. CoRR abs/2401.16658 (2024) - [i323]Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. CoRR abs/2401.16812 (2024) - [i322]Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe:
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. CoRR abs/2401.17230 (2024) - [i321]Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe:
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2. CoRR abs/2401.17619 (2024) - [i320]Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024) - [i319]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald:
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? CoRR abs/2402.00340 (2024) - [i318]Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj:
Evaluating and Improving Continual Learning in Spoken Language Understanding. CoRR abs/2402.10427 (2024) - [i317]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. CoRR abs/2402.12654 (2024) - [i316]Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro:
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages. CoRR abs/2402.16021 (2024) - [i315]Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori S. Levin:
Wav2Gloss: Generating Interlinear Glossed Text from Speech. CoRR abs/2403.13169 (2024) - [i314]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. CoRR abs/2404.09385 (2024) - [i313]Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Dynamic Vocabulary. CoRR abs/2405.13344 (2024) - [i312]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. CoRR abs/2405.13514 (2024) - [i311]Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe:
Cross-Talk Reduction. CoRR abs/2405.20402 (2024) - [i310]Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
YODAS: Youtube-Oriented Dataset for Audio and Speech. CoRR abs/2406.00899 (2024) - [i309]Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. CoRR abs/2406.02560 (2024) - [i308]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe:
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders. CoRR abs/2406.02950 (2024) - [i307]Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. CoRR abs/2406.04269 (2024) - [i306]Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. CoRR abs/2406.04660 (2024) - [i305]Jee-weon Jung, Xin Wang, Nicholas W. D. Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung:
To what extent can ASV systems naturally defend against spoofing attacks? CoRR abs/2406.05339 (2024) - [i304]Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann:
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. CoRR abs/2406.06185 (2024) - [i303]Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin:
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. CoRR abs/2406.07725 (2024) - [i302]Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe:
Neural Blind Source Separation and Diarization for Distant Speech Recognition. CoRR abs/2406.08396 (2024) - [i301]Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe:
Self-Supervised Speech Representations are More Phonetic than Semantic. CoRR abs/2406.08619 (2024) - [i300]Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. CoRR abs/2406.08641 (2024) - [i299]Yifeng Yu, Jiatong Shi, Yuning Wu, Shinji Watanabe:
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation. CoRR abs/2406.08761 (2024) - [i298]Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe:
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. CoRR abs/2406.09282 (2024) - [i297]Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu:
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. CoRR abs/2406.09345 (2024) - [i296]Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Y. Sun, Shinji Watanabe:
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. CoRR abs/2406.09869 (2024) - [i295]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. CoRR abs/2406.10083 (2024) - [i294]Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model. CoRR abs/2406.12317 (2024) - [i293]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. CoRR abs/2406.12611 (2024) - [i292]Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian:
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement. CoRR abs/2406.13471 (2024) - [i291]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Streaming End-to-end Speech Recognition. CoRR abs/2406.16107 (2024) - [i290]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. CoRR abs/2406.16120 (2024) - [i289]Hye-jin Shim, Md. Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen:
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing. CoRR abs/2406.17246 (2024) - [i288]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024) - [i287]Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. CoRR abs/2407.03718 (2024) - [i286]Samuele Cornell, Taejin Park, Steve Huang, Christoph Böddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola García, Shinji Watanabe:
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization. CoRR abs/2407.16447 (2024) - [i285]Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe:
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data. CoRR abs/2408.00624 (2024) - [i284]Xi Xu, Siqi Ouyang, Brian Yan, Patrick Fernandes, William Chen, Lei Li, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2024 Simultaneous Speech Translation System. CoRR abs/2408.07452 (2024) - [i283]Samuele Cornell, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe:
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition. CoRR abs/2408.09215 (2024) - [i282]Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. CoRR abs/2409.07226 (2024) - [i281]Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas W. D. Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe:
Text-To-Speech Synthesis In The Wild. CoRR abs/2409.08711 (2024) - [i280]Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. CoRR abs/2409.09506 (2024) - [i279]Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke:
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. CoRR abs/2409.09785 (2024) - [i278]Li-Wei Chen, Takuya Higuchi, He Bai, Ahmed Hussen Abdelaziz, Alexander Rudnicky, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald, Zakaria Aldeneh:
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models. CoRR abs/2409.10788 (2024) - [i277]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald:
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels. CoRR abs/2409.10791 (2024) - [i276]Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe:
Task Arithmetic for Language Expansion in Speech Translation. CoRR abs/2409.11274 (2024) - [i275]Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. CoRR abs/2409.12370 (2024) - [i274]Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu:
Preference Alignment Improves Language Model-Based TTS. CoRR abs/2409.12403 (2024) - [i273]Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James R. Glass, Shinji Watanabe, Hung-yi Lee:
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models. CoRR abs/2409.14085 (2024) - [i272]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens. CoRR abs/2409.15732 (2024) - [i271]Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe:
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. CoRR abs/2409.15897 (2024) - [i270]Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas W. D. Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe:
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild. CoRR abs/2409.17285 (2024) - [i269]Brian Yan, Vineel Pratap, Shinji Watanabe, Michael Auli:
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking. CoRR abs/2409.18428 (2024) - [i268]Yichen Lu, Jiaqi Song, Chao-Han Huck Yang, Shinji Watanabe:
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model. CoRR abs/2410.03007 (2024) - [i267]Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg:
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. CoRR abs/2410.17485 (2024)
- 2023
- [j56]Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur:
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data. Comput. Speech Lang. 77: 101410 (2023) - [j55]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. J. Open Source Softw. 8(91): 5403 (2023) - [j54]Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. IEEE ACM Trans. Audio Speech Lang. Process. 31: 397-410 (2023) - [j53]Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. IEEE ACM Trans. Audio Speech Lang. Process. 31: 706-720 (2023) - [j52]Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao:
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. IEEE ACM Trans. Audio Speech Lang. Process. 31: 2738-2750 (2023) - [j51]Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3112-3126 (2023) - [j50]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3221-3236 (2023) - [c384]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. AAAI 2023: 12644-12652 - [c383]Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411 - [c382]Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S. Sharma, Wei-Lun Wu, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks. ACL (1) 2023: 8906-8937 - [c381]Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino:
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. ACL (1) 2023: 15655-15680 - [c380]Tuan Vu Ho, Shota Horiguchi, Shinji Watanabe, Paola García, Takashi Sumiyoshi:
Synthetic Data Augmentation for ASR with Domain Filtering. APSIPA ASC 2023: 1760-1765 - [c379]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8 - [c378]Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku:
LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. ASRU 2023: 1-6 - [c377]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch. ASRU 2023: 1-9 - [c376]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe:
Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation. ASRU 2023: 1-8 - [c375]Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
Yodas: Youtube-Oriented Dataset for Audio and Speech. ASRU 2023: 1-8 - [c374]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8 - [c373]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. ASRU 2023: 1-6 - [c372]Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj:
Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. ASRU 2023: 1-8 - [c371]Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe:
Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond. ASRU 2023: 1-8 - [c370]Yusuke Shinohara, Shinji Watanabe:
Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. ASRU 2023: 1-7 - [c369]Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. ASRU 2023: 1-8 - [c368]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement For Diverse Input Conditions. ASRU 2023: 1-6 - [c367]Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W. Black, Shinji Watanabe:
CTC Alignments Improve Autoregressive Translation. EACL 2023: 1615-1631 - [c366]Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. ICASSP 2023: 1-5 - [c365]Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge. ICASSP 2023: 1-2 - [c364]Dan Berrebbi, Brian Yan, Shinji Watanabe:
Avoid Overthinking in Self-Supervised Models for Speech Recognition. ICASSP 2023: 1-5 - [c363]Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. ICASSP 2023: 1-2 - [c362]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units. ICASSP 2023: 1-5 - [c361]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR with Auxiliary CTC Objectives. ICASSP 2023: 1-5 - [c360]Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono, Stefano Squartini:
Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge. ICASSP 2023: 1-2 - [c359]Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge. ICASSP 2023: 1-2 - [c358]Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe:
Streaming Joint Speech Recognition and Disfluency Detection. ICASSP 2023: 1-5 - [c357]Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur:
Euro: Espnet Unsupervised ASR Open-Source Toolkit. ICASSP 2023: 1-5 - [c356]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
Intermpl: Momentum Pseudo-Labeling With Intermediate CTC Loss. ICASSP 2023: 1-5 - [c355]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder. ICASSP 2023: 1-5 - [c354]Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe:
FindAdaptNet: Find and Insert Adapters by Learned Layer Importance. ICASSP 2023: 1-5 - [c353]Jee-Weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung:
In Search of Strong Embedding Extractors for Speaker Diarisation. ICASSP 2023: 1-5 - [c352]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan S. Sharma, Kohei Matsuura, Shinji Watanabe:
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. ICASSP 2023: 1-5 - [c351]Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
E-Branchformer-Based E2E SLU Toward Stop on-Device Challenge. ICASSP 2023: 1-2 - [c350]Jiachen Lian, Alan W. Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Articulatory Representation Learning via Joint Factor Analysis and Neural Matrix Factorization. ICASSP 2023: 1-5 - [c349]Takashi Maekaku, Yuya Fujita, Xuankai Chang, Shinji Watanabe:
Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model. ICASSP 2023: 1-5 - [c348]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
Speechlmscore: Evaluating Speech Generation Using Speech Language Model. ICASSP 2023: 1-5 - [c347]Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe:
Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation. ICASSP 2023: 1-5 - [c346]Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding. ICASSP 2023: 1-5 - [c345]Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition. ICASSP 2023: 1-5 - [c344]Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García, Shinji Watanabe, Ann Lee, Hung-Yi Lee:
Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR. ICASSP 2023: 1-5 - [c343]Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-To-Speech Translation with Multiple TTS Targets. ICASSP 2023: 1-5 - [c342]Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2023: 1-5 - [c341]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GRIDNET: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. ICASSP 2023: 1-5 - [c340]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. ICASSP 2023: 1-5 - [c339]Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition. ICASSP 2023: 1-5 - [c338]Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. ICASSP 2023: 1-5 - [c337]Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Jeong Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi:
Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. ICASSP 2023: 1-5 - [c336]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-Blank Transducers for Speech Recognition. ICASSP 2023: 1-5 - [c335]Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe:
Towards Zero-Shot Code-Switched Speech Recognition. ICASSP 2023: 1-5 - [c334]Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Paaploss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. ICASSP 2023: 1-5 - [c333]Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. ICASSP 2023: 1-5 - [c332]Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks. ICLR 2023 - [c331]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. ICML 2023: 38462-38484 - [c330]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. IJCAI 2023: 5179-5187 - [c329]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. INTERSPEECH 2023: 62-66 - [c328]Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. INTERSPEECH 2023: 396-400 - [c327]Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
Tensor decomposition for minimization of E2E SLU model toward on-device processing. INTERSPEECH 2023: 710-714 - [c326]Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. INTERSPEECH 2023: 720-724 - [c325]Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. INTERSPEECH 2023: 884-888 - [c324]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. INTERSPEECH 2023: 1369-1373 - [c323]Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. INTERSPEECH 2023: 1399-1403 - [c322]Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. INTERSPEECH 2023: 1454-1458 - [c321]Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. INTERSPEECH 2023: 1528-1532 - [c320]Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. INTERSPEECH 2023: 2208-2212 - [c319]Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolution. INTERSPEECH 2023: 3287-3291 - [c318]Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe:
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. INTERSPEECH 2023: 3312-3316 - [c317]Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. INTERSPEECH 2023: 3979-3983 - [c316]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408 - [c315]Yui Sudo, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. INTERSPEECH 2023: 4479-4483 - [c314]Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. INTERSPEECH 2023: 4968-4972 - [c313]Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W. Black, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from MRI-Based Articulatory Representations. INTERSPEECH 2023: 5132-5136 - [c312]Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondrej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny Matusov, Paul McNamee, John P. McCrae, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Ha Nguyen, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Proyag Pal, Juan Pino, Lonneke van der Plas, Peter Polák, Elijah Rippeth, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Yun Tang, Brian Thompson, Kevin Tran, Marco Turchi, Alex Waibel, Mingxuan Wang, Shinji Watanabe, Rodolfo Zevallos:
Findings of the IWSLT 2023 Evaluation Campaign. IWSLT@ACL 2023: 1-61 - [c311]Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe:
CMU's IWSLT 2023 Simultaneous Speech Translation System. IWSLT@ACL 2023: 235-240 - [c310]Zhong-Qiu Wang, Shinji Watanabe:
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. NeurIPS 2023 - [c309]Taiqi He, Lindia Tjuatja, Nathaniel R. Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, Lori S. Levin:
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing. SIGMORPHON 2023: 209-216 - [c308]Georgios Karakasidis, Nathaniel R. Robinson, Yaroslav Getman, Atieno Ogayo, Ragheb Al-Ghezi, Ananya Ayasi, Shinji Watanabe, David R. Mortensen, Mikko Kurimo:
Multilingual TTS Accent Impressions for Accented ASR. TSD 2023: 317-327 - [c307]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. WASPAA 2023: 1-5 - [d1]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310). Zenodo, 2023 - [i266]Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali:
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study. CoRR abs/2301.09099 (2023) - [i265]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023) - [i264]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. CoRR abs/2302.04215 (2023) - [i263]Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. CoRR abs/2302.06774 (2023) - [i262]Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono:
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge. CoRR abs/2302.07928 (2023) - [i261]Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08088 (2023) - [i260]Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08095 (2023) - [i259]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023) - [i258]Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. CoRR abs/2302.14132 (2023) - [i257]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023) - [i256]Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. CoRR abs/2303.06326 (2023) - [i255]Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. CoRR abs/2303.07624 (2023) - [i254]Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-to-Speech Translation with Multiple TTS Targets. CoRR abs/2304.04618 (2023) - [i253]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. CoRR abs/2304.06795 (2023) - [i252]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. CoRR abs/2304.08707 (2023) - [i251]Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. CoRR abs/2304.12995 (2023) - [i250]Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. CoRR abs/2305.00926 (2023) - [i249]Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge. CoRR abs/2305.01194 (2023) - [i248]Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge. CoRR abs/2305.01620 (2023) - [i247]Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-Yi Lee:
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation. CoRR abs/2305.07455 (2023) - [i246]Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei-Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. CoRR abs/2305.10615 (2023) - [i245]Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. CoRR abs/2305.11073 (2023) - [i244]Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023) - [i243]Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. CoRR abs/2305.13331 (2023) - [i242]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. CoRR abs/2305.17651 (2023) - [i241]Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. CoRR abs/2305.18108 (2023) - [i240]Zhong-Qiu Wang, Shinji Watanabe:
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. CoRR abs/2305.20054 (2023) - [i239]Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolutions. CoRR abs/2306.01084 (2023) - [i238]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023) - [i237]Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola García, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur:
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios. CoRR abs/2306.13734 (2023) - [i236]Roshan S. Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. CoRR abs/2307.08217 (2023) - [i235]Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. CoRR abs/2307.11005 (2023) - [i234]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. CoRR abs/2307.12231 (2023) - [i233]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. CoRR abs/2307.12767 (2023) - [i232]Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. CoRR abs/2308.10107 (2023) - [i231]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe:
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks. CoRR abs/2309.07937 (2023) - [i230]Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao:
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction. CoRR abs/2309.08348 (2023) - [i229]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. CoRR abs/2309.08531 (2023) - [i228]Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro:
Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model. CoRR abs/2309.08535 (2023) - [i227]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation. CoRR abs/2309.08876 (2023) - [i226]Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee:
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. CoRR abs/2309.09510 (2023) - [i225]Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee:
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. CoRR abs/2309.10787 (2023) - [i224]Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury:
Semi-Autoregressive Streaming ASR With Label Context. CoRR abs/2309.10926 (2023) - [i223]Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. CoRR abs/2309.11379 (2023) - [i222]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023) - [i221]Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. CoRR abs/2309.14922 (2023) - [i220]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023) - [i219]Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed M. Ali, Shinji Watanabe, Sanjeev Khudanpur:
Speech collage: code-switched audio generation by collaging monolingual corpora. CoRR abs/2309.15674 (2023) - [i218]Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur:
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. CoRR abs/2309.15686 (2023) - [i217]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. CoRR abs/2309.15800 (2023) - [i216]Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe:
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. CoRR abs/2309.15826 (2023) - [i215]Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe:
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation. CoRR abs/2309.17352 (2023) - [i214]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement for Diverse Input Conditions. CoRR abs/2309.17384 (2023) - [i213]Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng:
UniAudio: An Audio Foundation Model Toward Universal Audio Generation. CoRR abs/2310.00704 (2023) - [i212]Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini:
One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition. CoRR abs/2310.01688 (2023) - [i211]Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network. CoRR abs/2310.02973 (2023) - [i210]Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe:
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios. CoRR abs/2310.03938 (2023) - [i209]Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model. CoRR abs/2310.03975 (2023) - [i208]Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond. CoRR abs/2310.05513 (2023) - [i207]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction. CoRR abs/2310.08277 (2023) - [i206]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis:
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. CoRR abs/2310.17864 (2023) - [i205]Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-varying Controls for Music Generation. CoRR abs/2311.07069 (2023) - [i204]Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe:
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR. CoRR abs/2312.09582 (2023) - [i203]Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu:
Generative Context-aware Fine-tuning of Self-supervised Speech Models. CoRR abs/2312.09895 (2023) - [i202]Kwanghee Choi, Jee-weon Jung, Shinji Watanabe:
Understanding Probe Behaviors through Variational Bounds of Mutual Information. CoRR abs/2312.10019 (2023) - 2022
- [j49]Amir Hussein, Shinji Watanabe, Ahmed Ali:
Arabic speech recognition by end-to-end, modular systems and human. Comput. Speech Lang. 71: 101272 (2022) - [j48]Zili Huang, Marc Delcroix, Leibny Paola García-Perera, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur:
Joint speaker diarization and speech recognition based on region proposal networks. Comput. Speech Lang. 72: 101316 (2022) - [j47]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A review of speaker diarization: Recent advances with deep learning. Comput. Speech Lang. 72: 101317 (2022) - [j46]Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. Comput. Speech Lang. 73: 101327 (2022) - [j45]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Comput. Speech Lang. 75: 101360 (2022) - [j44]Jing Shi, Xuankai Chang, Shinji Watanabe, Bo Xu:
Train from scratch: Single-stage joint training of speech separation and recognition. Comput. Speech Lang. 76: 101387 (2022) - [j43]Hung-Yi Lee, Shinji Watanabe, Karen Livescu, Abdelrahman Mohamed, Tara N. Sainath:
Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1174-1178 (2022) - [j42]Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. IEEE J. Sel. Top. Signal Process. 16(6): 1179-1210 (2022) - [j41]Zhong-Qiu Wang, Shinji Watanabe:
Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction. IEEE Signal Process. Lett. 29: 1422-1426 (2022) - [j40]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractors for End-to-End Neural Diarization. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1493-1507 (2022) - [j39]Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022) - [c306]Xinjian Li, Florian Metze, David R. Mortensen, Shinji Watanabe, Alan W. Black:
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. ACL (Findings) 2022: 2106-2115 - [c305]Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. ACL (1) 2022: 8479-8492 - [c304]Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe:
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. EMNLP (Findings) 2022: 5419-5429 - [c303]Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. EMNLP (Findings) 2022: 5486-5503 - [c302]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022: 6237-6241 - [c301]Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu:
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization. ICASSP 2022: 6412-6416 - [c300]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. ICASSP 2022: 6552-6556 - [c299]Motoi Omachi, Yuya Fujita, Shinji Watanabe, Tianzi Wang:
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing. ICASSP 2022: 6772-6776 - [c298]Zili Huang, Shinji Watanabe, Shu-Wen Yang, Paola García, Sanjeev Khudanpur:
Investigating Self-Supervised Learning for Speech Enhancement and Separation. ICASSP 2022: 6837-6841 - [c297]Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair:
Torchaudio: Building Blocks for Audio and Speech Processing. ICASSP 2022: 6982-6986 - [c296]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion. ICASSP 2022: 7107-7111 - [c295]Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet. ICASSP 2022: 7167-7171 - [c294]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-Based Supervision. ICASSP 2022: 7212-7216 - [c293]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. ICASSP 2022: 7322-7326 - [c292]Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, Yohei Kawaguchi:
Multi-Channel End-To-End Neural Diarization with Distributed Microphones. ICASSP 2022: 7332-7336 - [c291]Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. ICASSP 2022: 7402-7406 - [c290]Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe:
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. ICASSP 2022: 7872-7876 - [c289]Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. ICASSP 2022: 7892-7896 - [c288]Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR. ICASSP 2022: 8287-8291 - [c287]Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models. ICASSP 2022: 8522-8526 - [c286]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge. ICASSP 2022: 9201-9205 - [c285]Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results. ICASSP 2022: 9266-9270 - [c284]Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. ICML 2022: 17627-17643 - [c283]Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. INTERSPEECH 2022: 16-20 - [c282]Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. INTERSPEECH 2022: 779-783 - [c281]Takashi Maekaku, Yuya Fujita, Yifan Peng, Shinji Watanabe:
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. INTERSPEECH 2022: 1071-1075 - [c280]Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao:
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1111-1115 - [c279]Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. INTERSPEECH 2022: 1656-1660 - [c278]Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora:
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. INTERSPEECH 2022: 1746-1750 - [c277]Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan:
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1766-1770 - [c276]Yusuke Shinohara, Shinji Watanabe:
Minimum latency training of sequence transducers for streaming end-to-end speech recognition. INTERSPEECH 2022: 2098-2102 - [c275]Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola García-Perera, Yohei Kawaguchi:
Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models. INTERSPEECH 2022: 2218-2222 - [c274]Muqiao Yang, Ian R. Lane, Shinji Watanabe:
Online Continual Learning of End-to-End Speech Recognition Models. INTERSPEECH 2022: 2668-2672 - [c273]Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Improving Speech Enhancement through Fine-Grained Speech Characteristics. INTERSPEECH 2022: 2953-2957 - [c272]Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe:
Two-Pass Low Latency End-to-End Spoken Language Understanding. INTERSPEECH 2022: 3478-3482 - [c271]Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe:
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. INTERSPEECH 2022: 3533-3537 - [c270]Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, Shinji Watanabe:
When Is TTS Augmentation Through a Pivot Language Useful? INTERSPEECH 2022: 3538-3542 - [c269]Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe:
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. INTERSPEECH 2022: 3819-3823 - [c268]Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty, Shinji Watanabe:
Residual Language Model for End-to-end Speech Recognition. INTERSPEECH 2022: 3899-3903 - [c267]Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin:
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. INTERSPEECH 2022: 4272-4276 - [c266]Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin:
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. INTERSPEECH 2022: 4277-4281 - [c265]Jaesong Lee, Lukas Lee, Shinji Watanabe:
Memory-Efficient Training of RNN-Transducer with Sampled Softmax. INTERSPEECH 2022: 4441-4445 - [c264]Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi, Shinji Watanabe:
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection. INTERSPEECH 2022: 4641-4645 - [c263]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
ASR2K: Speech Recognition for Around 2000 Languages without Audio. INTERSPEECH 2022: 4885-4889 - [c262]Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida:
Better Intermediates Improve CTC Inference. INTERSPEECH 2022: 4965-4969 - [c261]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. INTERSPEECH 2022: 5458-5462 - [c260]Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondrej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vera Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Miguel Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe:
Findings of the IWSLT 2022 Evaluation Campaign. IWSLT@ACL 2022: 98-157 - [c259]Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2022 Dialect Speech Translation System. IWSLT@ACL 2022: 298-307 - [c258]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
Phone Inventories and Recognition for Every Language. LREC 2022: 1061-1067 - [c257]Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. SLT 2022: 84-91 - [c256]Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. SLT 2022: 260-265 - [c255]Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. SLT 2022: 406-413 - [c254]Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. SLT 2022: 480-487 - [c253]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. SLT 2022: 496-501 - [c252]