ASRU 2023: Taipei, Taiwan
- IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023. IEEE 2023, ISBN 979-8-3503-0689-7
- Shilong Wu, Jun Du, Mao-Kui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee:
Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention. 1-8
- Da-Hee Yang, Joon-Hyuk Chang:
Towards Robust Packet Loss Concealment System With ASR-Guided Representations. 1-8
- Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-Shan Shiu:
Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning. 1-8
- Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux:
Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction. 1-8
- Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass:
Audio-Visual Neural Syntax Acquisition. 1-8
- Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan Honza Silovsky:
Importance of Smoothness Induced by Optimizers in Fl4Asr: Towards Understanding Federated Learning for End-To-End ASR. 1-8
- Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
QUICKVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model using ISTFT for Faster Conversion. 1-7
- Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton:
Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning. 1-7
- Bahman Mirheidari, Ronan O'Malley, Daniel Blackburn, Heidi Christensen:
Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia using Speech Analysis. 1-6
- Sathvik Udupa, Jesuraja Bandekar, Deekshitha G, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan K. M., Raoul Nanavati:
Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages. 1-8
- Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. 1-8
- Mark Lindsey, Nathaniel R. Robinson, Francis Kubala, Richard M. Stern:
Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models. 1-7
- Alexandra Antonova:
Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization. 1-8
- Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe:
Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond. 1-8
- Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie:
HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS. 1-7
- Anusha Prakash, Srinivasan Umesh, Hema A. Murthy:
Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments. 1-8
- Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie:
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. 1-8
- Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko:
Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition. 1-8
- Yuewei Zhang, Huanbin Zou, Jie Zhu:
Vsanet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention. 1-8
- Chenglin Xu, Xiguang Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu:
KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning. 1-8
- Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin:
CTC Blank Triggered Dynamic Layer-Skipping for Efficient Ctc-Based Speech Recognition. 1-5
- Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain:
MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition. 1-6
- Gan Song, Zelin Wu, Golan Pundak, Angad Chandorkar, Kandarp Joshi, Xavier Velez, Diamantino Caseiro, Ben Haynor, Weiran Wang, Nikhil Siddhartha, Pat Rondon, Khe Chai Sim:
Contextual Spelling Correction with Large Language Models. 1-8
- Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie:
Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis. 1-8
- Elaf Islam, Thomas Hain, Protima Nomo Sudro:
Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. 1-6
- Jae-Hong Lee, Do-Hee Kim, Joon-Hyuk Chang:
AWMC: Online Test-Time Adaptation Without Mode Collapse for Continual Adaptation. 1-8
- Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj:
Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. 1-8
- Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli:
Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. 1-8
- Daniel Galvez, Tim Kaldewey:
GPU-Accelerated Wfst Beam Search Decoder for CTC-Based Speech Recognition. 1-7
- Ke Hu, Tara N. Sainath, Bo Li, Yu Zhang, Yong Cheng, Tao Wang, Yujing Zhang, Frederick Liu:
Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text. 1-7
- Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He:
Efficient Cascaded Streaming ASR System Via Frame Rate Reduction. 1-8
- Alexander Blatt, Badr M. Abdullah, Dietrich Klakow:
Ending the Blind Flight: Analyzing the Impact of Acoustic and Lexical Factors on WAV2VEC 2.0 in Air-Traffic Control. 1-8
- Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet:
Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation. 1-8
- Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. 1-9
- Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao:
VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023. 1-8
- Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak:
Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System. 1-8
- Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
Using Joint Training Speaker Encoder With Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion. 1-8
- Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura:
After: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition. 1-8
- Yan Huang, Piyush Behre, Guoli Ye, Shawn Chang, Yifan Gong:
Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model. 1-6
- Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu:
PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System. 1-8
- Pavel Denisov, Ngoc Thang Vu:
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding. 1-8
- Mun-Hak Lee, Sang-Eon Lee, Ji-Eun Choi, Joon-Hyuk Chang:
Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-Bertscore and Sequence-Level Training. 1-8
- Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki:
Generalized Zero-Shot Audio-to-Intent Classification. 1-8
- Rajeev Rajan, Noumida Abdul Kareem, Sreelakshmi S:
Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation. 1-7
- Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li:
Bisinger: Bilingual Singing Voice Synthesis. 1-8
- Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu:
Few-Shot Spoken Language Understanding Via Joint Speech-Text Models. 1-8
- Jiajun He, Zekun Yang, Tomoki Toda:
ED-CEC: Improving Rare word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. 1-6
- Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu:
LAE-ST-MOE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR. 1-8
- Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng:
Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking. 1-8
- Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li:
Haha-POD: An Attempt for Laughter-Based Non-Verbal Speaker Verification. 1-7
- Wei-Ping Huang, Sung-Feng Huang, Hung-Yi Lee:
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization. 1-8
- Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models. 1-7
- Ji-Hwan Mo, Jae-Jin Jeon, Mun-Hak Lee, Joon-Hyuk Chang:
Knowledge Distillation From Offline to Streaming Transducer: Towards Accurate and Fast Streaming Model by Matching Alignments. 1-7
- Tanel Alumäe, Jiaming Kong, Daniil Robnikov:
Dialect Adaptation and Data Augmentation for Low-Resource ASR: Taltech Systems for the Madasr 2023 Challenge. 1-7
- William Ravenscroft, Stefan Goetze, Thomas Hain:
On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments. 1-7
- Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie:
MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement. 1-8
- Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda:
The Singing Voice Conversion Challenge 2023. 1-8
- Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, Hung-Yi Lee:
Towards General-Purpose Text-Instruction-Guided Voice Conversion. 1-8
- Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke:
Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting. 1-8
- Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao:
Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility. 1-7
- Sibo Tong, Philip Harding, Simon Wiesler:
Hierarchical Attention-Based Contextual Biasing For Personalized Speech Recognition Using Neural Transducers. 1-8
- Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. 1-8
- Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung:
Consistency Based Unsupervised Self-Training for ASR Personalisation. 1-8
- Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen:
Variational Gaussian Process Data Uncertainty. 1-8
- Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung:
Transformer Attractors for Robust and Efficient End-To-End Neural Diarization. 1-8
- Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai:
Cross-Modal Alignment With Optimal Transport For CTC-Based ASR. 1-7
- Kailai Shen, Diqun Yan, Li Dong, Ying Ren, Xiaoxun Wu, Jing Hu:
SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction. 1-6
- Chang Chen, Xun Gong, Yanmin Qian:
Efficient Text-Only Domain Adaptation For CTC-Based ASR. 1-7
- Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney:
Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers. 1-8
- Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition. 1-8
- Dongning Yang, Wei Wang, Yanmin Qian:
FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition. 1-8
- David Qiu, Shaojin Ding, Yanzhang He:
The Role of Feature Correlation on Quantized Neural Networks. 1-7
- Shaoxiong Lin, Chao Zhang, Yanmin Qian:
Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning. 1-7
- Yoshiki Sato, Julián Villegas:
Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise. 1-5
- Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
Yodas: Youtube-Oriented Dataset for Audio and Speech. 1-8
- Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li:
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimers Speech Detection. 1-6
- Hiroyoshi Yamasaki, Jérôme Louradour, Julie Hunter, Laurent Prévot:
Transcribing and Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations. 1-6
- Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Ghosh:
Flap: Fast Language-Audio Pre-Training. 1-8
- Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari:
COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control. 1-8
- Jiarui Hai, Yu-Jeh Liu, Mounya Elhilali:
Boosting Modality Representation With Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis. 1-8
- Armand Stricker, Patrick Paroubek:
Enhancing Task-Oriented Dialogues With Chitchat: A Comparative Study Based on Lexical Diversity And Divergence. 1-8
- Seongjin Park, Rutuja Ubale:
Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring. 1-7
- Martin Sustek, Sonal Joshi, Henry Li, Thomas Thebaud, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak:
Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks. 1-8
- Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku:
LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. 1-6
- Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston H. Hsu, Hung-Yi Lee:
Minisuperb: Lightweight Benchmark for Self-Supervised Speech Models. 1-8
- Tzu-Quan Lin, Hung-Yi Lee, Hao Tang:
MelHuBERT: A Simplified Hubert on Mel Spectrograms. 1-8
- Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:
Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition. 1-8
- Xintong Wang, Chang Zeng, Jun Chen, Chunhui Wang:
Crosssinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers. 1-6
- Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu:
SLM: Bridge the Thin Gap Between Speech and Text Foundation Models. 1-8
- Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet:
Detection of Vowel Errors in Children's Speech using Synthetic Phonetic Transcripts. 1-8
- Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu:
On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration. 1-8
- Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung:
Locality Enhanced Dynamic Biasing and Sampling Strategies For Contextual ASR. 1-8
- Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:
Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens. 1-7
- Junchen Liu, Jesin James, Karan Nathwani:
Improved Multi-Modal Emotion Recognition Using Squeeze-and-Excitation Block in Cross-Modal Attention. 1-8
- Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li:
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach. 1-7
- Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet:
Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments. 1-6
- Jun-You Wang, Hung-Yi Lee, Jyh-Shing Roger Jang, Li Su:
Zero-Shot Singing Voice Synthesis from Musical Score. 1-8
- Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose:
Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition. 1-8
- Jerome R. Bellegarda:
Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation. 1-8
- Daichi Hayakawa, Takehiko Kagoshima, Kenji Iwata, Norbert Braunschweiler, Rama Doddipatla:
Robust Recognition of Speaker Emotion With Difference Feature Extraction Using a Few Enrollment Utterances. 1-7
- Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur:
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments. 1-8
- Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer. 1-8
- Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar, Supreeth Rao:
HEVAL: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks. 1-7
- Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristià, Emmanuel Dupoux, Hervé Bredin:
Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation. 1-7
- Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie:
Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation. 1-8
- Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu:
Acoustic Model Fusion For End-to-End Speech Recognition. 1-7
- Yusuke Shinohara, Shinji Watanabe:
Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. 1-7
- Pasquale D'Alterio, Christian Hensel, Bashar Awwad Shiekh Hasan:
Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation? 1-8
- Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland Barnard, Keith A. Josephs, Jennifer L. Whitwell, David T. Jones, Hugo Botha:
Not All Errors Are Created Equal: Evaluating The Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations. 1-6
- Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. 1-6
- Varun Krishna, Sriram Ganapathy:
Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations. 1-8
- Yuewei Zhang, Huanbin Zou, Jie Zhu:
Magnitude-and-Phase-Aware Speech Enhancement With Parallel Sequence Modeling. 1-8
- Jen-Tzung Chien, Wei-Yu Sun:
Adversarial Augmentation For Adapter Learning. 1-7
- Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu:
Neuralecho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement. 1-8
- Can Cui, Imran A. Sheikh, Mostafa Sadeghi, Emmanuel Vincent:
End-to-End Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis. 1-8
- Sakriani Sakti, Benita Angela Titalim:
Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task. 1-8
- Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-Yu Huang, Shang-Wen Li, Hung-Yi Lee:
Prompting and Adapter Tuning For Self-Supervised Encoder-Decoder Speech Model. 1-8
- Yuang Li, Yu Wu, Jinyu Li, Shujie Liu:
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition. 1-8
- Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement For Diverse Input Conditions. 1-6
- Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi:
Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus. 1-7
- William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. 1-8
- Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia:
Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers. 1-8
- Wonjun Lee, Gary Geunbae Lee, Yunsu Kim:
Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition And Phoneme To Grapheme Translation. 1-8
- Junteng Jia, Ke Li, Mani Malek, Kshitiz Malik, Jay Mahadeokar, Ozlem Kalinli, Frank Seide:
Joint Federated Learning and Personalization for on-Device ASR. 1-8
- Vanitha Devi R, Vasundhara:
Robust Logarithmic Champernowne Algorithm for Feedback Cancellation in Hearing aids. 1-5
- Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie:
Vits-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling. 1-8
- Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff-Korbayová, Josef van Genabith:
Rescuespeech: A German Corpus for Speech Recognition in Search and Rescue Domain. 1-7
- Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James R. Glass:
Joint Audio and Speech Understanding. 1-8
- Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow:
Two-Pass Endpoint Detection for Speech Recognition. 1-8
- Hillary Ngai, Rohan Agrawal, Neeraj Gaur, W. Ronny Huang, Parisa Haghani, Pedro Moreno Mengibar:
Audio-Adapterfusion: A Task-Id-Free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition. 1-8
- Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli:
Permod: Perceptually Grounded Voice Modification With Latent Diffusion Models. 1-8
- Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang:
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. 1-6
- Amit Meghanani, Thomas Hain:
Deriving Translational Acoustic Sub-Word Embeddings. 1-8
- Yusheng Tian, Wei Liu, Tan Lee:
Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data. 1-7
- Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney:
End-To-End Training of a Neural HMM with Label and Transition Probabilities. 1-8
- Jenthe Thienpondt, Kris Demuynck:
ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings. 1-8
- Gil Keren:
A Token-Wise Beam Search Algorithm for RNN-T. 1-8
- Wangyou Zhang, Lei Yang, Yanmin Qian:
Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing. 1-6
- Abinay Reddy Naini, Shruthi Subramanium, Seong-Gyun Leem, Carlos Busso:
Combining Relative and Absolute Learning Formulations to Predict Emotional Attributes From Speech. 1-8
- Yanmei Gu, Jing Li, Jiayi Zhou, Zhiming Wang, Huijia Zhu:
Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification. 1-8
- Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie:
U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. 1-8
- Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg:
Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech. 1-8
- Gene-Ping Yang, Hao Tang:
Towards Matching Phones and Speech Representations. 1-8
- Abderrahim Fathan, Jahangir Alam:
CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition. 1-8
- Lakshmi Rajendram Bashyam, Alexander Blatt, Dietrich Klakow:
Enabling Noisy Label Usage for Out-of-Airspace Data in Read-Back Error Detection. 1-8
- Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong:
Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training. 1-7
- Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee:
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking. 1-8
- Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie:
Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR. 1-7
- Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The Voicemos Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains. 1-7
- Jason Clarke, Yoshihiko Gotoh, Stefan Goetze:
Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer. 1-8
- Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda, Hisashi Kawai:
WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer. 1-8
- Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen:
E3 TTS: Easy End-to-End Diffusion-Based Text To Speech. 1-8
- Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli:
No Pitch Left Behind: Addressing Gender Unbalance In Automatic Speech Recognition Through Pitch Manipulation. 1-8
- Mohan Li, Catalin Zorila, Cong-Thanh Do, Rama Doddipatla:
Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs. 1-8
- Geoffroy Vanderreydt, Amrutha Prasad, Driss Khalil, Srikanth R. Madikeri, Kris Demuynck, Petr Motlícek:
Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition. 1-7
- Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie:
An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation. 1-7
- Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli:
Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection. 1-8
- Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li:
Promptspeaker: Speaker Generation Based on Text Descriptions. 1-7
- Nicholas Sanders, Korin Richmond:
Invert-Classify: Recovering Discrete Prosody Inputs for Text-To-Speech. 1-7
- Hao Zhang, Meng Yu, Dong Yu:
Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings. 1-7
- Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews:
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections Through Federated Learning. 1-7
- Peng Shen, Xuguang Lu, Hisashi Kawai:
Generative Linguistic Representation for Spoken Language Identification. 1-8
- Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen:
Preserving Phonemic Distinctions For Ordinal Regression: A Novel Loss Function For Automatic Pronunciation Assessment. 1-7
- Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen:
Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning. 1-7
- Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. 1-6
- Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang:
Neuralkalman: A Learnable Kalman Filter for Acoustic Echo Cancellation. 1-7
- Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe:
Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation. 1-8
- Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter:
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition. 1-8
- Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth:
MASR: Multi-Label Aware Speech Representation. 1-8
- Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao:
Voiceextender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model. 1-8
- Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang:
Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection. 1-8
- Quentin Meeus, Marie-Francine Moens, Hugo Van hamme:
Whisper-Slu: Extending a Pretrained Speech-to-Text Transformer for Low Resource Spoken Language Understanding. 1-6
- Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko:
Discriminative Speech Recognition Rescoring With Pre-Trained Language Models. 1-7
- Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu:
LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement. 1-6
- Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela A. Wiepert, David T. Jones, Hugo Botha:
Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model. 1-7
- Yingzhi Wang, Mirco Ravanelli, Alya Yacoubi:
Speech Emotion Diarization: Which Emotion Appears When? 1-7
- Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim:
Transduce and Speak: Neural Transducer for Text-To-Speech with Semantic Token Prediction. 1-7
- Artit Suwanbandit, Jaturong Chitiyaphol, Sutthinan Chuenchom, Kanyarat Kwiecien, Husen Sawal, Ruslan Uthai, Orathai Sangpetch, Ekapol Chuangsuwanich:
Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora. 1-8
- Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao:
LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models. 1-8
- Sebastião Quintas, Mathieu Balaguer, Julie Mauclair, Virginie Woisard, Julien Pinquier:
Can We Use Speaker Embeddings On Spontaneous Speech Obtained From Medical Conversations To Predict Intelligibility? 1-7
- Zhengyang Li, Thomas Graave, Jing Liu, Timo Lohrenz, Siegfried Kunzmann, Tim Fingscheidt:
Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular Audiovisual Speech Recognition. 1-8
- Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. 1-8
- Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah:
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting. 1-7
- Peikun Chen, Fan Yu, Yuhao Liang, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie:
BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition. 1-7
- Jian Xue, Peidong Wang, Jinyu Li, Eric Sun:
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability. 1-7
- Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang:
Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment. 1-8
- Zhaofeng Lin, Tanvina Patel, Odette Scharenborg:
Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation. 1-8
- Jeong-Hwan Choi, Jehyun Kyung, Ju-Seok Seong, Ye-Rin Jeoung, Joon-Hyuk Chang:
Extending Self-Distilled Self-Supervised Learning For Semi-Supervised Speaker Verification. 1-8
- Yosuke Higuchi, Andrew Rosenberg, Yuan Wang, Murali Karthick Baskar, Bhuvana Ramabhadran:
Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder. 1-8
- Maliha Jahan, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, Jesús Villalba:
Model-Based Fairness Metric for Speaker Verification. 1-7