23rd Interspeech 2022: Incheon, Korea
- Hanseok Ko, John H. L. Hansen:
23rd Annual Conference of the International Speech Communication Association, Interspeech 2022, Incheon, Korea, September 18-22, 2022. ISCA 2022
Speech Synthesis: Toward end-to-end synthesis
- Hyunjae Cho, Wonbin Jung, Junhyeok Lee, Sang Hoon Woo:
SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech. 1-5
- Hanbin Bae, Young-Sun Joo:
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch. 6-10
- Martin Lenglet, Olivier Perrotin, Gérard Bailly:
Speaking Rate Control of end-to-end TTS Models by Direct Manipulation of the Encoder's Output Embeddings. 11-15
- Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. 16-20
- Dan Lim, Sunghee Jung, Eesung Kim:
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech. 21-25
Technology for Disordered Speech
- Rosanna Turrisi, Leonardo Badino:
Interpretable dysarthric speaker adaptation based on optimal-transport. 26-30
- Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic:
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs. 31-35
- Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg:
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition. 36-40
- Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. 41-45
- Chitralekha Bhat, Ashish Panda, Helmer Strik:
Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation. 46-50
- Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang:
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition. 51-55
Neural Network Training Methods for ASR I
- Mun-Hak Lee, Joon-Hyuk Chang, Sang-Eon Lee, Ju-Seok Seong, Chanhee Park, Haeyoung Kwon:
Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights. 56-60
- David M. Chan, Shalini Ghosh:
Content-Context Factorized Representations for Automated Speech Recognition. 61-65
- Georgios Karakasidis, Tamás Grósz, Mikko Kurimo:
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR. 66-70
- Deepak Baby, Pasquale D'Alterio, Valentin Mendelev:
Incremental learning for RNN-Transducer based speech recognition models. 71-75
- Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio López-Moreno, Rajiv Mathews, Françoise Beaufays:
Production federated keyword spotting via distillation, filtering, and joint federated-centralized training. 76-80
Acoustic Phonetics and Prosody
- Jieun Song, Hae-Sung Jeon, Jieun Kiaer:
Use of prosodic and lexical cues for disambiguating wh-words in Korean. 81-85
- Vinicius Ribeiro, Yves Laprie:
Autoencoder-Based Tongue Shape Estimation During Continuous Speech. 86-90
- Giuseppe Magistro, Claudia Crocco:
Phonetic erosion and information structure in function words: the case of mia. 91-95
- Miran Oh, Yoon-Jeong Lee:
Dynamic Vertical Larynx Actions Under Prosodic Focus. 96-100
- Leah Bradshaw, Eleanor Chodroff, Lena A. Jäger, Volker Dellwo:
Fundamental Frequency Variability over Time in Telephone Interactions. 101-105
Spoken Machine Translation
- Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà:
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation. 106-110
- Jinming Zhao, Hao Yang, Gholamreza Haffari, Ehsan Shareghi:
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation. 111-115
- Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim:
Cross-Modal Decision Regularization for Simultaneous Speech Translation. 116-120
- Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura:
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation. 121-125
- Kirandevraj R, Vinod Kumar Kurmi, Vinay P. Namboodiri, C. V. Jawahar:
Generalized Keyword Spotting using ASR embeddings. 126-130
(Multimodal) Speech Emotion Recognition I
- Youngdo Ahn, Sung Joo Lee, Jong Won Shin:
Multi-Corpus Speech Emotion Recognition for Unseen Corpus Using Corpus-Wise Weights in Classification Loss. 131-135
- Junghun Kim, Yoojin An, Jihie Kim:
Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms. 136-140
- Joosung Lee:
The Emotion is Not One-hot Encoding: Learning with Grayscale Label for Emotion Recognition in Conversation. 141-145
- Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller:
Probing speech emotion recognition transformers for linguistic knowledge. 146-150
- Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann:
End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks. 151-155
- Matthew Perez, Mimansa Jaiswal, Minxue Niu, Cristina Gorrostieta, Matthew Roddy, Kye Taylor, Reza Lotfian, John Kane, Emily Mower Provost:
Mind the gap: On the value of silence representations to lexical-based speech emotion recognition. 156-160
- Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso:
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier. 161-165
- Hira Dhamyal, Bhiksha Raj, Rita Singh:
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection. 166-170
Dereverberation, Noise Reduction, and Speaker Extraction
- Tuan Vu Ho, Maori Kobayashi, Masato Akagi:
Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion. 171-175
- Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi, Masashi Unoki:
Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement. 176-180
- Minseung Kim, Hyungchan Song, Sein Cheong, Jong Won Shin:
iDeepMMSE: An improved deep learning approach to MMSE speech and noise power spectrum estimation for speech enhancement. 181-185
- Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin:
Boosting Self-Supervised Embeddings for Speech Enhancement. 186-190
- Seorim Hwang, Youngcheol Park, Sungwook Park:
Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections. 191-195
- Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey:
CycleGAN-based Unpaired Speech Dereverberation. 196-200
- Ashutosh Pandey, DeLiang Wang:
Attentive Training: A New Training Framework for Talker-independent Speaker Extraction. 201-205
- Tyler Vuong, Richard M. Stern:
Improved Modulation-Domain Loss for Neural-Network-based Speech Enhancement. 206-210
- Chiang-Jen Peng, Yun-Ju Chan, Yih-Liang Shen, Cheng Yu, Yu Tsao, Tai-Shih Chi:
Perceptual Characteristics Based Multi-objective Model for Speech Enhancement. 211-215
- Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani:
Listen only to me! How well can target speech extraction handle false alarms? 216-220
- Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, Tatsuya Kawahara:
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction. 221-225
- Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann:
Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments. 226-230
Source Separation II
- Nicolás Schmidt, Jordi Pons, Marius Miron:
PodcastMix: A dataset for separating music and speech in podcasts. 231-235
- Kohei Saijo, Robin Scheibler:
Independence-based Joint Dereverberation and Separation with Neural Source Model. 236-240
- Kohei Saijo, Robin Scheibler:
Spatial Loss for Unsupervised Multi-channel Source Separation. 241-245
- Samuel Bellows, Timothy W. Leishman:
Effect of Head Orientation on Speech Directivity. 246-250
- Kohei Saijo, Tetsuji Ogawa:
Unsupervised Training of Sequential Neural Beamformer Using Coarsely-separated and Non-separated Signals. 251-255
- Marvin Borsdorf, Kevin Scheck, Haizhou Li, Tanja Schultz:
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language. 256-260
- Mateusz Guzik, Konrad Kowalczyk:
NTF of Spectral and Spatial Features for Tracking and Separation of Moving Sound Sources in Spherical Harmonic Domain. 261-265
- Jack Deadman, Jon Barker:
Modelling Turn-taking in Multispeaker Parties for Realistic Data Simulation. 266-270
- Christoph Böddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach:
An Initialization Scheme for Meeting Separation with Spatial Mixture Models. 271-275
- Seongkyu Mun, Dhananjaya Gowda, Jihwan Lee, Changwoo Han, Dokyun Lee, Chanwoo Kim:
Prototypical speaker-interference loss for target voice separation using non-parallel audio samples. 276-280
Embedding and Network Architecture for Speaker Recognition
- Pierre-Michel Bousquet, Mickael Rouvier, Jean-François Bonastre:
Reliability criterion based on learning-phase entropy for speaker recognition with neural network. 281-285
- Bei Liu, Zhengyang Chen, Yanmin Qian:
Attentive Feature Fusion for Robust Speaker Verification. 286-290
- Bei Liu, Zhengyang Chen, Yanmin Qian:
Dual Path Embedding Learning for Speaker Verification with Triplet Attention. 291-295
- Bei Liu, Zhengyang Chen, Shuai Wang, Haoyu Wang, Bing Han, Yanmin Qian:
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design. 296-300
- Ruida Li, Shuo Fang, Chenguang Ma, Liang Li:
Adaptive Rectangle Loss for Speaker Verification. 301-305
- Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng:
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification. 306-310
- Leying Zhang, Zhengyang Chen, Yanmin Qian:
Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification. 311-315
- Yusheng Tian, Jingyu Li, Tan Lee:
Transport-Oriented Feature Aggregation for Speaker Embedding Learning. 316-320
- Mufan Sang, John H. L. Hansen:
Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning. 321-325
- Linjun Cai, Yuhong Yang, Xufeng Chen, Weiping Tu, Hongyang Chen:
CS-CTCSCONV1D: Small footprint speaker verification with channel split time-channel-time separable 1-dimensional convolution. 326-330
- Pengqi Li, Lantian Li, Askar Hamdulla, Dong Wang:
Reliable Visualization for Deep Speaker Recognition. 331-335
- Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan:
Unifying Cosine and PLDA Back-ends for Speaker Verification. 336-340
- Yuheng Wei, Junzhao Du, Hui Liu, Qian Wang:
CTFALite: Lightweight Channel-specific Temporal and Frequency Attention Mechanism for Enhancing the Speaker Embedding Extractor. 341-345
Speech Representation II
- Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du:
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech. 346-350
- David Feinberg:
VoiceLab: Software for Fully Reproducible Automated Voice Analysis. 351-355
- Joel Shor, Subhashini Venugopalan:
TRILLsson: Distilled Universal Paralinguistic Speech Representations. 356-360
- Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li, Jianwu Dang:
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network. 361-365
- Mostafa Sadeghi, Paul Magron:
A Sparsity-promoting Dictionary Model for Variational Autoencoders. 366-370
- Yan Zhao, Jincen Wang, Ru Ye, Yuan Zong, Wenming Zheng, Li Zhao:
Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition. 371-375
- John H. L. Hansen, Zhenyu Wang:
Audio Anti-spoofing Using Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning. 376-380
- Boris Bergsma, Minhao Yang, Milos Cernak:
PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition. 381-385
- Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak:
Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load. 386-390
- Shijun Wang, Hamed Hemati, Jón Guðnason, Damian Borth:
Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition. 391-395
- Sarthak Yadav, Neil Zeghidour:
Learning neural audio features without supervision. 396-400
- Yixuan Zhang, Heming Wang, DeLiang Wang:
Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech. 401-405
- Abu Zaher Md Faridee, Hannes Gamper:
Predicting label distribution improves non-intrusive speech quality estimation. 406-410
- Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka:
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models. 411-415
- Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza:
Dataset Pruning for Resource-constrained Spoofed Audio Detection. 416-420
Speech Synthesis: Linguistic Processing, Paradigms and Other Topics II
- Jaesung Tae, Hyeongju Kim, Taesu Kim:
EdiTTS: Score-based Editing for Controllable Text-to-Speech. 421-425
- Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng:
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information. 426-430
- Zalan Borsos, Matthew Sharifi, Marco Tagliasacchi:
SpeechPainter: Text-conditioned Speech Inpainting. 431-435
- Song Zhang, Ken Zheng, Xiaoxu Zhu, Baoxiang Li:
A polyphone BERT for Polyphone Disambiguation in Mandarin Chinese. 436-440
- Mutian He, Jingzhou Yang, Lei He, Frank K. Soong:
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge. 441-445
- Jian Zhu, Cong Zhang, David Jurgens:
ByT5 model for massively multilingual grapheme-to-phoneme conversion. 446-450
- Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, Jiuxiang Gu, Ani Nenkova, Vlad I. Morariu, Rajiv Jain, Dinesh Manocha:
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis. 451-455
- Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao:
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech. 456-460
- Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson:
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition. 461-465
- Tho Nguyen Duc Tran, The Chuong Chu, Vu Hoang, Trung Huu Bui, Steven Hung Quoc Truong:
An Efficient and High Fidelity Vietnamese Streaming End-to-End Speech Synthesis. 466-470
- Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter:
Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks. 471-475
- Zikai Chen, Lin Wu, Junjie Pan, Xiang Yin:
An Automatic Soundtracking System for Text-to-Speech Audiobooks. 476-480
- Daxin Tan, Guangyan Zhang, Tan Lee:
Environment Aware Text-to-Speech Synthesis. 481-485
- Artem Ploujnikov, Mirco Ravanelli:
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation. 486-490
- Evelina Bakhturina, Yang Zhang, Boris Ginsburg:
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization. 491-495
- Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote:
Prosodic alignment for off-screen automatic dubbing. 496-500
- Qibing Bai, Tom Ko, Yu Zhang:
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis. 501-505
- Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki, Kou Tanaka:
CAUSE: Crossmodal Action Unit Sequence Estimation from Speech. 506-510
- Binu Nisal Abeysinghe, Jesin James, Catherine I. Watson, Felix Marattukalam:
Visualising Model Training via Vowel Space for Text-To-Speech Systems. 511-515
Other Topics in Speech Recognition
- Aaqib Saeed:
Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices. 516-520
- Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. 521-525
- Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura:
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data. 526-530
- Yi-Kai Zhang, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan:
Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation. 531-535
- Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide:
Federated Domain Adaptation for ASR with Full Self-Supervision. 536-540
- Longfei Yang, Wenqing Wei, Sheng Li, Jiyi Li, Takahiro Shinozaki:
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection. 541-545
- Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon:
Extending RNN-T-based speech recognition systems with emotion and language classification. 546-549
- Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization. 550-554
- Yeonjin Cho, Sara Ng, Trang Tran, Mari Ostendorf:
Leveraging Prosody for Punctuation Prediction of Spontaneous Speech. 555-559
- Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie:
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings. 560-564
Audio Deep PLC (Packet Loss Concealment) Challenge
- Yuansheng Guan, Guochen Yu, Andong Li, Chengshi Zheng, Jie Wang:
TMGAN-PLC: Audio Packet Loss Concealment using Temporal Memory Generative Adversarial Network. 565-569
- Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy:
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model. 570-574
- Baiyun Liu, Qi Song, Mingxue Yang, Wuwen Yuan, Tianbao Wang:
PLCNet: Real-time Packet Loss Concealment with Semi-supervised Generative Adversarial Network. 575-579
- Lorenz Diener, Sten Sootla, Solomiya Branets, Ando Saabas, Robert Aichner, Ross Cutler:
INTERSPEECH 2022 Audio Deep Packet Loss Concealment Challenge. 580-584
- Nan Li, Xiguang Zheng, Chen Zhang, Liang Guo, Bing Yu:
End-to-End Multi-Loss Training for Low Delay Packet Loss Concealment. 585-589
Robust Speaker Recognition
- Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu:
Extended U-Net for Speaker Verification in Noisy Environments. 590-594
- Seunghan Yang, Debasmit Das, Janghoon Cho, Hyoungwoo Park, Sungrack Yun:
Domain Agnostic Few-shot Learning for Speaker Verification. 595-599
- Qiongqiong Wang, Kong Aik Lee, Tianchi Liu:
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA? 600-604
- Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký:
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries. 605-609
- Chau Luu, Steve Renals, Peter Bell:
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations. 610-614
- Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak:
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification. 615-619
Speech Production
- Tsukasa Yoshinaga, Kikuo Maekawa, Akiyoshi Iida:
Variability in Production of Non-Sibilant Fricative [ç] in /hi/. 620-624
- Sathvik Udupa, Aravind Illa, Prasanta Kumar Ghosh:
Streaming model for Acoustic to Articulatory Inversion with transformer networks. 625-629
- Tsiky Rakotomalala, Pierre Baraduc, Pascal Perrier:
Trajectories predicted by optimal speech motor control using LSTM networks. 630-634
- Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu:
Exploration strategies for articulatory synthesis of complex syllable onsets. 635-639
- Yoonjeong Lee, Jody Kreiman:
Linguistic versus biological factors governing acoustic voice variation. 640-643
- Takayuki Nagamine:
Acquisition of allophonic variation in second language speech: An acoustic and articulatory study of English laterals by Japanese speakers. 644-648
Speech Quality Assessment
- Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel Dejene Gebru, Vamsi Krishna Ithapu, Paul Calamia:
SAQAM: Spatial Audio Quality Assessment Metric. 649-653
- Pranay Manocha, Anurag Kumar:
Speech Quality Assessment through MOS using Non-Matching References. 654-658
- Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise:
An objective test tool for pitch extractors' response attributes. 659-663
- Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki:
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection. 664-668
- Salah Zaiem, Titouan Parcollet, Slim Essid:
Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning. 669-673
- Deebha Mumtaz, Ajit Jena, Vinit Jakhetiya, Karan Nathwani, Sharath Chandra Guntuku:
Transformer-based quality assessment model for generalized user-generated multimedia audio content. 674-678
Language Modeling and Lexical Modeling for ASR
- Christophe Van Gysel, Mirko Hannemann, Ernest Pusateri, Youssef Oualil, Ilya Oparin:
Space-Efficient Representation of Entity-centric Query Language Models. 679-683
- Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff:
Domain Prompts: Towards memory and compute efficient domain adaptation of ASR systems. 684-688
- W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar:
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition. 689-693
- Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey:
UserLibri: A Dataset for ASR Personalization Using Only Text. 694-698
- Chin-Yueh Chien, Kuan-Yu Chen:
A BERT-based Language Modeling Framework. 699-703
Challenges and Opportunities for Signal Processing and Machine Learning for Multiple Smart Devices
- Yoshiki Masuyama, Kouei Yamaoka, Nobutaka Ono:
Joint Optimization of Sampling Rate Offsets Based on Entire Signal Relationship Among Distributed Microphones. 704-708
- Gregory Ciccarelli, Jarred Barber, Arun Nair, Israel Cohen, Tao Zhang:
Challenges and Opportunities in Multi-device Speech Processing. 709-713
- Ameya Agaskar:
Practical Over-the-air Perceptual Acoustic Watermarking. 714-718
- Timm Koppelmann, Luca Becker, Alexandru Nelus, Rene Glitza, Lea Schönherr, Rainer Martin:
Clustering-based Wake Word Detection in Privacy-aware Acoustic Sensor Networks. 719-723
- Francesco Nespoli, Daniel Barreda, Patrick A. Naylor:
Relative Acoustic Features for Distance Estimation in Smart-Homes. 724-728
- Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang:
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network. 729-733
Speech Processing & Measurement
- Arne-Lukas Fietkau, Simon Stone, Peter Birkholz:
Relationship between the acoustic time intervals and tongue movements of German diphthongs. 734-738
- Sanae Matsui, Kyoji Iwamoto, Reiko Mazuka:
Development of allophonic realization until adolescence: A production study of the affricate-fricative variation of /z/ among Japanese children. 739-743
- Chung Soo Ahn, L. L. Chamara Kasun, Sunil Sivadas, Jagath C. Rajapakse:
Recurrent multi-head attention fusion network for combining audio and text for speech emotion recognition. 744-748
- Louise Coppieters de Gibson, Philip N. Garner:
Low-Level Physiological Implications of End-to-End Learning for Speech Recognition. 749-753
- Carolina Lins Machado, Volker Dellwo, Lei He:
Idiosyncratic lingual articulation of American English /æ/ and /ɑ/ using network analysis. 754-758
- Teruki Toya, Wenyu Zhu, Maori Kobayashi, Kenichi Nakamura, Masashi Unoki:
Method for improving the word intelligibility of presented speech using bone-conduction headphones. 759-763
- Debasish Ray Mohapatra, Mario Fleischer, Victor Zappi, Peter Birkholz, Sidney S. Fels:
Three-dimensional finite-difference time-domain acoustic analysis of simplified vocal tract shapes. 764-768
- Dorina De Jong, Aldo Pastore, Noël Nguyen, Alessandro D'Ausilio:
Speech imitation skills predict automatic phonetic convergence: a GMM-UBM study on L2. 769-773
- Marc-Antoine Georges, Jean-Luc Schwartz, Thomas Hueber:
Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE. 774-778
- Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. 779-783
- Monica Ashokumar, Jean-Luc Schwartz, Takayuki Ito:
Orofacial somatosensory inputs in speech perceptual training modulate speech production. 784-787
Speech Synthesis: Acoustic Modeling and Neural Waveform Generation I
- Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim:
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus. 788-792
- Takaaki Saeki, Kentaro Tachibana, Ryuichi Yamamoto:
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning. 793-797
- Kentaro Mitsui, Kei Sawada:
MSR-NV: Neural Vocoder Using Multiple Sampling Rates. 798-802
- Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani:
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. 803-807
- Sangjun Park, Kihyun Choo, Joohyung Lee, Anton V. Porov, Konstantin Osipov, June Sig Sung:
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge. 808-812
- Jae-Sung Bae, Jinhyeok Yang, Taejun Bak, Young-Sun Joo:
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech. 813-817
- Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy:
End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation. 818-822
- Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman:
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models. 823-827
- Karolos Nikitaras, Georgios Vamvoukakis, Nikolaos Ellinas, Konstantinos Klapsas, Konstantinos Markopoulos, Spyros Raptis, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis:
Fine-grained Noise Control for Multispeaker Speech Synthesis. 828-832
- Hubert Siuzdak, Piotr Dura, Pol van Rijn, Nori Jacoby:
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis. 833-837
- Ivan Vovk, Tasnima Sadekova, Vladimir Gogoryan, Vadim Popov, Mikhail A. Kudinov, Jiansheng Wei:
Fast Grad-TTS: Towards Efficient Diffusion-Based Speech Generation on CPU. 838-842
- Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass:
Simple and Effective Unsupervised Speech Synthesis. 843-847
- Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation. 848-852
Show and Tell I
- Taejin Park, Nithin Rao Koluguri, Fei Jia, Jagadeesh Balam, Boris Ginsburg:
NeMo Open Source Speaker Diarization System. 853-854
- Baihan Lin:
Voice2Alliance: Automatic Speaker Diarization and Quality Assurance of Conversational Alignment. 855-856
- Rishabh Kumar, Devaraja Adiga, Mayank Kothyari, Jatin Dalal, Ganesh Ramakrishnan, Preethi Jyothi:
VAgyojaka: An Annotating and Post-Editing Tool for Automatic Speech Recognition. 857-858
- Alzahra Badi, Chungho Park, Min-Seok Keum, Miguel Alba, Youngsuk Ryu, Jeongmin Bae:
SKYE: More than a conversational AI. 859-860
Spatial Audio
- Hokuto Munakata, Ryu Takeda, Kazunori Komatani:
Training Data Generation with DOA-based Selecting and Remixing for Unsupervised Training of Deep Separation Models. 861-865
- Hangting Chen, Yi Yang, Feng Dang, Pengyuan Zhang:
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output. 866-870
- Feifei Xiong, Pengyu Wang, Zhongfu Ye, Jinwei Feng:
Joint Estimation of Direction-of-Arrival and Distance for Arrays with Directional Sensors based on Sparse Bayesian Learning. 871-875
- Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello:
How to Listen? Rethinking Visual Sound Localization. 876-880
- Zhiheng Ouyang, Miao Wang, Wei-Ping Zhu:
Small Footprint Neural Networks for Acoustic Direction of Arrival Estimation. 881-885
- Xiaoyu Wang, Xiangyu Kong, Xiulian Peng, Yan Lu:
Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation. 886-890
- Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang:
MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources. 891-895
- Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang:
Iterative Sound Source Localization for Unknown Number of Sources. 896-900
- Katharine Patterson, Kevin W. Wilson, Scott Wisdom, John R. Hershey:
Distance-Based Sound Separation. 901-905
- Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang:
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network. 906-910
- Ali Aroudi, Stefan Uhlich, Marc Ferras Font:
TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation. 911-915
Single-channel Speech Enhancement II
- Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan:
PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement. 916-920
- Zhuangqi Chen, Pingjian Zhang:
Lightweight Full-band and Sub-band Fusion Network for Real Time Speech Enhancement. 921-925
- Jiaming Cheng, Ruiyu Liang, Yue Xie, Li Zhao, Björn W. Schuller, Jie Jia, Yiyuan Peng:
Cross-Layer Similarity Knowledge Distillation for Speech Enhancement. 926-930
- Feifei Xiong, Weiguang Chen, Pengyu Wang, Xiaofei Li, Jinwei Feng:
Spectro-Temporal SubNet for Real-Time Monaural Speech Denoising and Dereverberation. 931-935
- Ruizhe Cao, Sherif Abdulatif, Bin Yang:
CMGAN: Conformer-based Metric GAN for Speech Enhancement. 936-940
- Zeyuan Wei, Li Hao, Xueliang Zhang:
Model Compression by Iterative Pruning with Knowledge Distillation and Its Application to Speech Enhancement. 941-945
- Chenhui Zhang, Xiang Pan:
Single-channel speech enhancement using Graph Fourier Transform. 946-950
- Zilu Guo, Xu Xu, Zhongfu Ye:
Joint Optimization of the Module and Sign of the Spectral Real Part Based on CRN for Speech Denoising. 951-955
- Hao Zhang, Ashutosh Pandey, DeLiang Wang:
Attentive Recurrent Network for Low-Latency Active Noise Control. 956-960
- Jen-Hung Huang, Chung-Hsien Wu:
Memory-Efficient Multi-Step Speech Enhancement with Neural ODE. 961-965
- Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Jianjun Hao:
GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block. 966-970
- Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Dejun Li:
Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention. 971-975
- Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng:
Speech Enhancement with Fullband-Subband Cross-Attention Network. 976-980
- Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli:
OSSEM: one-shot speaker adaptive speech enhancement using meta learning. 981-985
- Wenbin Jiang, Tao Liu, Kai Yu:
Efficient Speech Enhancement with Neural Homomorphic Synthesis. 986-990
- Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang:
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation. 991-995
- Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura:
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations. 996-1000
Novel Models and Training Methods for ASR II
- Haaris Mehmood, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay:
FedNST: Federated Noisy Student Training for Automatic Speech Recognition. 1001-1005 - Li Fu, Xiaoxiao Li, Runyu Wang, Lu Fan, Zhengchen Zhang, Meng Chen, Youzheng Wu, Xiaodong He:
SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition. 1006-1010 - Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan:
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition. 1011-1015 - Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma:
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR. 1016-1020 - Guodong Ma, Pengfei Hu, Nurmemet Yolwas, Shen Huang, Hao Huang:
PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition. 1021-1025 - Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno:
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition. 1026-1030 - Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach:
Improving Rare Word Recognition with LM-aware MWER Training. 1031-1035 - Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney:
Improving the Training Recipe for a Robust Conformer-based Hybrid Model. 1036-1040 - Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg:
CTC Variations Through New WFST Topologies. 1041-1045 - Martin Sustek, Samik Sadhu, Hynek Hermansky:
Dealing with Unknowns in Continual Learning for End-to-end Automatic Speech Recognition. 1046-1050 - Chenfeng Miao, Kun Zou, Ziyang Zhuang, Tao Wei, Jun Ma, Shaojun Wang, Jing Xiao:
Towards Efficiently Learning Monotonic Alignments for Attention-based End-to-End Speech Recognition. 1051-1055 - Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker:
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training. 1056-1060 - Selen Hande Kabil, Hervé Bourlard:
From Undercomplete to Sparse Overcomplete Autoencoders to Improve LF-MMI based Speech Recognition. 1061-1065 - Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya:
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks. 1066-1070 - Takashi Maekaku, Yuya Fujita, Yifan Peng, Shinji Watanabe:
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. 1071-1075
Spoken Dialogue Systems and Multimodality
- Naokazu Uchida, Takeshi Homma, Makoto Iwayama, Yasuhiro Sogawa:
Reducing Offensive Replies in Open Domain Dialogue Systems. 1076-1080 - Ting-Wei Wu, Biing-Hwang Juang:
Induce Spoken Dialog Intents via Deep Unsupervised Context Contrastive Clustering. 1081-1085 - Fumio Nihei, Ryo Ishii, Yukiko I. Nakano, Kyosuke Nishida, Ryo Masumura, Atsushi Fukayama, Takao Nakamura:
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information. 1086-1090 - Dhanush Bekal, Sundararajan Srinivasan, Srikanth Ronanki, Sravan Bodapati, Katrin Kirchhoff:
Contextual Acoustic Barge-In Classification for Spoken Dialog Systems. 1091-1095 - Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng:
Calibrate and Refine! A Novel and Agile Framework for ASR Error Robust Intent Detection. 1096-1100 - Lingyun Feng, Jianwei Yu, Yan Wang, Songxiang Liu, Deng Cai, Haitao Zheng:
ASR-Robust Natural Language Understanding on ASR-GLUE dataset. 1101-1105 - Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen:
From Disfluency Detection to Intent Detection and Slot Filling. 1106-1110 - Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao:
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. 1111-1115 - Christina Sartzetaki, Georgios Paraskevopoulos, Alexandros Potamianos:
Extending Compositional Attention Networks for Social Reasoning in Videos. 1116-1120 - Shiquan Wang, Yuke Si, Xiao Wei, Longbiao Wang, Zhiqiang Zhuang, Xiaowang Zhang, Jianwu Dang:
TopicKS: Topic-driven Knowledge Selection for Knowledge-grounded Dialogue Generation. 1121-1125 - Andreas Liesenfeld, Mark Dingemanse:
Bottom-up discovery of structure and variation in response tokens ('backchannels') across diverse languages. 1126-1130 - Yi Zhu, Zexun Wang, Hang Liu, Peiying Wang, Mingchao Feng, Meng Chen, Xiaodong He:
Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding. 1131-1135 - Keiko Ochi, Nobutaka Ono, Keiho Owada, Miho Kuroda, Shigeki Sagayama, Hidenori Yamasue:
Use of Nods Less Synchronized with Turn-Taking and Prosody During Conversations in Adults with Autism. 1136-1140
Show and Tell I (VR)
- Denis Ivanko, Dmitry Ryumin, Alexey M. Kashevnik, Alexandr Axyonov, Andrey Kitenko, Igor Lashkov, Alexey Karpov:
DAVIS: Driver's Audio-Visual Speech recognition. 1141-1142
Speech Emotion Recognition I
- Einari Vaaras, Manu Airaksinen, Okko Räsänen:
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition. 1143-1147 - Chun-Yu Chen, Yun-Shao Lin, Chi-Chun Lee:
Emotion-Shift Aware CRF for Decoding Emotion Sequence in Conversation. 1148-1152 - Bo-Hao Su, Chi-Chun Lee:
Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy. 1153-1157 - Jack Parry, Eric DeMattos, Anita Klementiev, Axel Ind, Daniela Morse-Kopp, Georgia Clarke, Dimitri Palaz:
Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning. 1158-1162 - Ashishkumar Prabhakar Gudmalwar, Biplove Basel, Anirban Dutta, Ch V. Rama Rao:
The Magnitude and Phase based Speech Representation Learning using Autoencoder for Classifying Speech Emotions using Deep Canonical Correlation Analysis. 1163-1167 - Lucas Goncalves, Carlos Busso:
Improving Speech Emotion Recognition Using Self-Supervised Learning with Domain-Specific Audiovisual Tasks. 1168-1172
Single-channel Speech Enhancement I
- Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani:
SNRi Target Training for Joint Speech Enhancement and Recognition. 1173-1177 - Yutaro Sanada, Takumi Nakagawa, Yuichiro Wada, Kosaku Takanashi, Yuhui Zhang, Kiichi Tokuyama, Takafumi Kanamori, Tomonori Yamada:
Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches. 1178-1182 - Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao:
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling. 1183-1187 - Ivan Shchekotov, Pavel K. Andreev, Oleg Ivanov, Aibek Alanov, Dmitry P. Vetrov:
FFC-SE: Fast Fourier Convolution for Speech Enhancement. 1188-1192 - Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi:
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement. 1193-1197 - Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Byung Hoon Lee, Sung Won Han:
Multi-View Attention Transfer for Efficient Speech Enhancement. 1198-1202
Speech Synthesis: New Applications
- Nabarun Goswami, Tatsuya Harada:
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate. 1203-1207 - Talia Ben Simon, Felix Kreuk, Faten Awwad, Jacob T. Cohen, Joseph Keshet:
Correcting Mispronunciations in Speech using Spectrogram Inpainting. 1208-1212 - Jason Fong, Daniel Lyth, Gustav Eje Henter, Hao Tang, Simon King:
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech. 1213-1217 - Wen-Chin Huang, Dejan Markovic, Alexander Richard, Israel Dejene Gebru, Anjali Menon:
End-to-End Binaural Speech Synthesis. 1218-1222 - Julia Koch, Florian Lux, Nadja Schauffler, Toni Bernhart, Felix Dieterle, Jonas Kuhn, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu:
PoeticTTS - Controllable Poetry Reading for Literary Studies. 1223-1227 - Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu:
Articulatory Synthesis for Data Augmentation in Phoneme Recognition. 1228-1232
Spoken Language Understanding I
- Jihyun Lee, Gary Geunbae Lee:
SF-DST: Few-Shot Self-Feeding Reading Comprehension Dialogue State Tracking with Auxiliary Task. 1233-1237 - Oralie Cattan, Sahar Ghannay, Christophe Servan, Sophie Rosset:
Benchmarking Transformers-based models on French Spoken Language Understanding tasks. 1238-1242 - Seong-Hwan Heo, WonKee Lee, Jong-Hyeok Lee:
mcBERT: Momentum Contrastive Learning with BERT for Zero-Shot Slot Filling. 1243-1247 - Pu Wang, Hugo Van hamme:
Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding. 1248-1252 - Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow:
On joint training with interfaces for spoken language understanding. 1253-1257 - Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed Hussen Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed H. Tewfik:
Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models. 1258-1262
Inclusive and Fair Speech Technologies I
- Perez Ogayo, Graham Neubig, Alan W. Black:
Building African Voices. 1263-1267 - Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke:
Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities. 1268-1272 - May Pik Yu Chan, June Choe, Aini Li, Yiran Chen, Xin Gao, Nicole R. Holliday:
Training and typological bias in ASR performance for world Englishes. 1273-1277
Inclusive and Fair Speech Technologies II
- Marcely Zanon Boito, Laurent Besacier, Natalia A. Tomashenko, Yannick Estève:
A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems. 1278-1282 - Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan:
Automatic Dialect Density Estimation for African American English. 1283-1287 - Kunnar Kukk, Tanel Alumäe:
Improving Language Identification of Accented Speech. 1288-1292 - Wiebke Toussaint, Lauriane Gorce, Aaron Yi Ding:
Design Guidelines for Inclusive Speaker Verification Evaluation Datasets. 1293-1297 - Viet Anh Trinh, Pegah Ghahremani, Brian John King, Jasha Droppo, Andreas Stolcke, Roland Maas:
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation. 1298-1302
Phonetics I
- Takuya Kunihara, Chuanbo Zhu, Nobuaki Minematsu, Noriko Nakanishi:
Gradual Improvements Observed in Learners' Perception and Production of L2 Sounds Through Continuing Shadowing Practices on a Daily Basis. 1303-1307 - Christin Kirchhübel, Georgina Brown:
Spoofed speech from the perspective of a forensic phonetician. 1308-1312 - Hae-Sung Jeon, Stephen Nichols:
Investigating Prosodic Variation in British English Varieties using ProPer. 1313-1317 - Hyun Kyung Hwang, Manami Hirayama, Takaomi Kato:
Perceived prominence and downstep in Japanese. 1318-1321 - Andrea Alicehajic, Silke Hamann:
The discrimination of [zi]-[dʑi] by Japanese listeners and the prospective phonologization of /zi/. 1322-1326 - Ingo Langheinrich, Simon Stone, Xinyu Zhang, Peter Birkholz:
Glottal inverse filtering based on articulatory synthesis and deep learning. 1327-1331 - Bogdan Ludusan, Marin Schröer, Petra Wagner:
Investigating phonetic convergence of laughter in conversation. 1332-1336 - Véronique Delvaux, Audrey Lavallée, Fanny Degouis, Xavier Saloppe, Jean-Louis Nandrino, Thierry Pham:
Telling self-defining memories: An acoustic study of natural emotional speech productions. 1337-1341 - Laura Spinu, Ioana Vasilescu, Lori Lamel, Jason Lilley:
Voicing neutralization in Romanian fricatives across different speech styles. 1342-1346 - Sishi Liao, Phil Hoole, Conceição Cunha, Esther Kunay, Aletheia Cui, Lia Saki Bucar Shigemori, Felicitas Kleber, Dirk Voit, Jens Frahm, Jonathan Harrington:
Nasal Coda Loss in the Chengdu Dialect of Mandarin: Evidence from RT-MRI. 1347-1351 - Philipp Buech, Simon Roessig, Lena Pagel, Doris Mücke, Anne Hermes:
ema2wav: doing articulation by Praat. 1352-1356
Multi-, Cross-lingual and Other Topics in ASR I
- Lars Rumberg, Christopher Gebauer, Hanna Ehlert, Ulrike Lüdtke, Jörn Ostermann:
Improving Phonetic Transcriptions of Children's Speech by Pronunciation Modelling with Constrained CTC-Decoding. 1357-1361 - Soky Kak, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara:
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism. 1362-1366 - Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol:
KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus. 1367-1371 - Tünde Szalay, Mostafa Ali Shahin, Beena Ahmed, Kirrie J. Ballard:
Knowledge of accent differences can be used to predict speech recognition. 1372-1376 - Maximilian Karl Scharf, Sabine Hochmuth, Lena L. N. Wong, Birger Kollmeier, Anna Warzybok:
Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features. 1377-1381 - Martin Flechl, Shou-Chun Yin, Junho Park, Peter Skala:
End-to-end speech recognition modeling from de-identified data. 1382-1386 - Aditya Yadavalli, Mirishkar Sai Ganesh, Anil Kumar Vuppala:
Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition. 1387-1391 - Jiamin Xie, John H. L. Hansen:
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition. 1392-1396
Zero, low-resource and multi-modal speech recognition I
- Yuna Lee, Seung Jun Baek:
Keyword Spotting with Synthetic Data using Heterogeneous Knowledge Distillation. 1397-1401 - Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski:
Probing phoneme, language and speaker information in unsupervised speech representations. 1402-1406 - Andrei Bîrladeanu, Helen Minnis, Alessandro Vinciarelli:
Automatic Detection of Reactive Attachment Disorder Through Turn-Taking Analysis in Clinical Child-Caregiver Sessions. 1407-1410 - Eesung Kim, Jae-Jin Jeon, Hyeji Seo, Hoon Kim:
Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning. 1411-1415 - Tyler Miller, David Harwath:
Exploring Few-Shot Fine-Tuning Strategies for Models of Visually Grounded Speech. 1416-1420 - Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman:
Pseudo Label Is Better Than Human Label. 1421-1425 - Werner van der Merwe, Herman Kamper, Johan Adam du Preez:
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery. 1426-1430
Speaker Embedding and Diarization
- Siqi Zheng, Hongbin Suo, Qian Chen:
PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification. 1431-1435 - Xiaoyi Qin, Na Li, Chao Weng, Dan Su, Ming Li:
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings. 1436-1440 - Weiqing Wang, Ming Li, Qingjian Lin:
Online Target Speaker Voice Activity Detection for Speaker Diarization. 1441-1445 - Niko Brümmer, Albert Swart, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Themos Stafylakis, Lukáš Burget:
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. 1446-1450 - Bin Gu:
Deep speaker embedding with frame-constrained training strategy for speaker verification. 1451-1455 - Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan:
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization. 1456-1460 - Mao-Kui He, Jun Du, Chin-Hui Lee:
End-to-End Audio-Visual Neural Speaker Diarization. 1461-1465 - Yanyan Yue, Jun Du, Mao-Kui He, Yu Ting Yeung, Renyu Wang:
Online Speaker Diarization with Core Samples Selection. 1466-1470 - Chenyu Yang, Yu Wang:
Robust End-to-end Speaker Diarization with Generic Neural Clustering. 1471-1475 - Tao Liu, Shuai Fan, Xu Xiang, Hongbo Song, Shaoxiong Lin, Jiaqi Sun, Tianyuan Han, Siyuan Chen, Binwei Yao, Sen Liu, Yifei Wu, Yanmin Qian, Kai Yu:
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild. 1476-1480 - Md. Iftekhar Tanveer, Diego Casabuena, Jussi Karlgren, Rosie Jones:
Unsupervised Speaker Diarization that is Agnostic to Language, Overlap-Aware, and Tuning Free. 1481-1485 - Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach:
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT. 1486-1490 - Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Feng Tong, Lin Li, Qingyang Hong:
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting. 1491-1495
Acoustic Event Detection and Classification
- Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang:
Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection. 1496-1500 - Peng Liu, Songbin Li, Jigang Tang:
An End-to-End Macaque Voiceprint Verification Method Based on Channel Fusion Mechanism. 1501-1505 - Liang Xu, Jing Wang, Lizhong Wang, Sijun Bi, Jianqian Zhang, Qiuyue Ma:
Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal. 1506-1510 - Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang:
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection. 1511-1515 - Achyut Mani Tripathi, Konark Paul:
Temporal Self Attention-Based Residual Network for Environmental Sound Classification. 1516-1520 - Juncheng Li, Shuhui Qu, Po-Yao Huang, Florian Metze:
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification. 1521-1525 - Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou:
Improving Target Sound Extraction with Timestamp Information. 1526-1530 - Ying Hu, Xiujuan Zhu, Yunlong Li, Hao Huang, Liang He:
A Multi-grained based Attention Network for Semi-supervised Sound Event Detection. 1531-1535 - Sangwook Park, Sandeep Reddy Kothinti, Mounya Elhilali:
Temporal coding with magnitude-phase regularization for sound event detection. 1536-1540 - Nian Shao, Erfan Loweimi, Xiaofei Li:
RCT: Random consistency training for semi-supervised sound event detection. 1541-1545 - Yifei Xin, Dongchao Yang, Yuexian Zou:
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification. 1546-1550 - Yu Wang, Mark Cartwright, Juan Pablo Bello:
Active Few-Shot Learning for Sound Event Detection. 1551-1555 - Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Jing Xiao:
Uncertainty Calibration for Deep Audio Classifiers. 1556-1560 - Yuanbo Hou, Dick Botteldooren:
Event-related data conditioning for acoustic event classification. 1561-1565
Speech Synthesis: Acoustic Modeling and Neural Waveform Generation II
- Haohan Guo, Hui Lu, Xixin Wu, Helen Meng:
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS. 1566-1570 - Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo:
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion. 1571-1575 - Manh Luong, Viet-Anh Tran:
FlowVocoder: A small Footprint Neural Vocoder based Normalizing Flow for Speech Synthesis. 1576-1580 - Yanqing Liu, Ruiqing Xue, Lei He, Xu Tan, Sheng Zhao:
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders. 1581-1585 - Xin Yuan, Robin Feng, Mingming Ye, Cheng Tuo, Minghang Zhang:
AdaVocoder: Adaptive Vocoder for Custom Voice. 1586-1590 - Shengyuan Xu, Wenxiao Zhao, Jing Guo:
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses. 1591-1595 - Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu:
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature. 1596-1600 - Mengnan He, Tingwei Guo, Zhenxing Lu, Ruixiong Zhang, Caixia Gong:
Improving GAN-based vocoder for fast and high-quality speech synthesis. 1601-1605 - Yuanhao Yi, Lei He, Shifeng Pan, Xi Wang, Yuchao Zhang:
SoftSpeech: Unsupervised Duration Model in FastSpeech 2. 1606-1610 - Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng:
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS. 1611-1615 - Yuhan Li, Ying Shen, Dongqing Wang, Lin Zhang:
SiD-WaveFlow: A Low-Resource Vocoder Independent of Prior Knowledge. 1616-1620 - Takeru Gorai, Daisuke Saito, Nobuaki Minematsu:
Text-to-speech synthesis using spectral modeling based on non-negative autoencoder. 1621-1625 - Hiroki Kanagawa, Yusuke Ijima, Hiroyuki Toda:
Joint Modeling of Multi-Sample and Subband Signals for Fast Neural Vocoding on CPU. 1626-1630 - Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki:
MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks. 1631-1635 - Chenfeng Miao, Ting Chen, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao:
A compact transformer-based GAN vocoder. 1636-1640 - Hideyuki Tachibana, Muneyoshi Inahara, Mocho Go, Yotaro Katayama, Yotaro Watanabe:
Diffusion Generative Vocoder for Fullband Speech Synthesis Based on Weak Third-order SDE Solver. 1641-1645
ASR: Architecture and Search
- Ehsan Variani, Michael Riley, David Rybach, Cyril Allauzen, Tongzhou Chen, Bhuvana Ramabhadran:
On Adaptive Weight Interpolation of the Hybrid Autoregressive Transducer. 1646-1650 - Ting-Wei Wu, I-Fan Chen, Ankur Gandhe:
Learning to rank with BERT-based confidence models in ASR rescoring. 1651-1655 - Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. 1656-1660 - Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu:
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit. 1661-1665 - Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang:
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR. 1666-1670 - Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan:
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies. 1671-1675 - Ye Bai, Jie Li, Wenjing Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang:
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition. 1676-1680 - Zhanheng Yang, Sining Sun, Jin Li, Xiaoming Zhang, Xiong Wang, Long Ma, Lei Xie:
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer. 1681-1685 - Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li:
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT. 1686-1690 - Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda:
Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition. 1691-1695 - Weiran Wang, Ke Hu, Tara N. Sainath:
Streaming Align-Refine for Non-autoregressive Deliberation. 1696-1700 - Rongmei Lin, Yonghui Xiao, Tien-Ju Yang, Ding Zhao, Li Xiong, Giovanni Motta, Françoise Beaufays:
Federated Pruning: Improving Neural Network Efficiency with Federated Learning. 1701-1705 - Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman:
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. 1706-1710 - Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov:
4-bit Conformer with Native Quantization Aware Training for Speech Recognition. 1711-1715 - Qiang Xu, Tongtong Song, Longbiao Wang, Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu, Jianwu Dang:
Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model. 1716-1720
Spoken Language Processing II
- Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang, Alexis Conneau, Nobu Morioka:
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation. 1721-1725 - Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen:
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation. 1726-1730 - Qian Wang, Chen Wang, Jiajun Zhang:
Investigating Parameter Sharing in Multilingual Speech Translation. 1731-1735 - Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan:
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset. 1736-1740