Heiga Zen
2020 – today
- 2024
- [c90] Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich: Translatotron 3: Speech to Speech Translation with Monolingual Data. ICASSP 2024: 10686-10690
- [c89] Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov: Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data. ICASSP 2024: 11546-11550
- [i32] Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov: Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data. CoRR abs/2402.18932 (2024)
- [i31] Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich: SimulTron: On-Device Simultaneous Speech to Speech Translation. CoRR abs/2406.02133 (2024)
- [i30] Min Ma, Yuma Koizumi, Shigeki Karita, Heiga Zen, Jason Riesa, Haruko Ishikawa, Michiel Bacchiani: FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks. CoRR abs/2408.06227 (2024)
- [i29] Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur: Geometric-Averaged Preference Optimization for Soft Preference Labels. CoRR abs/2409.06691 (2024)
- 2023
- [j26] Jun Suzuki, Heiga Zen, Hideto Kazawa: Extracting representative subset from extensive text data for training pre-trained language models. Inf. Process. Manag. 60(3): 103249 (2023)
- [j25] Dong Yu, Yifan Gong, Michael A. Picheny, Bhuvana Ramabhadran, Dilek Hakkani-Tür, Rohit Prasad, Heiga Zen, Jan Skoglund, Jan Honza Cernocký, Lukás Burget, Abdelrahman Mohamed: Twenty-Five Years of Evolution in Speech and Language Processing. IEEE Signal Process. Mag. 40(5): 27-39 (2023)
- [j24] Shahin Amiriparian, Björn W. Schuller, Nabiha Asghar, Heiga Zen, Felix Burkhardt: Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion. IEEE Trans. Affect. Comput. 14(1): 3-5 (2023)
- [c88] Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada: SayTap: Language to Quadrupedal Locomotion. CoRL 2023: 3556-3570
- [c87] Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran: Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech. ICASSP 2023: 1-5
- [c86] Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich: Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech. ICASSP 2023: 1-2
- [c85] Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna: LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus. INTERSPEECH 2023: 5496-5500
- [c84] Lev Finkelstein, Chun-an Chan, Vincent Wan, Heiga Zen, Rob Clark: FiPPiE: A Computationally Efficient Differentiable Method for Estimating Fundamental Frequency from Spectrograms. SSW 2023: 218-224
- [c83] Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani: Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations. WASPAA 2023: 1-5
- [i28] Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani: Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations. CoRR abs/2303.01664 (2023)
- [i27] Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich: Translatotron 3: Speech to Speech Translation with Monolingual Data. CoRR abs/2305.17547 (2023)
- [i26] Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna: LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus. CoRR abs/2305.18802 (2023)
- [i25] Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada: SayTap: Language to Quadrupedal Locomotion. CoRR abs/2306.07580 (2023)
- 2022
- [c82] Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani: SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. INTERSPEECH 2022: 803-807
- [c81] Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen: MAESTRO: Matched Speech Text Representations through Modality Matching. INTERSPEECH 2022: 4093-4097
- [c80] Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark: Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks. INTERSPEECH 2022: 4571-4575
- [c79] Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen: CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. LREC 2022: 6691-6703
- [c78] Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration. SLT 2022: 884-891
- [i24] Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen: CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. CoRR abs/2201.03713 (2022)
- [i23] Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani: SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. CoRR abs/2203.16749 (2022)
- [i22] Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen: MAESTRO: Matched Speech Text Representations through Modality Matching. CoRR abs/2204.03409 (2022)
- [i21] Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark: Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks. CoRR abs/2208.13183 (2022)
- [i20] Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration. CoRR abs/2210.01029 (2022)
- [i19] Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran: Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech. CoRR abs/2210.15447 (2022)
- [i18] Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding: Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation. CoRR abs/2210.15868 (2022)
- 2021
- [c77] Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu: Parallel Tacotron: Non-Autoregressive and Controllable TTS. ICASSP 2021: 5709-5713
- [c76] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan: WaveGrad: Estimating Gradients for Waveform Generation. ICLR 2021
- [c75] Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. Interspeech 2021: 141-145
- [c74] Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. Interspeech 2021: 151-155
- [c73] Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno: Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation. Interspeech 2021: 736-740
- [c72] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan: WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Interspeech 2021: 3765-3769
- [i17] Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. CoRR abs/2103.14574 (2021)
- [i16] Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. CoRR abs/2103.15060 (2021)
- [i15] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan: WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. CoRR abs/2106.09660 (2021)
- 2020
- [c71] Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu: Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis. ICASSP 2020: 6264-6268
- [c70] Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu: Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior. ICASSP 2020: 6699-6703
- [i14] Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu: Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis. CoRR abs/2002.03785 (2020)
- [i13] Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu: Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior. CoRR abs/2002.03788 (2020)
- [i12] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan: WaveGrad: Estimating Gradients for Waveform Generation. CoRR abs/2009.00713 (2020)
- [i11] Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu: Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling. CoRR abs/2010.04301 (2020)
- [i10] Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu: Parallel Tacotron: Non-Autoregressive and Controllable TTS. CoRR abs/2010.11439 (2020)
2010 – 2019
- 2019
- [j23] Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, Mehrez Souden: Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques. IEEE Signal Process. Mag. 36(6): 111-124 (2019)
- [c69] Yutian Chen, Yannis M. Assael, Brendan Shillingford, David Budden, Scott E. Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Çaglar Gülçehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas: Sample Efficient Adaptive Text-to-Speech. ICLR (Poster) 2019
- [c68] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang: Hierarchical Generative Modeling for Controllable Speech Synthesis. ICLR (Poster) 2019
- [c67] Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. INTERSPEECH 2019: 1526-1530
- [c66] Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran: Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning. INTERSPEECH 2019: 2080-2084
- [i9] Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia Xu Chen, Ye Jia, Anjuli Kannan, Tara N. Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George F. Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel Bacchiani, Thomas B. Jablin, Robert Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon: Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling. CoRR abs/1902.08295 (2019)
- [i8] Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. CoRR abs/1904.02882 (2019)
- [i7] Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran: Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning. CoRR abs/1907.04448 (2019)
- 2018
- [c65] Heiga Zen: [Invited] Generative Model-Based Text-to-Speech Synthesis. GCCE 2018: 327-328
- [c64] Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis: Parallel WaveNet: Fast High-Fidelity Speech Synthesis. ICML 2018: 3915-3923
- [c63] Antoine Bruguier, Heiga Zen, Arkady Arkhangorodsky: Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents. INTERSPEECH 2018: 1284-1287
- [i6] Yutian Chen, Yannis M. Assael, Brendan Shillingford, David Budden, Scott E. Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Çaglar Gülçehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas: Sample Efficient Adaptive Text-to-Speech. CoRR abs/1809.10460 (2018)
- [i5] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang: Hierarchical Generative Modeling for Controllable Speech Synthesis. CoRR abs/1810.07217 (2018)
- 2017
- [p1] Michiel Bacchiani, Françoise Beaufays, Alexander Gruenstein, Pedro J. Moreno, Johan Schalkwyk, Trevor Strohman, Heiga Zen: Speech Research at Google to Enable Universal Speech Interfaces. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 385-399
- [i4] Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis: Parallel WaveNet: Fast High-Fidelity Speech Synthesis. CoRR abs/1711.10433 (2017)
- 2016
- [c62] Keiichi Tokuda, Heiga Zen: Directly modeling voiced and unvoiced components in speech waveforms by neural networks. ICASSP 2016: 5640-5644
- [c61] Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemyslaw Szczepaniak: Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices. INTERSPEECH 2016: 2273-2277
- [c60] Bo Li, Heiga Zen: Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis. INTERSPEECH 2016: 2468-2472
- [c59] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, Koray Kavukcuoglu: WaveNet: A Generative Model for Raw Audio. SSW 2016: 125
- [c58] Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen: Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis. SSW 2016: 221-228
- [i3] Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen: Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis. CoRR abs/1605.07809 (2016)
- [i2] Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemyslaw Szczepaniak: Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices. CoRR abs/1606.06061 (2016)
- [i1] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, Koray Kavukcuoglu: WaveNet: A Generative Model for Raw Audio. CoRR abs/1609.03499 (2016)
- 2015
- [j22] Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew W. Senior, Mike Schuster, Xiaojun Qian, Helen M. Meng, Li Deng: Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends. IEEE Signal Process. Mag. 32(3): 35-52 (2015)
- [c57] Keiichi Tokuda, Heiga Zen: Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis. ICASSP 2015: 4215-4219
- [c56] Heiga Zen, Hasim Sak: Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. ICASSP 2015: 4470-4474
- 2014
- [c55] Heiga Zen, Andrew W. Senior: Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis. ICASSP 2014: 3844-3848
- 2013
- [j21] Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, Keiichiro Oura: Speech Synthesis Based on Hidden Markov Models. Proc. IEEE 101(5): 1234-1252 (2013)
- [j20] Matt Shannon, Heiga Zen, William Byrne: Autoregressive Models for Statistical Parametric Speech Synthesis. IEEE Trans. Speech Audio Process. 21(3): 587-597 (2013)
- [c54] Heiga Zen, Andrew W. Senior, Mike Schuster: Statistical parametric speech synthesis using deep neural networks. ICASSP 2013: 7962-7966
- [c53] Heiga Zen: Deep learning in speech synthesis. SSW 2013: 309
- 2012
- [j19] Heiga Zen, Mark J. F. Gales, Yoshihiko Nankaku, Keiichi Tokuda: Product of Experts for Statistical Parametric Speech Synthesis. IEEE Trans. Speech Audio Process. 20(3): 794-805 (2012)
- [j18] Heiga Zen, Norbert Braunschweiler, Sabine Buchholz, Mark J. F. Gales, Kate M. Knill, Sacha Krstulovic, Javier Latorre: Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization. IEEE Trans. Speech Audio Process. 20(6): 1713-1724 (2012)
- [c52] Cassia Valentini-Botinhao, Ranniery Maia, Junichi Yamagishi, Simon King, Heiga Zen: Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise. ICASSP 2012: 3997-4000
- [c51] Vincent Wan, Javier Latorre, K. K. Chin, Langzhou Chen, Mark J. F. Gales, Heiga Zen, Kate M. Knill, Masami Akamine: Combining multiple high quality corpora for improving HMM-TTS. INTERSPEECH 2012: 1135-1138
- 2011
- [j17] Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Bayesian Context Clustering Using Cross Validation for Speech Recognition. IEICE Trans. Inf. Syst. 94-D(3): 668-678 (2011)
- [j16] Kai Yu, Heiga Zen, François Mairesse, Steve J. Young: Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis. Speech Commun. 53(6): 914-923 (2011)
- [j15] Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda: Continuous Stochastic Feature Mapping Based on Trajectory HMMs. IEEE Trans. Speech Audio Process. 19(2): 417-430 (2011)
- [c50] Heiga Zen, Mark J. F. Gales: Decision tree-based context clustering based on cross validation and hierarchical priors. ICASSP 2011: 4560-4563
- [c49] Matt Shannon, Heiga Zen, William J. Byrne: The Effect of Using Normalized Models in Statistical Speech Synthesis. INTERSPEECH 2011: 121-124
- [c48] Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai: Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis. INTERSPEECH 2011: 1801-1804
- [c47] Ranniery Maia, Heiga Zen, Kate M. Knill, Mark J. F. Gales, Sabine Buchholz: Multipulse Sequences for Residual Signal Modeling. INTERSPEECH 2011: 1833-1836
- [c46] Nicholas Pilkington, Heiga Zen, Mark J. F. Gales: Gaussian Process Experts for Voice Conversion. INTERSPEECH 2011: 2761-2764
- 2010
- [j14] Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: A Covariance-Tying Technique for HMM-Based Speech Synthesis. IEICE Trans. Inf. Syst. 93-D(3): 595-601 (2010)
- [c45] Heiga Zen, Mark J. F. Gales, Yoshihiko Nankaku, Keiichi Tokuda: Statistical parametric speech synthesis based on product of experts. ICASSP 2010: 4242-4245
- [c44] Heiga Zen: Speaker and language adaptive training for HMM-based polyglot speech synthesis. INTERSPEECH 2010: 410-413
- [c43] Kai Yu, Heiga Zen, François Mairesse, Steve J. Young: Context adaptive training with factorized decision trees for HMM-based speech synthesis. INTERSPEECH 2010: 414-417
- [c42] Nicholas Pilkington, Heiga Zen: An implementation of decision tree-based context clustering on graphics processing units. INTERSPEECH 2010: 833-836
- [c41] Javier Latorre, Mark J. F. Gales, Heiga Zen: Training a parametric-based logF0 model with the minimum generation error criterion. INTERSPEECH 2010: 2174-2177
- [c40] Ranniery Maia, Heiga Zen, Mark J. F. Gales: Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters. SSW 2010: 88-93
- [c39] Heiga Zen, Norbert Braunschweiler, Sabine Buchholz, Kate M. Knill, Sacha Krstulovic, Javier Latorre: HMM-based polyglot speech synthesis by speaker and language adaptive training. SSW 2010: 186-191
2000 – 2009
- 2009
- [j13] Heiga Zen, Keiichi Tokuda, Alan W. Black: Statistical parametric speech synthesis. Speech Commun. 51(11): 1039-1064 (2009)
- [j12] Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhen-Hua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, Steve Renals: Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis. IEEE Trans. Speech Audio Process. 17(6): 1208-1230 (2009)
- [c38] Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Takashi Masuko, Keiichi Tokuda: A Bayesian approach to HMM-based speech synthesis. ICASSP 2009: 4029-4032
- [c37] Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda: Stereo-based stochastic noise compensation based on trajectory GMMs. ICASSP 2009: 4577-4580
- [c36] Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems. INTERSPEECH 2009: 1759-1762
- [c35] Heiga Zen, Norbert Braunschweiler: Context-dependent additive log F0 model for HMM-based speech synthesis. INTERSPEECH 2009: 2091-2094
- 2008
- [j11] Heiga Zen, Tomoki Toda, Keiichi Tokuda: The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006. IEICE Trans. Inf. Syst. 91-D(6): 1764-1773 (2008)
- [j10] Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System. IEICE Trans. Inf. Syst. 91-D(11): 2693-2700 (2008)
- [c34] Junichi Yamagishi, Heiga Zen, Yi-Jian Wu, Tomoki Toda, Keiichi Tokuda: The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in the 2008 Blizzard Challenge. Blizzard Challenge 2008
- [c33] Junichi Yamagishi, Takashi Nose, Heiga Zen, Tomoki Toda, Keiichi Tokuda: Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007. ICASSP 2008: 3957-3960
- [c32] Yoshihiko Nankaku, Kazuhiro Nakamura, Heiga Zen, Keiichi Tokuda: Acoustic modeling with contextual additive structure for HMM-based speech recognition. ICASSP 2008: 4469-4472
- [c31] Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Acoustic modeling based on model structure annealing for speech recognition. INTERSPEECH 2008: 932-935
- [c30] Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition. INTERSPEECH 2008: 936-939
- [c29] Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda: Probabilistic feature mapping based on trajectory HMMs. INTERSPEECH 2008: 1068-1071
- [c28] Simon King, Keiichi Tokuda, Heiga Zen, Junichi Yamagishi: Unsupervised adaptation for HMM-based speech synthesis. INTERSPEECH 2008: 1869-1872
- 2007
- [j9] Heiga Zen, Keiichi Tokuda, Tadashi Kitamura: Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Comput. Speech Lang. 21(1): 153-173 (2007)
- [j8] Heiga Zen, Tomoki Toda, Masaru Nakamura, Keiichi Tokuda: Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005. IEICE Trans. Inf. Syst. 90-D(1): 325-333 (2007)
- [j7] Heiga Zen, Takashi Masuko, Keiichi Tokuda, Takayoshi Yoshimura, Takao Kobayashi, Tadashi Kitamura: State Duration Modeling for HMM-Based Speech Synthesis. IEICE Trans. Inf. Syst. 90-D(3): 692-693 (2007)
- [j6] Heiga Zen, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura: A Hidden Semi-Markov Model-Based Speech Synthesis System. IEICE Trans. Inf. Syst. 90-D(5): 825-834 (2007)
- [c27] Junichi Yamagishi, Heiga Zen, Tomoki Toda, Keiichi Tokuda: Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007. Blizzard Challenge 2007
- [c26] Alan W. Black, Heiga Zen, Keiichi Tokuda: Statistical Parametric Speech Synthesis. ICASSP (4) 2007: 1229-1232
- [c25] Ranniery Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda: A trainable excitation model for HMM-based speech synthesis. INTERSPEECH 2007: 1909-1912
- [c24] Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda: Model-space MLLR for trajectory HMMs. INTERSPEECH 2007: 2065-2068
- [c23] Junichi Yamagishi, Takao Kobayashi, Steve Renals, Simon King, Heiga Zen, Tomoki Toda, Keiichi Tokuda: Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. SSW 2007: 125-130
- [c22] Ranniery Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda: An excitation model for HMM-based speech synthesis based on residual modeling. SSW 2007: 131-136
- [c21] Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan W. Black, Keiichi Tokuda: The HMM-based speech synthesis system (HTS) version 2.0. SSW 2007: 294-299
- 2006
- [c20] Heiga Zen, Tomoki Toda, Keiichi Tokuda: The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. Blizzard Challenge 2006
- [c19] Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Hidden Semi-Markov Model Based Speech Recognition System using Weighted Finite-State Transducer. ICASSP (1) 2006: 33-36
- [c18] Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura: Estimating Trajectory HMM Parameters Using Monte Carlo EM with Gibbs Sampler. ICASSP (1) 2006: 1173-1176
- [c17] Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: An HMM-based singing voice synthesis system. INTERSPEECH 2006
- [c16] Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura: Speaker adaptation of trajectory HMMs using feature-space MLLR. INTERSPEECH 2006
- 2005
- [j5] Amaro A. de Lima, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura, Fernando Gil Resende: Applying Sparse KPCA for Feature Extraction in Speech Recognition. IEICE Trans. Inf. Syst. 88-D(3): 401-409 (2005)
- [j4] Hiroyuki Suzuki, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: Continuous Speech Recognition Based on General Factor Dependent Acoustic Models. IEICE Trans. Inf. Syst. 88-D(3): 410-417 (2005)
- [j3] Yohei Itaya, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition. IEICE Trans. Inf. Syst. 88-D(3): 425-431 (2005)
- [j2] Heiga Zen, Keiichi Tokuda, Tadashi Kitamura: Simultaneous clustering of phonetic context, dimension, and state position for acoustic modeling using decision trees. Syst. Comput. Jpn. 36(14): 44-55 (2005)
- [c15] Amaro A. de Lima, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura, Fernando Gil Resende: Sparse KPCA for Feature Extraction in Speech Recognition. ICASSP (1) 2005: 353-356
- [c14] Heiga Zen, Tomoki Toda: An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005. INTERSPEECH 2005: 93-96
- [c13] Wael Hamza, Raimo Bakis, Zhiwei Shuang, Heiga Zen: On building a concatenative speech synthesis system from the Blizzard Challenge speech databases. INTERSPEECH 2005: 97-100
- 2004
- [j1] Amaro A. de Lima, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: On the Use of Kernel PCA for Feature Extraction in Speech Recognition. IEICE Trans. Inf. Syst. 87-D(12): 2802-2811 (2004)
- [c12] Heiga Zen, Keiichi Tokuda, Tadashi Kitamura: A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features. ICASSP (1) 2004: 837-840
- [c11] Yohei Itaya, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: Deterministic annealing EM algorithm in parameter estimation for acoustic model. INTERSPEECH 2004: 433-436
- [c10] Heiga Zen, Tadashi Kitamura, Murtaza Bulut, Shrikanth S. Narayanan, Ryosuke Tsuzuki, Keiichi Tokuda: Constructing emotional speech synthesizers with limited speech database. INTERSPEECH 2004: 1185-1188
- [c9] Heiga Zen, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura: Hidden semi-Markov model based speech synthesis. INTERSPEECH 2004: 1393-1396
- [c8] Heiga Zen, Keiichi Tokuda, Tadashi Kitamura: An introduction of trajectory model into HMM-based speech synthesis. SSW 2004: 191-196
- 2003
- [c7] Hiroyuki Suzuki, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: Speech recognition using voice-characteristic-dependent acoustic models. ICASSP (1) 2003: 740-743
- [c6] Takahiro Hoshiya, Shinji Sako, Heiga Zen, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura: Improving the performance of HMM-based very low bit rate speech coding. ICASSP (1) 2003: 800-803
- [c5] Keiichi Tokuda, Heiga Zen, Tadashi Kitamura: Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features. INTERSPEECH 2003: 865-868
- [c4] Ranniery Maia, Heiga Zen, Keiichi Tokuda, Tadashi Kitamura, Fernando Gil Vianna Resende Jr.: Towards the development of a Brazilian Portuguese text-to-speech system based on HMM. INTERSPEECH 2003: 2465-2468
- [c3] Amaro A. de Lima, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: On the use of kernel PCA for feature extraction in speech recognition. INTERSPEECH 2003: 2625-2628
- [c2] Heiga Zen, Keiichi Tokuda, Tadashi Kitamura: Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling. INTERSPEECH 2003: 3189-3192
- 2002
- [c1] Heiga Zen, Keiichi Tokuda, Tadashi Kitamura: Decision tree distribution tying based on a dimensional split technique. INTERSPEECH 2002: 1257-1260