Tomoki Toda
2020 – today
- 2024
- [j71]Haruki Yamashita, Takuma Okamoto, Ryoichi Takashima, Yamato Ohtani, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai:
Fast Neural Speech Waveform Generative Models With Fully-Connected Layer-Based Upsampling. IEEE Access 12: 31409-31421 (2024) - [j70]Mohammad Eshghi, Tomoki Toda:
An Investigation of Fundamental Frequency Pattern Prediction for Japanese Electrolaryngeal Speech Enhancement Based on Frame-Wise Phoneme Representations. IEEE Access 12: 50137-50153 (2024) - [j69]Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda:
Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data. IEEE Signal Process. Lett. 31: 2995-2999 (2024) - [j68]Rui Wang, Li Li, Tomoki Toda:
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1968-1979 (2024) - [j67]Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2777-2789 (2024) - [j66]Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Unequally Spaced Sound Field Interpolation for Rotation-Robust Beamforming. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3185-3199 (2024) - [c339]Takuya Fujimura, Keisuke Imoto, Tomoki Toda:
Discriminative Neighborhood Smoothing for Generative Anomalous Sound Detection. EUSIPCO 2024: 156-160 - [c338]Jiachen Wang, Tomoki Toda:
Unsupervised Training of Neural Network-Based Virtual Microphone Estimator. EUSIPCO 2024: 256-260 - [c337]Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda:
Audio Difference Learning for Audio Captioning. ICASSP 2024: 1456-1460 - [c336]Yamato Ohtani, Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter. ICASSP 2024: 10871-10875 - [c335]Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda:
Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders. ICASSP 2024: 10961-10965 - [c334]Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda:
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction. ICASSP 2024: 11066-11070 - [c333]Takuma Okamoto, Yamato Ohtani, Tomoki Toda, Hisashi Kawai:
Convnext-TTS And Convnext-VC: Convnext-Based Fast End-To-End Sequence-To-Sequence Text-To-Speech And Voice Conversion. ICASSP 2024: 12456-12460 - [i85]Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda:
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction. CoRR abs/2401.13260 (2024) - [i84]Yusuke Yasuda, Tomoki Toda:
Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment. CoRR abs/2403.06100 (2024) - [i83]Yuka Hashizume, Li Li, Atsushi Miyashita, Tomoki Toda:
Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment. CoRR abs/2404.06682 (2024) - [i82]You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan:
SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan. CoRR abs/2405.05244 (2024) - [i81]Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda:
Multi-speaker Text-to-speech Training with Speaker Anonymized Data. CoRR abs/2405.11767 (2024) - [i80]Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan:
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection. CoRR abs/2406.02438 (2024) - [i79]Jiajun He, Tomoki Toda:
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval. CoRR abs/2406.06201 (2024) - [i78]Bence Mark Halpern, Thomas Tienkamp, Wen-Chin Huang, Lester Phillip Violeta, Teja Rebernik, Sebastiaan A. H. J. de Visscher, Max J. H. Witjes, Martijn Wieling, Defne Abur, Tomoki Toda:
Quantifying the effect of speech pathology on automatic and human speaker verification. CoRR abs/2406.06208 (2024) - [i77]You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, Zhiyao Duan:
SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge. CoRR abs/2408.16132 (2024) - [i76]Wen-Chin Huang, Szu-Wei Fu, Erica Cooper, Ryandhimas E. Zezario, Tomoki Toda, Hsin-Min Wang, Junichi Yamagishi, Yu Tsao:
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction. CoRR abs/2409.07001 (2024) - [i75]Takuya Fujimura, Ibuki Kuroyanagi, Tomoki Toda:
Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions. CoRR abs/2409.09332 (2024) - [i74]Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda:
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions. CoRR abs/2409.19585 (2024) - [i73]Jinyi Mi, Sehun Kim, Tomoki Toda:
Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals. CoRR abs/2409.19614 (2024)
- 2023
- [j65]Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai:
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1902-1915 (2023) - [j64]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3717-3729 (2023) - [j63]Chao Xie, Tomoki Toda:
Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3871-3882 (2023) - [c332]Wen-Chin Huang, Tomoki Toda:
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion. APSIPA ASC 2023: 1161-1166 - [c331]Lester Phillip Violeta, Tomoki Toda:
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing. APSIPA ASC 2023: 1862-1867 - [c330]Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The Voicemos Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains. ASRU 2023: 1-7 - [c329]Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:
Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens. ASRU 2023: 1-7 - [c328]Jiajun He, Zekun Yang, Tomoki Toda:
ED-CEC: Improving Rare word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. ASRU 2023: 1-6 - [c327]Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda:
The Singing Voice Conversion Challenge 2023. ASRU 2023: 1-8 - [c326]Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda, Hisashi Kawai:
WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer. ASRU 2023: 1-8 - [c325]Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. ASRU 2023: 1-6 - [c324]Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Sound Field Interpolation with Unsupervised Calibration for Freely Spaced Circular Microphone Array in Rotation-Robust Beamforming. EUSIPCO 2023: 21-25 - [c323]Takuya Fujimura, Tomoki Toda:
Analysis Of Noisy-Target Training For Dnn-Based Speech Enhancement. ICASSP 2023: 1-5 - [c322]Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda:
Low-Latency Electrolaryngeal Speech Enhancement Based on Fastspeech2-Based Voice Conversion and Self-Supervised Speech Representation. ICASSP 2023: 1-5 - [c321]Atsushi Miyashita, Tomoki Toda:
Representation of Vocal Tract Length Transformation Based on Group Theory. ICASSP 2023: 1-5 - [c320]Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition. ICASSP 2023: 1-5 - [c319]Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda:
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit. ICASSP 2023: 1-5 - [c318]Yusuke Yasuda, Tomoki Toda:
Text-To-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder. ICASSP 2023: 1-5 - [c317]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder. ICASSP 2023: 1-5 - [c316]Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda:
Preference-based training framework for automatic speech quality assessment using deep neural network. INTERSPEECH 2023: 546-550 - [c315]Xiaohan Shi, Xingfeng Li, Tomoki Toda:
Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation. INTERSPEECH 2023: 765-769 - [c314]Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion. INTERSPEECH 2023: 2043-2047 - [c313]Yeonjong Choi, Chao Xie, Tomoki Toda:
Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator. INTERSPEECH 2023: 2103-2107 - [c312]Yusuke Yasuda, Tomoki Toda:
Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech Based on Tail Probabilities. INTERSPEECH 2023: 5491-5495 - [c311]Sehun Kim, Kazuya Takeda, Tomoki Toda:
Sequence-to-Sequence Network Training Methods for Automatic Guitar Transcription With Tokenized Outputs. ISMIR 2023: 524-531 - [c310]Jingguang Tian, Desheng Hu, Xiaohan Shi, Jiajun He, Xingfeng Li, Yuan Gao, Tomoki Toda, Xinkang Xu, Xinhui Hu:
Semi-supervised Multimodal Emotion Recognition with Consensus Decision-making and Label Correction. MRAC@MM 2023: 67-73 - [c309]Atsushi Miyashita, Tomoki Toda:
Differentiable Representation of Warping Based on Lie Group Theory. WASPAA 2023: 1-5 - [c308]Rui Wang, Tomoki Toda:
Directional Target Speaker Extraction under Noisy Underdetermined Conditions through Conditional Variational Autoencoder with Global Style Tokens. WASPAA 2023: 1-5 - [i72]Lester Phillip Violeta, Tomoki Toda:
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing. CoRR abs/2306.13953 (2023) - [i71]Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Yusuke Yasuda, Tomoki Toda:
The Singing Voice Conversion Challenge 2023. CoRR abs/2306.14422 (2023) - [i70]Wen-Chin Huang, Tomoki Toda:
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion. CoRR abs/2309.02133 (2023) - [i69]Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda:
AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion. CoRR abs/2309.07598 (2023) - [i68]Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda:
Audio Difference Learning for Audio Captioning. CoRR abs/2309.08141 (2023) - [i67]Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda:
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders. CoRR abs/2309.09627 (2023) - [i66]Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens. CoRR abs/2310.02570 (2023) - [i65]Jiajun He, Zekun Yang, Tomoki Toda:
ed-cec: improving rare word recognition using asr postprocessing based on error detection and context-aware error correction. CoRR abs/2310.05129 (2023) - [i64]Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. CoRR abs/2310.05203 (2023) - [i63]Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda:
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition. CoRR abs/2311.07093 (2023)
- 2022
- [j62]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda:
A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion. IEEE J. Sel. Top. Signal Process. 16(6): 1308-1318 (2022) - [j61]Yusuke Yasuda, Tomoki Toda:
Investigation of Japanese PnG BERT Language Model in Text-to-Speech Synthesis for Pitch Accent Language. IEEE J. Sel. Top. Signal Process. 16(6): 1319-1328 (2022) - [j60]Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Neural speech-rate conversion with multispeaker WaveNet vocoder. Speech Commun. 138: 1-12 (2022) - [c307]Sehun Kim, Tomoki Hayashi, Tomoki Toda:
Note-level Automatic Guitar Transcription Using Attention Mechanism. EUSIPCO 2022: 229-233 - [c306]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure. EUSIPCO 2022: 294-298 - [c305]Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Modified Sound Field Interpolation Method for Rotation-robust Beamforming with Unequally Spaced Circular Microphone Array. EUSIPCO 2022: 344-348 - [c304]Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda:
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech. ICASSP 2022: 896-900 - [c303]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. ICASSP 2022: 6552-6556 - [c302]Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda:
Towards Identity Preserving Normal to Dysarthric Voice Conversion. ICASSP 2022: 6672-6676 - [c301]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion. ICASSP 2022: 6787-6791 - [c300]Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion. ICASSP 2022: 6802-6806 - [c299]Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi:
Generalization Ability of MOS Prediction Networks. ICASSP 2022: 8442-8446 - [c298]Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. INTERSPEECH 2022: 41-45 - [c297]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation. INTERSPEECH 2022: 848-852 - [c296]Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The VoiceMOS Challenge 2022. INTERSPEECH 2022: 4536-4540 - [c295]Daiki Yoshioka, Yusuke Yasuda, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
Spoken-Text-Style Transfer with Conditional Variational Autoencoder and Content Word Storage. INTERSPEECH 2022: 4576-4580 - [c294]Yeonjong Choi, Chao Xie, Tomoki Toda:
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions. INTERSPEECH 2022: 4910-4914 - [c293]Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda:
Two-Stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-Sequence Voice Conversion. SLT 2022: 949-954 - [i62]Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The VoiceMOS Challenge 2022. CoRR abs/2203.11389 (2022) - [i61]Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. CoRR abs/2203.15431 (2022) - [i60]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation. CoRR abs/2205.06053 (2022) - [i59]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure. CoRR abs/2206.05929 (2022) - [i58]Yeonjong Choi, Chao Xie, Tomoki Toda:
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions. CoRR abs/2206.15155 (2022) - [i57]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda:
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion. CoRR abs/2207.04356 (2022) - [i56]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System. CoRR abs/2207.05913 (2022) - [i55]Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda:
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. CoRR abs/2210.10314 (2022) - [i54]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder. CoRR abs/2210.15533 (2022) - [i53]Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda:
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit. CoRR abs/2210.15987 (2022) - [i52]Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition. CoRR abs/2211.01079 (2022) - [i51]Takuya Fujimura, Tomoki Toda:
Analysis of Noisy-target Training for DNN-based speech enhancement. CoRR abs/2211.01198 (2022) - [i50]Yuka Hashizume, Li Li, Tomoki Toda:
Music Similarity Calculation of Individual Instrumental Sounds Using Metric Learning. CoRR abs/2211.07863 (2022) - [i49]Yusuke Yasuda, Tomoki Toda:
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. CoRR abs/2212.08321 (2022) - [i48]Yusuke Yasuda, Tomoki Toda:
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. CoRR abs/2212.08329 (2022)
- 2021
- [j59]Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU. IEEE Access 9: 94923-94933 (2021) - [j58]Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda:
Many-to-Many Voice Transformer Network. IEEE ACM Trans. Audio Speech Lang. Process. 29: 656-670 (2021) - [j57]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Pretraining Techniques for Sequence-to-Sequence Voice Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 29: 745-755 (2021) - [j56]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. IEEE ACM Trans. Audio Speech Lang. Process. 29: 792-806 (2021) - [j55]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1134-1148 (2021) - [c292]Zhaopeng Qian, Haijun Niu, Li Wang, Kazuhiro Kobayashi, Shaochuan Zhang, Tomoki Toda:
Mandarin Electro-Laryngeal Speech Enhancement based on Statistical Voice Conversion and Manual Tone Control. APSIPA ASC 2021: 546-552 - [c291]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Noisy-to-Noisy Voice Conversion Framework with Denoising Model. APSIPA ASC 2021: 814-820 - [c290]Ding Ma, Wen-Chin Huang, Tomoki Toda:
Investigation of Text-to-Speech-based Synthetic Parallel Data for Sequence-to-Sequence Non-Parallel Voice Conversion. APSIPA ASC 2021: 870-877 - [c289]Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion. APSIPA ASC 2021: 1234-1238 - [c288]Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
Multi-Stream HiFi-GAN with Data-Driven Waveform Decomposition. ASRU 2021: 610-617 - [c287]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS Based Voice Conversion. ASRU 2021: 642-649 - [c286]Ming-Chi Yen, Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Shu-Wei Tsai, Yu Tsao, Tomoki Toda, Jyh-Shing Roger Jang, Hsin-Min Wang:
Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling. ASRU 2021: 650-657 - [c285]Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao:
HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network. ASRU 2021: 907-913 - [c284]Ibuki Kuroyanagi, Tomoki Hayashi, Yusuke Adachi, Takenori Yoshimura, Kazuya Takeda, Tomoki Toda:
An Ensemble Approach to Anomalous Sound Detection Based on Conformer-Based Autoencoder and Binary Classifier Incorporated with Metric Learning. DCASE 2021: 110-114 - [c283]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Anomalous Sound Detection Using a Binary Classification Model and Class Centroids. EUSIPCO 2021: 1995-1999 - [c282]Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda:
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder. ICASSP 2021: 5934-5938 - [c281]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders. ICASSP 2021: 6029-6033 - [c280]Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda:
Speech Emotion Recognition Based on Listener Adaptive Models. ICASSP 2021: 6274-6278 - [c279]Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC. ICASSP 2021: 7058-7062 - [c278]Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda:
Non-Autoregressive Sequence-To-Sequence Voice Conversion. ICASSP 2021: 7068-7072 - [c277]Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda:
Speech Recognition by Simply Fine-Tuning Bert. ICASSP 2021: 7343-7347 - [c276]Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion. Interspeech 2021: 1329-1333 - [c275]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN: Unified Source-Filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN. Interspeech 2021: 2187-2191 - [c274]Patrick Lumban Tobing, Tomoki Toda:
High-Fidelity and Low-Latency Universal Neural Vocoder Based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling. Interspeech 2021: 2217-2221 - [c273]Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder. Interspeech 2021: 3630-3634 - [c272]Shogo Seki, Haruka Taga, Tomoki Toda:
Singing Fundamental Frequency Contour Generation Using Generalized Command-Response Model and Score-Conditional Variational Autoencoder. MLSP 2021: 1-3 - [c271]Patrick Lumban Tobing, Tomoki Toda:
Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. SSW 2021: 142-147 - [i47]Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda:
Speech Recognition by Simply Fine-tuning BERT. CoRR abs/2102.00291 (2021) - [i46]Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda:
crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder. CoRR abs/2103.02858 (2021) - [i45]Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
The AS-NU System for the M2VoC Challenge. CoRR abs/2104.03009 (2021) - [i44]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN. CoRR abs/2104.04668 (2021) - [i43]Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda:
Non-autoregressive sequence-to-sequence voice conversion. CoRR abs/2104.06793 (2021) - [i42]Patrick Lumban Tobing, Tomoki Toda:
High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling. CoRR abs/2105.09856 (2021) - [i41]Patrick Lumban Tobing, Tomoki Toda:
Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction. CoRR abs/2105.09858 (2021) - [i40]Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion. CoRR abs/2106.01415 (2021) - [i39]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Anomalous Sound Detection Using a Binary Classification Model and Class Centroids. CoRR abs/2106.06151 (2021) - [i38]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS based Voice Conversion. CoRR abs/2107.09477 (2021) - [i37]Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion. CoRR abs/2109.03551 (2021) - [i36]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Noisy-to-Noisy Voice Conversion Framework with Denoising Model. CoRR abs/2109.10608 (2021) - [i35]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations. CoRR abs/2110.06280 (2021) - [i34]Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda:
Towards Identity Preserving Normal to Dysarthric Voice Conversion. CoRR abs/2110.08213 (2021) - [i33]Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda:
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech. CoRR abs/2110.09103 (2021) - [i32]Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao:
HASA-net: A non-intrusive hearing-aid speech assessment network. CoRR abs/2111.05691 (2021) - [i31]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion. CoRR abs/2111.07116 (2021)
- 2020
- [j54]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda:
Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression. IEEE Access 8: 62094-62106 (2020) - [j53]Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda:
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model. IEEE ACM Trans. Audio Speech Lang. Process. 28: 715-728 (2020) - [c270]Hikaru Nakatani, Patrick Lumban Tobing, Kazuya Takeda, Tomoki Toda:
Cross-Lingual Voice Conversion using a Cyclic Variational Auto-encoder and a WaveNet Vocoder. APSIPA 2020: 520-526 - [c269]Mohammad Eshghi, Kazuhiro Kobayashi, Kou Tanaka, Hirokazu Kameoka, Tomoki Toda:
Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech. APSIPA 2020: 572-577 - [c268]Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, Tomoki Toda:
Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --. Blizzard Challenge / Voice Conversion Challenge 2020 - [c267]Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda:
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions. Blizzard Challenge / Voice Conversion Challenge 2020 - [c266]Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda:
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS. Blizzard Challenge / Voice Conversion Challenge 2020 - [c265]Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders. Blizzard Challenge / Voice Conversion Challenge 2020 - [c264]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda:
Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN. Blizzard Challenge / Voice Conversion Challenge 2020 - [c263]Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation. DCASE 2020: 100-104 - [c262]Kazuhiro Kobayashi, Tomoki Toda:
Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. EUSIPCO 2020: 396-400 - [c261]Moe Takada, Shogo Seki, Patrick Lumban Tobing, Tomoki Toda:
Semi-Supervised Enhancement and Suppression of Self-Produced Speech Using Correspondence between Air- and Body-Conducted Signals. EUSIPCO 2020: 456-460 - [c260]Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Weakly-Supervised Sound Event Detection with Self-Attention. ICASSP 2020: 66-70 - [c259]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Transformer-Based Text-to-Speech with Weighted Forced Attention. ICASSP 2020: 6729-6733 - [c258]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction. ICASSP 2020: 7204-7208 - [c257]Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan:
Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit. ICASSP 2020: 7654-7658 - [c256]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation. INTERSPEECH 2020: 3535-3539 - [c255]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems. INTERSPEECH 2020: 3540-3544 - [c254]Shogo Seki, Moe Takada, Tomoki Toda:
Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder. INTERSPEECH 2020: 4039-4043 - [c253]Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda:
Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment. INTERSPEECH 2020: 4059-4063 - [c252]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining. INTERSPEECH 2020: 4676-4680 - [c251]Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling. INTERSPEECH 2020: 4861-4865 - [e1]Junichi Yamagishi, Zhenhua Ling, Rohan Kumar Das, Simon King, Tomi Kinnunen, Tomoki Toda, Wen-Chin Huang, Xiao Zhou, Xiaohai Tian, Yi Zhao:
Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 30, 2020. ISCA 2020 - [i30]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda:
Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression. CoRR abs/2003.11750 (2020) - [i29]Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda:
Many-to-Many Voice Transformer Network. CoRR abs/2005.08445 (2020) - [i28]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation. CoRR abs/2005.08654 (2020) - [i27]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems. CoRR abs/2005.08659 (2020) - [i26]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network. CoRR abs/2007.05663 (2020) - [i25]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN: A Non-autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network. CoRR abs/2007.12955 (2020) - [i24]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Pretraining Techniques for Sequence-to-Sequence Voice Conversion. CoRR abs/2008.03088 (2020) - [i23]Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, Tomoki Toda:
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion. CoRR abs/2008.12527 (2020) - [i22]Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda:
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions. CoRR abs/2009.03554 (2020) - [i21]Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda:
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS. CoRR abs/2010.02434 (2020) - [i20]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda:
Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN. CoRR abs/2010.04429 (2020) - [i19]Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders. CoRR abs/2010.04446 (2020) - [i18]Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, Tomoki Toda:
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations. CoRR abs/2010.12231 (2020)
2010 – 2019
- 2019
- [j52]Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda:
Underdetermined Source Separation Based on Generalized Multichannel Variational Autoencoder. IEEE Access 7: 168104-168115 (2019) - [j51]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder. IEEE Access 7: 171114-171125 (2019) - [j50]Karthika Vijayan, Haizhou Li, Tomoki Toda:
Speech-to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes. IEEE Signal Process. Mag. 36(1): 95-102 (2019) - [c250]Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda:
Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output. ASRU 2019: 176-183 - [c249]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems. ASRU 2019: 214-221 - [c248]Farzaneh Ahmadi, Kazuhiro Kobayashi, Tomoki Toda:
Development of a Real-time Bionic Voice Generation System based on Statistical Excitation Prediction. ASSETS 2019: 655-657 - [c247]Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion. EUSIPCO 2019: 1-5 - [c246]Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda:
Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation. EUSIPCO 2019: 1-5 - [c245]Tatsuya Komatsu, Tomoki Hayashi, Reishi Kondo, Tomoki Toda, Kazuya Takeda:
Scene-dependent Anomalous Acoustic-event Detection Based on Conditional Wavenet and I-vector. ICASSP 2019: 870-874 - [c244]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned Wavenet Vocoder. ICASSP 2019: 6815-6819 - [c243]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Investigations of Real-time Gaussian Fftnet and Parallel Wavenet Neural Vocoders with Simple Acoustic Features. ICASSP 2019: 7020-7024 - [c242]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. INTERSPEECH 2019: 196-200 - [c241]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. INTERSPEECH 2019: 674-678 - [c240]Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds. INTERSPEECH 2019: 684-688 - [c239]Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion. INTERSPEECH 2019: 709-713 - [c238]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders. INTERSPEECH 2019: 1308-1312 - [c237]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu:
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis. INTERSPEECH 2019: 4430-4434 - [c236]Li Li, Tomoki Toda, Kazuho Morikawa, Kazuhiro Kobayashi, Shoji Makino:
Improving Singing Aid System for Laryngectomees With Statistical Voice Conversion and VAE-SPACE. ISMIR 2019: 784-790 - [c235]Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion. SSW 2019: 57-62 - [c234]Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder. SSW 2019: 63-68 - [c233]Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, Hirokazu Kameoka, Tomoki Toda:
An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement. SSW 2019: 251-256 - [i17]Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion. CoRR abs/1905.00615 (2019) - [i16]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. CoRR abs/1907.00797 (2019) - [i15]Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder. CoRR abs/1907.08940 (2019) - [i14]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. CoRR abs/1907.10185 (2019) - [i13]Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan:
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit. CoRR abs/1910.10909 (2019) - [i12]Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Héctor Delgado, Andreas Nautsch, Nicholas W. D. Evans, Md. Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sébastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-François Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling:
The ASVspoof 2019 database. CoRR abs/1911.01601 (2019) - [i11]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining. CoRR abs/1912.06813 (2019)
- 2018
- [j49]Tomoki Hayashi, Masafumi Nishida, Norihide Kitaoka, Tomoki Toda, Kazuya Takeda:
Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 101-A(1): 199-210 (2018) - [j48]Shogo Seki, Tomoki Toda, Kazuya Takeda:
Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 101-A(7): 1057-1064 (2018) - [j47]Takatomo Kano, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
An end-to-end model for cross-lingual transformation of paralinguistic information. Mach. Transl. 32(4): 353-368 (2018) - [j46]Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura:
Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Commun. 99: 211-220 (2018) - [c232]Moe Takada, Shogo Seki, Tomoki Toda:
Self-Produced Speech Enhancement and Suppression Method using Air- and Body-Conductive Microphones. APSIPA 2018: 1240-1245 - [c231]Shunya Seiya, Ryuya Ito, Kosuke Okamoto, Ukyo Tanikawa, Shigeki Ohira, Daisuke Deguchi, Tomoki Toda:
Development of "KamiRepo" system with automatic student identification to handle handwritten assignments on LMS. EDUCON 2018: 835-842 - [c230]Koichi Miyazaki, Tomoki Hayashi, Tomoki Toda, Kazuya Takeda:
Connectionist Temporal Classification-based Sound Event Encoder for Converting Sound Events into Onomatopoeic Representations. EUSIPCO 2018: 852-856 - [c229]Kazuhiro Kobayashi, Tomoki Toda:
Electrolaryngeal Speech Enhancement with Statistical Voice Conversion based on CLDNN. EUSIPCO 2018: 2115-2119 - [c228]Tomoki Hayashi, Tatsuya Komatsu, Reishi Kondo, Tomoki Toda, Kazuya Takeda:
Anomalous Sound Event Detection Based on WaveNet. EUSIPCO 2018: 2494-2498 - [c227]Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
An Investigation of Subband Wavenet Vocoder Covering Entire Audible Frequency Range with Limited Acoustic Features. ICASSP 2018: 5654-5658 - [c226]Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
An Investigation of Noise Shaping with Perceptual Weighting for Wavenet-Based Speech Generation. ICASSP 2018: 5664-5668 - [c225]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Multi-Head Decoder for End-to-End Speech Recognition. INTERSPEECH 2018: 801-805 - [c224]Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda:
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder. INTERSPEECH 2018: 1988-1992 - [c223]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis. INTERSPEECH 2018: 2027-2031 - [c222]Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu, Tomoki Toda:
Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features. INTERSPEECH 2018: 2469-2473 - [c221]Farzaneh Ahmadi, Tomoki Toda:
Designing a Pneumatic Bionic Voice Prosthesis - A Statistical Approach for Source Excitation Generation. INTERSPEECH 2018: 3142-3146 - [c220]Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling:
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment. Odyssey 2018: 187-194 - [c219]Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling:
The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods. Odyssey 2018: 195-202 - [c218]Kazuhiro Kobayashi, Tomoki Toda:
sprocket: Open-Source Voice Conversion Software. Odyssey 2018: 203-210 - [c217]Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018. Odyssey 2018: 211-218 - [c216]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
NU Voice Conversion System for the Voice Conversion Challenge 2018. Odyssey 2018: 219-226 - [c215]Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
An Evaluation of Deep Spectral Mappings and WaveNet Vocoder for Voice Conversion. SLT 2018: 297-303 - [c214]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Improving FFTNet Vocoder with Noise Shaping and Subband Approaches. SLT 2018: 304-311 - [c213]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for end-to-end ASR. SLT 2018: 426-433 - [i10]Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling:
The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods. CoRR abs/1804.04262 (2018) - [i9]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Multi-Head Decoder for End-to-End Speech Recognition. CoRR abs/1804.08050 (2018) - [i8]Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling:
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment. CoRR abs/1804.08438 (2018) - [i7]Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda:
Collapsed speech segment detection and suppression for WaveNet vocoder. CoRR abs/1804.11055 (2018) - [i6]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices. CoRR abs/1806.06812 (2018) - [i5]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for End-to-End ASR. CoRR abs/1807.10893 (2018) - [i4]Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda:
Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation. CoRR abs/1810.00223 (2018) - [i3]Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion. CoRR abs/1811.11078 (2018)
- 2017
- [j45]Kou Tanaka, Tomoki Toda, Satoshi Nakamura:
A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction. IEICE Trans. Inf. Syst. 100-D(9): 2165-2173 (2017) - [j44]Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Preserving Word-Level Emphasis in Speech-to-Speech Translation. IEEE ACM Trans. Audio Speech Lang. Process. 25(3): 544-556 (2017) - [j43]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Duration-Controlled LSTM for Polyphonic Sound Event Detection. IEEE ACM Trans. Audio Speech Lang. Process. 25(11): 2059-2070 (2017) - [j42]Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Articulatory Controllable Speech Modification Based on Statistical Inversion and Production Mappings. IEEE ACM Trans. Audio Speech Lang. Process. 25(12): 2337-2350 (2017) - [c212]Kazuho Morikawa, Tomoki Toda:
Electrolaryngeal speech modification towards singing aid system for laryngectomees. APSIPA 2017: 610-613 - [c211]Patrick Lumban Tobing, Hirokazu Kameoka, Tomoki Toda:
Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling. APSIPA 2017: 1274-1277 - [c210]Akira Tamamori, Tomoki Hayashi, Tomoki Toda, Kazuya Takeda:
An investigation of recurrent neural network for daily activity recognition using multi-modal signals. APSIPA 2017: 1334-1340 - [c209]Kazutaka Kubo, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An investigation of how to design control parameters for statistical voice timbre control. APSIPA 2017: 1520-1523 - [c208]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
Accurate estimation of f0 and aperiodicity based on periodicity detector residuals and deviations of phase derivatives. APSIPA 2017: 1556-1564 - [c207]Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Subband wavenet with overlapped single-sideband filterbanks. ASRU 2017: 698-704 - [c206]Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
An investigation of multi-speaker training for wavenet vocoder. ASRU 2017: 712-718 - [c205]Shogo Seki, Tomoki Toda, Kazuya Takeda:
Stereophonic music separation based on non-negative tensor factorization with cepstrum regularization. EUSIPCO 2017: 981-985 - [c204]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection. ICASSP 2017: 766-770 - [c203]Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda:
A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals. ICASSP 2017: 4960-4964 - [c202]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation. INTERSPEECH 2017: 424-428 - [c201]Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura:
Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement. INTERSPEECH 2017: 1069-1073 - [c200]Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Speaker-Dependent WaveNet Vocoder. INTERSPEECH 2017: 1118-1122 - [c199]Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda:
Statistical Voice Conversion with WaveNet-Based Waveform Generation. INTERSPEECH 2017: 1138-1142 - [c198]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis. INTERSPEECH 2017: 1358-1362 - [c197]Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino:
Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization. INTERSPEECH 2017: 1998-2002 - [c196]Shogo Seki, Hirokazu Kameoka, Tomoki Toda, Kazuya Takeda:
Missing component restoration for masked speech signals based on time-domain spectrogram factorization. MLSP 2017: 1-6 - [i2]Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino:
A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis. CoRR abs/1702.06724 (2017) - [i1]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation. CoRR abs/1706.02964 (2017)
- 2016
- [j41]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Enhancing Event-Related Potentials Based on Maximum a Posteriori Estimation with a Spatial Correlation Prior. IEICE Trans. Inf. Syst. 99-D(6): 1437-1446 (2016) - [j40]Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models. IEICE Trans. Inf. Syst. 99-D(10): 2490-2498 (2016) - [j39]Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura:
Improvements of Voice Timbre Control Based on Perceived Age in Singing Voice Conversion. IEICE Trans. Inf. Syst. 99-D(11): 2767-2777 (2016) - [j38]Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics. IEICE Trans. Inf. Syst. 99-D(12): 3132-3139 (2016) - [j37]Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Learning cooperative persuasive dialogue policies using framing. Speech Commun. 84: 83-96 (2016) - [j36]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 24(4): 755-767 (2016) - [j35]Zhizheng Wu, Phillip L. De Leon, Cenk Demiroglu, Ali Khodabakhsh, Simon King, Zhen-Hua Ling, Daisuke Saito, Bryan Stewart, Tomoki Toda, Mirjam Wester, Junichi Yamagishi:
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance. IEEE ACM Trans. Audio Speech Lang. Process. 24(4): 768-783 (2016) - [j34]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Hideki Negoro, Hidemi Iwasaka, Satoshi Nakamura:
Teaching Social Communication Skills Through Human-Agent Interaction. ACM Trans. Interact. Intell. Syst. 6(2): 18:1-18:26 (2016) - [c195]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection. DCASE 2016: 35-39 - [c194]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices. EMBC 2016: 3728-3731 - [c193]Kou Tanaka, Tomoki Toda, Graham Neubig, Satoshi Nakamura:
Real-time vibration control of an electrolarynx based on statistical F0 contour prediction. EUSIPCO 2016: 1333-1337 - [c192]Soichi Yamane, Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura:
An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer. ICASSP 2016: 5265-5269 - [c191]Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura:
Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework. ICASSP 2016: 5665-5669 - [c190]Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura:
Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification. ICASSP 2016: 5670-5674 - [c189]Yusuke Tajiri, Tomoki Toda, Satoshi Nakamura:
Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring. ICASSP 2016: 5935-5939 - [c188]Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura:
Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model. INTERSPEECH 2016: 953-957 - [c187]Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
The Voice Conversion Challenge 2016. INTERSPEECH 2016: 1632-1636 - [c186]Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda:
The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016. INTERSPEECH 2016: 1667-1671 - [c185]Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework. INTERSPEECH 2016: 2288-2292 - [c184]Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training. INTERSPEECH 2016: 3196-3200 - [c183]Takuya Hiraoka, Graham Neubig, Koichiro Yoshino, Tomoki Toda, Satoshi Nakamura:
Active Learning for Example-Based Dialog Systems. IWSDS 2016: 67-78 - [c182]Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura:
F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential. SLT 2016: 693-700 - [c181]Yusuke Tajiri, Tomoki Toda:
Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. SSW 2016: 52-58
- 2015
- [j33]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
NOCOA+: Multimodal Computer-Based Training for Social and Communication Skills. IEICE Trans. Inf. Syst. 98-D(8): 1536-1544 (2015) - [j32]Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Semantic Parsing of Ambiguous Input through Paraphrasing and Verification. Trans. Assoc. Comput. Linguistics 3: 571-584 (2015) - [c180]Masahiro Mizukami, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Linguistic Individuality Transformation for Spoken Language. IWSDS 2015: 129-143 - [c179]Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection. IWSDS 2015: 145-152 - [c178]Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Evaluation of a Fully Automatic Cooperative Persuasive Dialogue System. IWSDS 2015: 153-167 - [c177]Takafumi Sasakura, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Unknown Word Detection Based on Event-Related Brain Desynchronization Responses. IWSDS 2015: 169-175 - [c176]Yuiko Tsunomori, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
An Analysis Towards Dialogue-Based Deception Detection. IWSDS 2015: 177-187 - [c175]Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents. ACL (1) 2015: 198-207 - [c174]Akiva Miura, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Improving Pivot Translation by Remembering the Pivot. ACL (2) 2015: 573-577 - [c173]Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino:
Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation. APSIPA 2015: 520-529 - [c172]Sakriani Sakti, Faiz Ilham, Graham Neubig, Tomoki Toda, Ayu Purwarianti, Satoshi Nakamura:
Incremental sentence compression using LSTM recurrent networks. ASRU 2015: 252-258 - [c171]Quoc Truong Do, Michael Heck, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function. ASRU 2015: 654-659 - [c170]Nurul Lubis, Sakriani Sakti, Graham Neubig, Koichiro Yoshino, Tomoki Toda, Satoshi Nakamura:
A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows. ASRU 2015: 777-783 - [c169]Masahiro Mizukami, Hideaki Kizuki, Toshio Nomura, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Adaptive selection from multiple response candidates in example-based dialogue. ASRU 2015: 784-790 - [c168]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction. ASSETS 2015: 435-436 - [c167]Shinnosuke Takamichi, Kazuhiro Kobayashi, Kou Tanaka, Tomoki Toda, Satoshi Nakamura:
The NAIST Text-to-Speech System for the Blizzard Challenge 2015. Blizzard Challenge 2015 - [c166]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
An evaluation of EEG ocular artifact removal with a multi-channel wiener filter based on probabilistic generative model. EMBC 2015: 2775-2778 - [c165]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
EEG signal enhancement using multi-channel wiener filter with a spatial correlation prior. ICASSP 2015: 2639-2643 - [c164]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis. ICASSP 2015: 4210-4214 - [c163]Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, Simon King:
SAS: A speaker verification spoofing database containing diverse attacks. ICASSP 2015: 4440-4444 - [c162]Andros Tjandra, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR. ICASSP 2015: 4525-4529 - [c161]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion. ICASSP 2015: 4859-4863 - [c160]Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. INTERSPEECH 2015: 299-303 - [c159]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis. INTERSPEECH 2015: 1206-1210 - [c158]Takashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Speed or accuracy? a study in evaluation of simultaneous speech translation. INTERSPEECH 2015: 2267-2271 - [c157]The Tung Nguyen, Graham Neubig, Hiroyuki Shindo, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
A latent variable model for joint pause prediction and dependency parsing. INTERSPEECH 2015: 2719-2723 - [c156]Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Statistical singing voice conversion based on direct waveform modification with global variance. INTERSPEECH 2015: 2754-2758 - [c155]Yusuke Tajiri, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments. INTERSPEECH 2015: 2769-2773 - [c154]Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential. INTERSPEECH 2015: 3350-3354 - [c153]Quoc Truong Do, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs. INTERSPEECH 2015: 3665-3669 - [c152]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Hideki Negoro, Hidemi Iwasaka, Satoshi Nakamura:
Automated Social Skills Trainer. IUI 2015: 17-27 - [c151]Quoc Truong Do, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Improving translation of emphasis with pause prediction in speech-to-speech translation systems. IWSLT 2015 - [c150]Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T). ASE 2015: 574-584 - [c149]Hiroyuki Fudaba, Yusuke Oda, Koichi Akabe, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Pseudogen: A Tool to Automatically Generate Pseudo-Code from Source Code. ASE 2015: 824-829 - [c148]Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Ckylark: A More Robust PCFG-LA Parser. HLT-NAACL 2015: 41-45 - [c147]Nurul Lubis, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Construction and analysis of social-affective interaction corpus in English and Indonesian. O-COCOSDA/CASLRE 2015: 202-206 - [c146]Kyoshiro Sugiyama, Masahiro Mizukami, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering. WMT@EMNLP 2015: 442-449
- 2014
- [j31]Kazuhiro Kobayashi, Tomoki Toda, Hironori Doi, Tomoyasu Nakano, Masataka Goto, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Voice Timbre Control Based on Perceived Age in Singing Voice Conversion. IEICE Trans. Inf. Syst. 97-D(6): 1419-1428 (2014) - [j30]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation. IEICE Trans. Inf. Syst. 97-D(6): 1429-1437 (2014) - [j29]Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model. IEICE Trans. Inf. Syst. 97-D(6): 1468-1476 (2014) - [j28]