Abstract
Speaker adaptation techniques use a small amount of data to modify Hidden Markov Model (HMM) based speech synthesis systems so that they mimic a target voice. These techniques can provide personalized systems to people with speech impairments, allowing them to communicate in a more natural way. Although adaptation techniques do not require a large amount of data, the recording process can be tedious if the user has speaking problems. An important factor in improving the acceptance of these systems is therefore the ability to obtain acceptable results with a minimal amount of recordings. In this work we explore how the performance of an adaptation method based on frequency warping, which uses only vocalic segments, varies with the amount of available training data.
Acknowledgments
This work has been partially supported by MINECO/FEDER, UE (SpeechTech4All project, TEC2012-38939-C03-03 and RESTORE project, TEC2015-67163-C2-1-R), and the Basque Government (ELKAROLA project, KK-2015/00098).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Alonso, A., Erro, D., Navas, E., Hernaez, I. (2016). Study of the Effect of Reducing Training Data in Speech Synthesis Adaptation Based on Frequency Warping. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_1
DOI: https://doi.org/10.1007/978-3-319-49169-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)