Identification of Nonlinear Oscillator Models for Speech Analysis and Synthesis

Kubin, Gernot; Lainscsek, Claudia; Rank, Erhard

doi:10.1007/11520153_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Included in the following conference series:

International School on Neural Networks, Initiated by IIASS and EMFCSC

Abstract

More than ten years ago, the first successful application of a nonlinear oscillator model to high-quality speech signal processing was reported (Kubin and Kleijn, 1994). Since then, numerous developments have been initiated to turn nonlinear oscillators into a standard tool for speech technology. The present contribution reviews and compares several of these attempts, with special emphasis on adaptive model identification from data and on approaches to the associated machine learning problems. These include Bayesian methods for regularizing the parameter estimation problem (including the pruning of irrelevant parameters) and methods based on an Ansatz library (Lainscsek et al., 2001) for selecting the model structure. We conclude with the observation that these advanced identification methods need to be combined with a thorough background in speech science to succeed in practical modeling tasks.

This chapter corresponds to talks given at the COST 277 summer school at IIASS in Vietri sul Mare (IT) in September 2004. We sincerely thank Anna Esposito for organizing the summer school and for her patience in editing this publication.
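
To make the identification task named in the abstract more concrete, here is a minimal sketch of a nonlinear oscillator identified from data. It is not the authors' implementation. It assumes a time-delay embedding of the signal, a Gaussian RBF network as the nonlinear predictor, and a plain ridge penalty as a simple stand-in for the Bayesian regularization the chapter reviews. The toy signal and all function names and parameter values (`delay_embed`, `rbf_features`, `fit_ridge`, `synthesize`, `dim`, `lag`, `width`, `lam`) are hypothetical choices for illustration.

```python
import numpy as np


def delay_embed(x, dim, lag):
    """Build state vectors [x(n), x(n-lag), ..., x(n-(dim-1)*lag)] and targets x(n+1)."""
    start = (dim - 1) * lag
    idx = np.arange(start, len(x) - 1)
    states = np.stack([x[idx - k * lag] for k in range(dim)], axis=1)
    return states, x[idx + 1]


def rbf_features(states, centers, width):
    """Gaussian RBF activations, one column per center."""
    d2 = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_ridge(phi, targets, lam):
    """Regularized least squares: minimize ||phi w - t||^2 + lam * ||w||^2."""
    gram = phi.T @ phi + lam * np.eye(phi.shape[1])
    return np.linalg.solve(gram, phi.T @ targets)


def synthesize(history, centers, width, weights, n_samples, dim, lag):
    """Run the identified predictor in a closed loop (free-running oscillator)."""
    buf = list(history)  # needs at least (dim-1)*lag + 1 samples, newest last
    for _ in range(n_samples):
        state = np.array([buf[-1 - k * lag] for k in range(dim)])
        phi = rbf_features(state[None, :], centers, width)
        buf.append(float(phi[0] @ weights))
    return np.array(buf[len(history):])


# Toy "voiced" signal (an assumption): two harmonics, period of 50 samples.
n = np.arange(1000)
x = np.sin(2 * np.pi * n / 50) + 0.5 * np.sin(4 * np.pi * n / 50)

dim, lag, width, lam = 4, 5, 0.3, 1e-6          # illustrative values only
states, targets = delay_embed(x, dim, lag)
centers = states[:50]                           # one period of states as centers
phi = rbf_features(states, centers, width)
weights = fit_ridge(phi, targets, lam)

rmse = np.sqrt(np.mean((phi @ weights - targets) ** 2))
synth = synthesize(x[:(dim - 1) * lag + 1], centers, width, weights, 200, dim, lag)
print(f"one-step training RMSE: {rmse:.2e}, synthesized samples: {synth.size}")
```

A small one-step prediction error does not by itself guarantee that the closed-loop oscillator is stable or reproduces the training waveform. That gap is one reason the choice of regularization and model structure matters in this setting.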

References

Kubin, G.: Nonlinear processing of speech. In: Kleijn, W.B., Paliwal, K.K. (eds.) Speech Coding and Synthesis, pp. 557–610. Elsevier, Amsterdam (1995)
Kubin, G.: Synthesis and coding of continuous speech with the nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA, pp. 267–270 (1996)
Kubin, G., Kleijn, W.B.: Time-scale modification of speech based on a nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Adelaide, South Australia, vol. 1, pp. 453–456 (1994)
Sauer, T.: A noise reduction method for signals from nonlinear systems. Physica D 52, 193–201 (1992)
Hegger, R., Kantz, H., Matassini, L.: Noise reduction for human speech signals by local projection in embedding spaces. IEEE Transactions on Circuits and Systems 48, 1454–1461 (2001)
Terez, D.E.: Robust pitch determination using nonlinear state-space embedding. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, vol. 1, pp. 345–348 (2002)
Mann, I., McLaughlin, S.: A nonlinear algorithm for epoch marking in speech signals using Poincaré maps. In: Proceedings of the European Signal Processing Conference, vol. 2, pp. 701–704 (1998)
Lindgren, A.C., Johnson, M.T., Povinelli, R.J.: Joint frequency domain and reconstructed phase space features for speech recognition. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Montreal, Quebec, Canada, vol. 1, pp. 533–536 (2004)
Birgmeier, M.: A fully Kalman-trained radial basis function network for nonlinear speech modeling. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, pp. 259–264 (1995)
Kubin, G.: Synthesis and coding of continuous speech with the nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA, vol. 1, pp. 267–270 (1996)
Haas, H., Kubin, G.: A multi-band nonlinear oscillator model for speech. In: Proceedings of the 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA (1998)
Mann, I., McLaughlin, S.: Stable speech synthesis using recurrent radial basis functions. In: Proceedings of the European Conference on Speech Communication and Technology, Budapest, Hungary, vol. 5, pp. 2315–2318 (1999)
Narasimhan, K., Príncipe, J.C., Childers, D.G.: Nonlinear dynamic modeling of the voiced excitation for improved speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Arizona, pp. 389–392 (1999)
Rank, E., Kubin, G.: Nonlinear synthesis of vowels in the LP residual domain with a regularized RBF network. In: Mira, J., Prieto, A.G. (eds.) IWANN 2001, Part II. LNCS, vol. 2085, pp. 746–753. Springer, Heidelberg (2001)
Mann, I., McLaughlin, S.: Synthesising natural-sounding vowels using a nonlinear dynamical model. Signal Processing 81, 1743–1756 (2001)
Rank, E.: Application of Bayesian trained RBF networks to nonlinear time-series modeling. Signal Processing 83, 1393–1410 (2003)
Takens, F.: Detecting strange attractors in turbulence. In: Rand, D.A., Young, L.-S. (eds.) Dynamical Systems and Turbulence, Warwick 1980. Lecture Notes in Mathematics, vol. 898, pp. 366–381. Springer, Berlin (1981)
Sauer, T., Yorke, J.A., Casdagli, M.: Embedology. Journal of Statistical Physics 65, 579–616 (1991)
Haykin, S., Príncipe, J.: Making sense of a complex world. IEEE Signal Processing Magazine 15, 66–81 (1998)
Judd, K., Mees, A.: Embedding as a modeling problem. Physica D 120, 273–286 (1998)
Bernhard, H.P.: The Mutual Information Function and its Application to Signal Processing. PhD thesis, Vienna University of Technology (1997)
Hegger, R., Kantz, H., Schreiber, T.: Practical implementation of nonlinear time series methods: The TISEAN package. Chaos 9, 413–435 (1999)
Bernhard, H.P., Kubin, G.: Detection of chaotic behaviour in speech signals using Fraser’s mutual information algorithm. In: Proc. 13th GRETSI Symp. Signal and Image Process, Juan-les-Pins, France, pp. 1301–1311 (1991)
Mann, I.: An Investigation of Nonlinear Speech Synthesis and Pitch Modification Techniques. PhD thesis, University of Edinburgh (1999)
Rank, E., Kubin, G.: Nonlinear synthesis of vowels in the LP residual domain with a regularized RBF network. In: Mira, J., Prieto, A.G. (eds.) IWANN 2001, Part II. LNCS, vol. 2085, pp. 746–753. Springer, Heidelberg (2001)
Li, J., Zhang, B., Lin, F.: Nonlinear speech model based on support vector machine and wavelet transform. In: Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2003), Sacramento, CA, pp. 259–264 (2003)
Haas, H., Kubin, G.: A multi-band nonlinear oscillator model for speech. In: Proc. 32nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA (1998)
Townshend, B.: Nonlinear prediction of speech. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 425–428 (1991)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-posed Problems. W.H. Winston (1977)
Poggio, T., Girosi, F.: A theory of networks for approximation and learning. A.I. Memo 1140, Massachusetts Institute of Technology (1989)
Stone, M.: Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society B 36, 111–147 (1974)
MacKay, D.J.: Bayesian interpolation. Neural Computation 4, 415–447 (1992)
MacKay, D.J.: A practical Bayesian framework for backprop networks. Neural Computation 4, 448–472 (1992)
MacKay, D.J.: The evidence framework applied to classification networks. Neural Computation 4, 698–714 (1992)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1–38 (1977)
Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244 (2001)
Fant, G., Liljencrants, J., Lin, Q.G.: A four parameter model of glottal flow. Quarterly Progress Status Report 4, Speech Transmission Laboratory/Royal Institute of Technology, Stockholm, Sweden (1985)
Köppl, H., Kubin, G., Paoli, G.: Bayesian methods for sparse RLS adaptive filters. Thirty-Seventh IEEE Asilomar Conference on Signals, Systems and Computers 2, 1273–1277 (2003)
Kubin, G., Atal, B.S., Kleijn, W.B.: Performance of noise excitation for unvoiced speech. In: Proc. IEEE Workshop on Speech Coding for Telecommunication, St. Jovite, Québec, Canada, pp. 1–2 (1993)
Holm, S.: Automatic generation of mixed excitation in a linear predictive speech synthesizer. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA, vol. 6, pp. 118–120 (1981)
Hermes, D.J.: Synthesis of breathy vowels: Some research methods. Speech Communication 10, 497–502 (1991)
Skoglund, J., Kleijn, W.B.: On the significance of temporal masking in speech coding. In: Proceedings of the International Conference on Spoken Language Processing, Sydney, vol. 5, pp. 1791–1794 (1998)
Jackson, P.J., Shadle, C.H.: Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data. In: Proceedings of 5th Speech Production Seminar, Kloster Seeon, Germany, pp. 185–188 (2000)
Jackson, P.J., Shadle, C.H.: Frication noise modulated by voicing, as revealed by pitch-scaled decomposition. Journal of the Acoustical Society of America 108, 1421–1434 (2000)
Stylianou, Y., Laroche, J., Moulines, E.: High-quality speech modification based on a harmonic + noise model. In: Proceedings of the European Conference on Speech Communication and Technology, Madrid, Spain, pp. 451–454 (1995)
Bailly, G.: A parametric harmonic+noise model. In: Keller, E., Bailly, G., Monaghan, A., Terken, J., Huckvale, M. (eds.) Improvements in Speech Synthesis, pp. 22–38. Wiley, Chichester (2002)
Rank, E., Kubin, G.: An oscillator-plus-noise model for speech synthesis. Speech Communication, accepted for publication (2005)
Lu, H.L., Smith III, J.O.: Glottal source modeling for singing voice. In: Proc. International Computer Music Conference, Berlin, Germany, pp. 90–97 (2000)
Lainscsek, C., Letellier, C., Schürrer, F.: Ansatz library for global modeling with a structure selection. Physical Review E 64, 016206:1–15 (2001)
Lainscsek, C., Letellier, C., Gorodnitsky, I.: Global modeling of the Rössler system from the z-variable. Physics Letters A 314(5-6), 409–427 (2003)
Judd, K., Mees, A.: On selecting models for nonlinear time series. Physica D 82, 426–444 (1995)
Gouesbet, G., Letellier, C.: Global vector-field reconstruction by using a multivariate polynomial l₂ approximation on nets. Physical Review E 49, 4955 (1994)
Press, W., Flannery, B., Teukolsky, S., Vetterling, W.: Numerical Recipes in C. Cambridge University Press, Cambridge (1990)
Lainscsek, C., Gorodnitsky, I.: Ansatz libraries for systems with quadratic and cubic non-linearities (2002), http://cloe.ucsd.edu/claudia/posterDD2002.pdf
Eichhorn, R., Linz, S., Hänggi, P.: Transformations of nonlinear dynamical systems to jerky motion and its application to minimal chaotic flows. Physical Review E 58(6), 7151–7164 (1998)
Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading (1998)
Holland, J.H.: Adaptation in natural and artificial systems. MIT Press, Cambridge (1992)
Ishizaka, K., Flanagan, J.L.: Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell System Technical Journal 51, 1233–1267 (1972)

Author information

Authors and Affiliations

Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz, Austria
Gernot Kubin & Erhard Rank
Cognitive Science Department, University of California at San Diego, La Jolla, CA, USA
Claudia Lainscsek

Editor information

Editors and Affiliations

CNRS LTCI/TSI Paris, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Barcelona, Spain
Marcos Faundez-Zanuy
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi, SA, Italy
Maria Marinaro

Cite this paper

Kubin, G., Lainscsek, C., Rank, E. (2005). Identification of Nonlinear Oscillator Models for Speech Analysis and Synthesis. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_5

DOI: https://doi.org/10.1007/11520153_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)
