Abstract
In this contribution, we introduce a novel approach that combines acoustic information with emotional point information for robust automatic recognition of a speaker’s emotion. Six discrete emotional states are recognized in this work. First, a multi-level model for emotion recognition from acoustic features is presented; the derived features are selected by Fisher ratio to distinguish the different types of emotion. Second, a novel emotional point model for Mandarin is established using a Support Vector Machine and a Hidden Markov Model. This model contains 28 emotional syllables that carry rich emotional information. Finally, the acoustic information and the emotional point information are integrated by a soft decision strategy. Experimental results show that using emotional point information in speech emotion recognition is effective.
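Two of the steps named above, Fisher-ratio feature selection and soft-decision fusion, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the synthetic two-class data, and the fixed fusion weight are assumptions for the example, and the multi-level model and the 28-syllable emotional point inventory are not reproduced here.

```python
import numpy as np

def fisher_ratio(features, labels):
    """Score each column of `features` by the Fisher ratio:
    between-class scatter of the class means divided by the
    pooled within-class scatter. Higher scores mark features
    that separate the emotion classes better."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        subset = features[labels == c]
        between += len(subset) * (subset.mean(axis=0) - overall_mean) ** 2
        within += len(subset) * subset.var(axis=0)
    return between / within

def soft_decision(acoustic_probs, point_probs, weight=0.6):
    """Fuse two class-posterior vectors by a convex combination
    (a simple soft decision) and return the winning class index."""
    fused = weight * acoustic_probs + (1 - weight) * point_probs
    return int(np.argmax(fused))
```

In this sketch, features whose Fisher ratio falls below a chosen threshold would be discarded before classification, and the fusion weight would be tuned on held-out data rather than fixed at 0.6.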
Acknowledgements
This research is supported by the International Science and Technology Cooperation Program of China (No. 2010DFA11990) and the National Nature Science Foundation of China (No. 61103097).
Chen, L., Mao, X., Wei, P. et al. Mandarin emotion recognition combining acoustic and emotional point information. Appl Intell 37, 602–612 (2012). https://doi.org/10.1007/s10489-012-0352-1