Abstract
While approaches to the automatic recognition of human emotion from speech have already achieved reasonable results, considerable room for improvement remains. In our research, we select the most essential features by applying a self-adaptive multi-objective genetic algorithm. The proposed approach is evaluated on data from two languages (English and German) with two feature sets consisting of 37 and 384 dimensions, respectively. The developed technique improves emotion recognition performance by up to 49.8 % relative accuracy. Furthermore, in order to identify salient features across speech data from different languages, we analysed how often each feature was selected and derived a feature ranking. Based on this ranking, a feature set for speech-based emotion recognition comprising the most salient features has been created. Applying this feature set, we achieve a relative improvement of up to 37.3 % without the need for time-consuming feature selection using a genetic algorithm.
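The core idea of the abstract — evolving binary feature masks with a genetic algorithm and ranking features by how often they are selected across runs — can be illustrated with a minimal sketch. This is not the authors' actual algorithm (which is self-adaptive and multi-objective in the Pareto sense); it is a toy single-population GA on synthetic data, where `SALIENT`, the fitness weighting, and all parameters are hypothetical placeholders chosen only to demonstrate the selection-count ranking step.

```python
import random

random.seed(0)

N_FEATURES = 10
SALIENT = {1, 3, 7}  # hypothetical "truly useful" features for this toy example

def fitness(mask):
    # Toy stand-in for the two objectives: reward picking salient features,
    # penalise large feature subsets (scalarised here for simplicity).
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in SALIENT)
    return hits - 0.1 * sum(mask)

def evolve(pop_size=20, generations=30):
    # Individuals are binary masks over the feature set.
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]           # one-point crossover
            child[random.randrange(N_FEATURES)] ^= 1  # point mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Aggregate selection counts over several independent runs; the features
# selected most often form the ranking described in the abstract.
counts = [0] * N_FEATURES
for _ in range(10):
    best = evolve()
    for i, bit in enumerate(best):
        counts[i] += bit

ranking = sorted(range(N_FEATURES), key=lambda i: counts[i], reverse=True)
print("selection counts:", counts)
print("top-ranked features:", ranking[:3])
```

In the paper's setting the masks would index real acoustic/prosodic features (37- or 384-dimensional sets) and fitness would come from a classifier evaluated per language, but the ranking-by-selection-count step works the same way.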
Copyright information
© 2017 Springer Science+Business Media Singapore
Cite this chapter
Sidorov, M., Brester, C., Ultes, S., Schmitt, A. (2017). Salient Cross-Lingual Acoustic and Prosodic Features for English and German Emotion Recognition. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2584-6
Online ISBN: 978-981-10-2585-3