Abstract
Speakers accompany their speech with incessant, subtle head movements. It is important to implement such “visual prosody” in virtual agents, not only to make their behavior more natural, but also because it has been shown to help listeners understand speech. We contribute a visual prosody model for interactive virtual agents that are meant to hold live, non-scripted interactions with humans and therefore have to rely on Text-To-Speech (TTS) rather than recorded speech. We present our method for creating visual prosody online from continuous TTS output, and we report results from three crowdsourcing experiments carried out to assess whether, and to what extent, it enhances the experience of interacting with an agent.
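As a rough illustration of what an online, TTS-driven visual prosody pipeline involves, the sketch below maps per-frame prosodic features (pitch and energy) of streaming synthesized audio to small head rotations. All names, thresholds, and the prosody-to-rotation mapping are assumptions introduced for illustration only; they are not the model described in the paper.

```python
import numpy as np

FRAME_LEN = 1024     # samples per analysis frame (assumed)
SAMPLE_RATE = 16000  # Hz (assumed)

def frame_features(frame: np.ndarray) -> tuple[float, float]:
    """Return a crude (f0, energy) estimate for one audio frame."""
    energy = float(np.sqrt(np.mean(frame ** 2)))
    # Naive autocorrelation pitch estimate (placeholder for a real pitch tracker).
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # ac[k] = lag k
    lo, hi = SAMPLE_RATE // 400, SAMPLE_RATE // 60                 # search 60-400 Hz
    lag = lo + int(np.argmax(ac[lo:hi]))
    f0 = SAMPLE_RATE / lag if ac[lag] > 0 else 0.0                 # 0.0 = unvoiced
    return f0, energy

def head_rotation(f0: float, energy: float) -> tuple[float, float]:
    """Map prosody to head pitch/yaw in degrees (hand-tuned, illustrative)."""
    pitch = float(np.clip((f0 - 120.0) * 0.05, -5.0, 5.0))  # nod follows intonation
    yaw = float(np.clip(energy * 20.0, 0.0, 3.0))           # slight turn on loud frames
    return pitch, yaw

def animate_stream(tts_frames):
    """Consume an iterable of TTS audio frames, yield per-frame head rotations."""
    for frame in tts_frames:
        f0, energy = frame_features(frame)
        yield head_rotation(f0, energy)
```

Because the loop consumes frames as they arrive rather than a complete utterance, a mapping of this general shape can run concurrently with speech synthesis, which is the constraint that motivates an online model in the first place.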
Notes
- 1. The state at time \(t\) is independent of all previous states given the state at time \(t-1\); the observation at time \(t\) is assumed independent of all other observations and states given the state at time \(t\) (see the factorization after this list).
- 2.
- 3.
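For reference, these two conditional-independence assumptions yield the standard factorization of the joint distribution over hidden states \(s_{1:T}\) and observations \(o_{1:T}\) (this notation is ours, introduced only for this sketch):

\[
P(s_{1:T}, o_{1:T}) = P(s_1)\,P(o_1 \mid s_1)\prod_{t=2}^{T} P(s_t \mid s_{t-1})\,P(o_t \mid s_t).
\]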
Acknowledgements
We would like to thank Kirsten Bergmann and Philipp Kulms for their feedback on the design of the study and their help with the evaluation of the results. This work was partially performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02. It was also partially funded by the EU H2020 project ARIA-VALUSPA; and by the German Federal Ministry of Education and Research (BMBF) within the Leading-Edge Cluster Competition, managed by the Project Management Agency Karlsruhe (PTKA). The authors are responsible for the contents of this publication.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
van Welbergen, H., Ding, Y., Sattler, K., Pelachaud, C., Kopp, S. (2015). Real-Time Visual Prosody for Interactive Virtual Agents. In: Brinkman, W.-P., Broekens, J., Heylen, D. (eds) Intelligent Virtual Agents. IVA 2015. Lecture Notes in Computer Science, vol. 9238. Springer, Cham. https://doi.org/10.1007/978-3-319-21996-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21995-0
Online ISBN: 978-3-319-21996-7