Abstract
In recent years, the use of virtual assistants to complete tasks such as service scheduling and online shopping has grown in both popularity and necessity. The main objective of a task-oriented conversational agent is to serve the end user’s task goals effectively and successfully. Besides that, user satisfaction is one of the most important aspects to address. Multi-modal responses make a conversation easier and more engaging, and responding with appropriate images can improve the quality of a task-oriented conversation in terms of user satisfaction. With these aspects in mind, we propose a framework that infuses multi-modality into an end-to-end persuasive task-oriented dialogue generation module. Additionally, we create a personalised persuasive multi-modal dialogue (PPMD) corpus, annotated at the turn level with slots, sentiment, and agent actions, that contains multi-modal responses from both the user and the agent. The results and a thorough analysis on this dataset show that the proposed multi-modal persuasive virtual assistant outperforms traditional task-oriented frameworks in terms of user satisfaction.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Raut, A. et al. (2023). Introducing Multi-modality in Persuasive Task Oriented Virtual Sales Agent. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_46
DOI: https://doi.org/10.1007/978-3-031-30111-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)