Abstract
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation used to describe users' requests is key to a smooth interaction: the system reasons over this representation, together with a database and its domain knowledge, to choose its next action. The course of the dialogue therefore depends on the information this representation conveys. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets lag behind. This paper provides insights into the automatic enhancement of the semantic representations of spoken dialogue datasets. Our contributions are threefold: (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.
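To make the stakes concrete, the sketch below contrasts a coarse annotation with a fine-grained one for the same spoken turn. The domain, slot names, and values are illustrative only, not taken from the paper's datasets:

```python
# Hypothetical example of the kind of semantic representation a TOD system
# reasons over. Slot names and values are illustrative only.
utterance = "I'd like a cheap Italian restaurant in the city centre for two."

# Coarse annotation, typical of many spoken dialogue datasets:
coarse = {"intent": "find_restaurant"}

# Fine-grained annotation, typical of textual TOD datasets:
fine_grained = {
    "intent": "find_restaurant",
    "slots": {
        "food": "italian",
        "pricerange": "cheap",
        "area": "centre",
        "people": "2",
    },
}

def build_query(frame):
    """The system can only constrain its database search with the slots
    present in the annotation; a coarse frame yields no constraints."""
    return dict(frame.get("slots", {}))

print(build_query(coarse))        # no usable constraints
print(build_query(fine_grained))  # four usable constraints
```

With the coarse frame the system cannot narrow its database query at all, which is why enriching spoken datasets with finer representations matters for the dialogue course.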
Notes
- 1.
- 2. In practice, the model is only fed the five previous turns to limit the number of tokens.
- 3. Obtained with the model at https://huggingface.co/thenlper/gte-large.
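The context truncation described in note 2 can be sketched as follows. The function and variable names are illustrative, not from the paper:

```python
# Sketch of the truncation in note 2: only the five most recent turns of the
# dialogue history are fed to the model, bounding the prompt's token count.
def truncate_history(turns, max_turns=5):
    """Keep only the last `max_turns` turns of the dialogue history."""
    return turns[-max_turns:]

history = [f"turn {i}" for i in range(1, 9)]  # an 8-turn dialogue
context = truncate_history(history)
print(context)  # ['turn 4', 'turn 5', 'turn 6', 'turn 7', 'turn 8']
```

A fixed turn window is a simple token budget; an alternative would be truncating by token count, but a turn-level cut keeps each retained turn intact.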
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Druart, L., Vielzeuf, V., Estève, Y. (2024). Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets. In: Nöth, E., Horák, A., Sojka, P. (eds) Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science, vol 15049. Springer, Cham. https://doi.org/10.1007/978-3-031-70566-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70565-6
Online ISBN: 978-3-031-70566-3
eBook Packages: Computer Science (R0)