MOC: Multi-modal Sentiment Analysis via Optimal Transport and Contrastive Interactions

Li, Yi; Zhu, Qingmeng; He, Hao; Gu, Ziyin; Zheng, Changwen

doi:10.1007/978-981-99-8082-6_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14448)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Multi-modal sentiment analysis (MSA) aims to utilize information from various modalities to improve the classification of emotions. Most existing studies employ attention mechanisms for modality fusion, overlooking the heterogeneity of different modalities. To address this issue, we propose an approach that leverages optimal transport for modality alignment and fusion, specifically focusing on distributional alignment. However, solely relying on the optimal transport module may result in a deficiency of intra-modal and inter-sample interactions. To tackle this deficiency, we introduce a double-modal contrastive learning module. Specifically, we propose a model MOC (Multi-modal sentiment analysis via Optimal transport and Contrastive interactions), which integrates optimal transport and contrastive learning. Through empirical comparisons on three established multi-modal sentiment analysis datasets, we demonstrate that our approach achieves state-of-the-art performance. Additionally, we conduct extended ablation studies to validate the effectiveness of each proposed module.
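
The abstract names two ingredients, optimal transport for distributional alignment and a contrastive objective for cross-sample interactions, but this preview does not give the model's actual formulation. The following is therefore only a minimal illustrative sketch, not the authors' implementation. It assumes uniform marginals, a squared Euclidean cost, entropy-regularised Sinkhorn iterations, and an InfoNCE-style loss; the function names, toy features, and hyperparameters (`reg`, `n_iters`, `temperature`) are all hypothetical.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Entropy-regularised transport plan between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m          # assumed uniform marginals
    K = [[math.exp(-c / reg) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchor i should match positive i, not the other rows."""
    def normalize(x):
        norm = math.sqrt(sum(t * t for t in x)) or 1.0
        return [t / norm for t in x]

    A = [normalize(x) for x in anchors]
    P = [normalize(x) for x in positives]
    total = 0.0
    for i, anchor in enumerate(A):
        logits = [sum(s * t for s, t in zip(anchor, p)) / temperature for p in P]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(A)


# Toy "text" and "image" features for three samples (made-up values).
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.9, 0.1], [0.1, 0.9], [1.1, 0.9]]

cost = [[sq_dist(t, g) for g in image_feats] for t in text_feats]
plan = sinkhorn_plan(cost)

# Barycentric projection: express each text sample as a plan-weighted
# average of image features, i.e. the image modality moved toward text.
aligned = [
    [sum(p * g[d] for p, g in zip(row, image_feats)) / sum(row) for d in range(2)]
    for row in plan
]

loss = info_nce(text_feats, image_feats)
```

In this sketch the transport plan plays the alignment role (it says how much of each image feature should be matched to each text feature), while the contrastive loss supplies the sample-to-sample comparison that a transport plan alone does not provide. How MOC actually combines the two is described in the full chapter, not here.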

Y. Li and Q. Zhu—Equal contribution.

Acknowledgements

This work is supported by the National Key R&D Program of China (2022YFC3103800) and the National Natural Science Foundation of China (62101552).

Author information

Authors and Affiliations

University of Chinese Academy of Sciences, Beijing, China
Yi Li, Qingmeng Zhu & Ziyin Gu
Institute of Software Chinese Academy of Sciences, Beijing, China
Yi Li, Qingmeng Zhu, Hao He, Ziyin Gu & Changwen Zheng

Corresponding author

Correspondence to Hao He.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Biao Luo
Chinese Academy of Sciences, Beijing, China
Long Cheng
Zhejiang University, Hangzhou, China
Zheng-Guang Wu
Guangdong University of Technology, Guangzhou, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Li, Y., Zhu, Q., He, H., Gu, Z., Zheng, C. (2024). MOC: Multi-modal Sentiment Analysis via Optimal Transport and Contrastive Interactions. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_34

DOI: https://doi.org/10.1007/978-981-99-8082-6_34
Published: 15 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science, Computer Science (R0)
