Abstract
Multi-modal sentiment analysis (MSA) aims to use information from multiple modalities to improve the classification of emotions. Most existing studies fuse modalities with attention mechanisms and overlook the heterogeneity between modalities. To address this issue, we propose an approach that uses optimal transport for modality alignment and fusion, with a focus on distributional alignment. However, relying on the optimal transport module alone may leave intra-modal and inter-sample interactions insufficiently modeled. To compensate, we introduce a double-modal contrastive learning module. The resulting model, MOC (Multi-modal sentiment analysis via Optimal transport and Contrastive interactions), integrates optimal transport and contrastive learning. Empirical comparisons on three established multi-modal sentiment analysis datasets show that our approach achieves state-of-the-art performance. We also conduct extensive ablation studies to validate the effectiveness of each proposed module.
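The abstract does not specify how the two modules are implemented, so the following is only a minimal sketch of the two generic ingredients it names, not the MOC architecture. It assumes entropy-regularized optimal transport solved with Sinkhorn iterations between uniformly weighted feature sets, a barycentric projection to map one modality onto the other, and an InfoNCE-style loss as a stand-in for the contrastive module. All function names, the squared Euclidean cost, and the hyperparameters `reg`, `n_iters`, and `temperature` are hypothetical choices made for illustration.

```python
import math
import random


def cost_matrix(xs, ys):
    """Squared Euclidean cost between every source and target feature vector."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn(cost, reg=0.5, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions.

    The cost is rescaled by its maximum (an assumption made here purely for
    numerical stability) before the Gibbs kernel is formed.
    """
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (reg * scale)) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_projection(plan, ys):
    """Map each source item to the plan-weighted average of the target features."""
    dim = len(ys[0])
    projected = []
    for row in plan:
        mass = sum(row)
        projected.append(
            [sum(row[j] * ys[j][d] for j in range(len(ys))) / mass for d in range(dim)]
        )
    return projected


def info_nce(za, zb, temperature=0.1):
    """InfoNCE loss: za[i] and zb[i] form a positive pair, other rows are negatives."""

    def unit(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    za, zb = [unit(v) for v in za], [unit(v) for v in zb]
    loss = 0.0
    for i, anchor in enumerate(za):
        logits = [sum(p * q for p, q in zip(anchor, other)) / temperature for other in zb]
        peak = max(logits)  # subtract the maximum for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(za)


# Toy data standing in for text and image features (purely synthetic).
random.seed(0)
text_feats = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
image_feats = [[random.gauss(1.0, 1.0) for _ in range(3)] for _ in range(4)]

plan = sinkhorn(cost_matrix(text_feats, image_feats))
text_in_image_space = barycentric_projection(plan, image_feats)
print("contrastive loss after alignment:", info_nce(text_in_image_space, image_feats))
```

In this sketch the transport plan handles distribution-level alignment between the two feature sets, while the contrastive term compares individual samples within a batch, which is the kind of inter-sample interaction the abstract says optimal transport alone may miss.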
Y. Li and Q. Zhu contributed equally.
Acknowledgements
This work is supported by the National Key R&D Program of China (2022YFC3103800) and the National Natural Science Foundation of China (62101552).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, Y., Zhu, Q., He, H., Gu, Z., Zheng, C. (2024). MOC: Multi-modal Sentiment Analysis via Optimal Transport and Contrastive Interactions. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_34
DOI: https://doi.org/10.1007/978-981-99-8082-6_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science, Computer Science (R0)