Journal of Applied Science and Engineering

Published by Tamkang University Press


Zhiguang Liu1, Guoyin Hao2, Fengshuai Li3, Xiaoqing He1, and Yuanheng Zhang1

1School of Electronics and Electrical Engineering, Zhengzhou University of Science and Technology, Zhengzhou 450064, China

2School of Music and Dance, Zhengzhou University of Science and Technology, Zhengzhou 450064, China

3College of Civil and Architectural Engineering, Zhengzhou University of Science and Technology, Zhengzhou 450064, China


 

 

Received: March 16, 2024
Accepted: April 14, 2024
Publication Date: August 3, 2024

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


Download Citation: https://doi.org/10.6180/jase.202506_28(6).0002


A key task in multi-modal sentiment classification is to accurately extract and fuse complementary information from the text and vision modalities in order to detect the sentiment polarity of the aspect words mentioned in the text. Most existing methods combine only a single source of context information with image information, which makes them insensitive to the correlations among aspects, context information, and visual information, and leaves aspect-related visual details insufficiently extracted at the local level. In addition, some modal information is not fully exploited during feature fusion, which limits the fusion effect. To enable fine-grained information interaction across modalities, this paper proposes a multi-modal sentiment classification model for education emotion analysis based on a graph neural network and a multi-head cross-attention mechanism. First, cross-attention is used to obtain aspect-oriented global representations of the text and the image. Then, a multi-modal interaction graph is constructed to connect the local and global representation nodes of the different modalities. Finally, a graph attention network fully integrates the features at the two granularities. Extensive experiments on popular multi-modal sentiment analysis datasets demonstrate the advantages of the proposed framework over state-of-the-art methods.
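The sketch below is a minimal PyTorch illustration of the three steps summarized above: multi-head cross-attention to obtain aspect-oriented global representations, a multi-modal interaction graph, and graph-attention fusion. All module names, dimensions, and the way the adjacency matrix is built are assumptions made here for illustration only and are not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalAttention(nn.Module):
        # Multi-head cross-attention: one modality queries the other to obtain
        # an aspect-oriented global representation (step 1 in the abstract).
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, query_feats, context_feats):
            # query_feats: (B, Lq, dim); context_feats: (B, Lk, dim)
            out, _ = self.attn(query_feats, context_feats, context_feats)
            return out

    class GATLayer(nn.Module):
        # Single-head graph attention layer used to fuse the local and global
        # nodes of the multi-modal interaction graph (steps 2 and 3).
        def __init__(self, dim=256):
            super().__init__()
            self.proj = nn.Linear(dim, dim, bias=False)
            self.score = nn.Linear(2 * dim, 1, bias=False)

        def forward(self, nodes, adj):
            # nodes: (N, dim); adj: (N, N) adjacency with self-loops on the diagonal
            h = self.proj(nodes)
            n = h.size(0)
            pairs = torch.cat([h.unsqueeze(1).expand(-1, n, -1),
                               h.unsqueeze(0).expand(n, -1, -1)], dim=-1)
            e = F.leaky_relu(self.score(pairs)).squeeze(-1)
            e = e.masked_fill(adj == 0, float("-inf"))
            alpha = torch.softmax(e, dim=-1)   # attention over graph neighbours
            return F.elu(alpha @ h)            # fused node representations

In a complete model, the text-to-image and image-to-text cross-attention outputs would be combined with the local token and region features to form the graph nodes, and the fused aspect node would feed a softmax classifier over sentiment polarities; this wiring is a hypothetical reading of the abstract, not the authors' exact architecture.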


Keywords: Multi-modal sentiment classification, graph neural network, multi-head cross-attention mechanism, education emotion analysis




    



 
