{"title":"用于多模态情感识别的具有联合交叉注意力的密集图卷积网络","authors":"Cheng Cheng;Wenzhe Liu;Lin Feng;Ziyu Jia","doi":"10.1109/TCSS.2024.3412074","DOIUrl":null,"url":null,"abstract":"Multimodal emotion recognition (MER) has attracted much attention since it can leverage consistency and complementary relationships across multiple modalities. However, previous studies mostly focused on the complementary information of multimodal signals, neglecting the consistency information of multimodal signals and the topological structure of each modality. To this end, we propose a dense graph convolution network (DGC) equipped with a joint cross attention (JCA), named DG-JCA, for MER. The main advantage of the DG-JCA model is that it simultaneously integrates the spatial topology, consistency, and complementarity of multimodal data into a unified network framework. Meanwhile, DG-JCA extends the graph convolution network (GCN) via a dense connection strategy and introduces cross attention to joint model well-learned features from multiple modalities. Specifically, we first build a topology graph for each modality and then extract neighborhood features of different modalities using DGC driven by dense connections with multiple layers. Next, JCA performs cross-attention fusion in intra- and intermodality based on each modality's characteristics while balancing the contributions of various modalities’ features. Finally, subject-dependent and subject-independent experiments on the DEAP and SEED-IV datasets are conducted to evaluate the proposed method. Abundant experimental results show that the proposed model can effectively extract and fuse multimodal features and achieve outstanding performance in comparison with some state-of-the-art approaches.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"11 5","pages":"6672-6683"},"PeriodicalIF":4.5000,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dense Graph Convolutional With Joint Cross-Attention Network for Multimodal Emotion Recognition\",\"authors\":\"Cheng Cheng;Wenzhe Liu;Lin Feng;Ziyu Jia\",\"doi\":\"10.1109/TCSS.2024.3412074\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal emotion recognition (MER) has attracted much attention since it can leverage consistency and complementary relationships across multiple modalities. However, previous studies mostly focused on the complementary information of multimodal signals, neglecting the consistency information of multimodal signals and the topological structure of each modality. To this end, we propose a dense graph convolution network (DGC) equipped with a joint cross attention (JCA), named DG-JCA, for MER. The main advantage of the DG-JCA model is that it simultaneously integrates the spatial topology, consistency, and complementarity of multimodal data into a unified network framework. Meanwhile, DG-JCA extends the graph convolution network (GCN) via a dense connection strategy and introduces cross attention to joint model well-learned features from multiple modalities. Specifically, we first build a topology graph for each modality and then extract neighborhood features of different modalities using DGC driven by dense connections with multiple layers. Next, JCA performs cross-attention fusion in intra- and intermodality based on each modality's characteristics while balancing the contributions of various modalities’ features. 
Finally, subject-dependent and subject-independent experiments on the DEAP and SEED-IV datasets are conducted to evaluate the proposed method. Abundant experimental results show that the proposed model can effectively extract and fuse multimodal features and achieve outstanding performance in comparison with some state-of-the-art approaches.\",\"PeriodicalId\":13044,\"journal\":{\"name\":\"IEEE Transactions on Computational Social Systems\",\"volume\":\"11 5\",\"pages\":\"6672-6683\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2024-07-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computational Social Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10586830/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Social Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10586830/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
Dense Graph Convolutional With Joint Cross-Attention Network for Multimodal Emotion Recognition
Multimodal emotion recognition (MER) has attracted much attention since it can leverage consistency and complementary relationships across multiple modalities. However, previous studies mostly focused on the complementary information of multimodal signals, neglecting their consistency information and the topological structure of each modality. To this end, we propose a dense graph convolutional network (DGC) equipped with joint cross-attention (JCA), named DG-JCA, for MER. The main advantage of the DG-JCA model is that it simultaneously integrates the spatial topology, consistency, and complementarity of multimodal data into a unified network framework. Meanwhile, DG-JCA extends the graph convolutional network (GCN) via a dense connection strategy and introduces cross-attention to jointly model well-learned features from multiple modalities. Specifically, we first build a topology graph for each modality and then extract neighborhood features of the different modalities using the DGC, driven by dense connections across multiple layers. Next, JCA performs intra- and intermodality cross-attention fusion based on each modality's characteristics while balancing the contributions of the different modalities' features. Finally, subject-dependent and subject-independent experiments on the DEAP and SEED-IV datasets are conducted to evaluate the proposed method. Extensive experimental results show that the proposed model can effectively extract and fuse multimodal features and achieves outstanding performance compared with several state-of-the-art approaches.
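To make the two components described in the abstract more concrete, the following PyTorch sketch illustrates (i) a densely connected graph-convolution block and (ii) a joint cross-attention fusion step over two modalities. All module names, dimensions, the gating scheme, and hyperparameters here are assumptions for illustration only; they are not taken from the paper and do not reproduce the authors' exact DG-JCA architecture.

```python
# Illustrative sketch only, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGCNBlock(nn.Module):
    """Stack of graph-convolution layers in which each layer receives the
    concatenation of all previous outputs (dense connections)."""

    def __init__(self, in_dim, hidden_dim, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            self.layers.append(nn.Linear(dim, hidden_dim))
            dim += hidden_dim  # input of the next layer grows via dense connections

    def forward(self, x, adj):
        # x: (batch, num_nodes, in_dim); adj: normalized adjacency (num_nodes, num_nodes)
        feats = [x]
        for layer in self.layers:
            h = torch.cat(feats, dim=-1)      # reuse all earlier representations
            h = F.relu(layer(adj @ h))        # one graph-convolution step: A * H * W
            feats.append(h)
        return feats[-1]                      # final neighborhood features


class JointCrossAttention(nn.Module):
    """Hypothetical joint cross-attention: each modality attends to the other,
    and a learned gate balances the two modalities' contributions."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        # dim must be divisible by num_heads
        self.attn_ab = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (batch, num_nodes, dim) graph features of two modalities
        a2b, _ = self.attn_ab(feat_a, feat_b, feat_b)   # modality A queries modality B
        b2a, _ = self.attn_ba(feat_b, feat_a, feat_a)   # modality B queries modality A
        g = torch.sigmoid(self.gate(torch.cat([a2b, b2a], dim=-1)))
        return g * a2b + (1 - g) * b2a                  # balanced joint representation


if __name__ == "__main__":
    # Toy shapes only: two modalities, 32 graph nodes each, 16-dim raw features.
    dgc_a, dgc_b = DenseGCNBlock(16, 64), DenseGCNBlock(16, 64)
    jca = JointCrossAttention(dim=64)
    adj = torch.eye(32)                                  # placeholder adjacency
    mod_a, mod_b = torch.randn(8, 32, 16), torch.randn(8, 32, 16)
    fused = jca(dgc_a(mod_a, adj), dgc_b(mod_b, adj))
    print(fused.shape)                                   # torch.Size([8, 32, 64])
```

In this sketch, the dense connections let later graph-convolution layers reuse the shallow neighborhood features of earlier layers, while the sigmoid gate is a simple stand-in for however the paper balances the contributions of the different modalities' features.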
Journal Description:
IEEE Transactions on Computational Social Systems focuses on topics such as the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, computational behavior modeling, and their applications.