{"title":"利用基于小帧的随机配置网络进行多模态图学习,实现对话中的情感识别","authors":"","doi":"10.1016/j.ins.2024.121393","DOIUrl":null,"url":null,"abstract":"<div><p>The multimodal emotion recognition in conversation (ERC) task presents significant challenges due to the complexity of relationships and the difficulty in achieving semantic fusion across various modalities. Graph learning, recognized for its capability to capture intricate data relations, has been suggested as a solution for ERC. However, existing graph-based ERC models often fail to address the fundamental limitations of graph learning, such as assuming pairwise interactions and neglecting high-frequency signals in semantically-poor modalities, which leads to an over-reliance on text. While these issues might be negligible in other applications, they are crucial for the success of ERC. In this paper, we propose a novel framework for ERC, namely multimodal graph learning with framelet-based stochastic configuration networks (i.e., Frame-SCN). Specifically, framelet-based stochastic configuration networks, which employ 2D directional Haar framelets to extract both low- and high-pass components, are introduced to learn the unified semantic embeddings from multimodal data, mitigating prediction biases caused by an excessive reliance on text without introducing an unnecessarily large number of parameters. Also, we develop a modality-aware information extraction module that is able to extract both general and sensitive information in a multimodal semantic space, alleviating potential noise issues. Extensive experiment results demonstrate that our proposed Frame-SCN outperforms many state-of-the-art approaches on two widely used multimodal ERC datasets.</p></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":null,"pages":null},"PeriodicalIF":8.1000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal graph learning with framelet-based stochastic configuration networks for emotion recognition in conversation\",\"authors\":\"\",\"doi\":\"10.1016/j.ins.2024.121393\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The multimodal emotion recognition in conversation (ERC) task presents significant challenges due to the complexity of relationships and the difficulty in achieving semantic fusion across various modalities. Graph learning, recognized for its capability to capture intricate data relations, has been suggested as a solution for ERC. However, existing graph-based ERC models often fail to address the fundamental limitations of graph learning, such as assuming pairwise interactions and neglecting high-frequency signals in semantically-poor modalities, which leads to an over-reliance on text. While these issues might be negligible in other applications, they are crucial for the success of ERC. In this paper, we propose a novel framework for ERC, namely multimodal graph learning with framelet-based stochastic configuration networks (i.e., Frame-SCN). Specifically, framelet-based stochastic configuration networks, which employ 2D directional Haar framelets to extract both low- and high-pass components, are introduced to learn the unified semantic embeddings from multimodal data, mitigating prediction biases caused by an excessive reliance on text without introducing an unnecessarily large number of parameters. 
Also, we develop a modality-aware information extraction module that is able to extract both general and sensitive information in a multimodal semantic space, alleviating potential noise issues. Extensive experiment results demonstrate that our proposed Frame-SCN outperforms many state-of-the-art approaches on two widely used multimodal ERC datasets.</p></div>\",\"PeriodicalId\":51063,\"journal\":{\"name\":\"Information Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":8.1000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Sciences\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0020025524013070\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020025524013070","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Multimodal graph learning with framelet-based stochastic configuration networks for emotion recognition in conversation
The multimodal emotion recognition in conversation (ERC) task presents significant challenges due to the complexity of relationships and the difficulty of achieving semantic fusion across modalities. Graph learning, recognized for its capability to capture intricate data relations, has been suggested as a solution for ERC. However, existing graph-based ERC models often fail to address fundamental limitations of graph learning, such as the assumption of pairwise interactions and the neglect of high-frequency signals in semantically poor modalities, which leads to an over-reliance on text. While these issues might be negligible in other applications, they are crucial for the success of ERC. In this paper, we propose a novel framework for ERC, namely multimodal graph learning with framelet-based stochastic configuration networks (Frame-SCN). Specifically, framelet-based stochastic configuration networks, which employ 2D directional Haar framelets to extract both low- and high-pass components, are introduced to learn unified semantic embeddings from multimodal data, mitigating prediction biases caused by an excessive reliance on text without introducing an unnecessarily large number of parameters. We also develop a modality-aware information extraction module that extracts both general and sensitive information in a multimodal semantic space, alleviating potential noise issues. Extensive experimental results demonstrate that our proposed Frame-SCN outperforms many state-of-the-art approaches on two widely used multimodal ERC datasets.
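Since the abstract only names the two core ingredients, the sketch below is a minimal, hypothetical Python/NumPy illustration of what they correspond to in practice; it is not the authors' implementation. The function names `haar_split` and `scn_fit` are invented for this example, the 1D Haar low-/high-pass split stands in for the paper's 2D directional Haar framelets, and the incremental construction follows the standard stochastic configuration network (SCN) supervisory condition of Wang and Li (2017), not necessarily the exact variant used in Frame-SCN.

```python
# Hypothetical sketch (not the authors' code): (1) split fused multimodal features into
# low- and high-pass bands with Haar filters, (2) grow an SCN by adding random hidden
# nodes that pass a supervisory inequality, refitting output weights by least squares.
import numpy as np

def haar_split(X):
    """Split X (n_samples x d, d even) into low- and high-pass parts using the 1D Haar
    filters (1/sqrt(2))[1, 1] and (1/sqrt(2))[1, -1] along the feature axis.
    A simplified stand-in for the paper's 2D directional Haar framelets."""
    Xe, Xo = X[:, 0::2], X[:, 1::2]           # even / odd feature columns
    low  = (Xe + Xo) / np.sqrt(2.0)           # low-pass (smooth) component
    high = (Xe - Xo) / np.sqrt(2.0)           # high-pass (detail) component
    return low, high

def scn_fit(H_in, T, max_nodes=50, candidates=30, r=0.999, seed=0):
    """Incrementally build an SCN mapping H_in (n x d) to targets T (n x q)."""
    rng = np.random.default_rng(seed)
    n, d = H_in.shape
    H, W, B = [], [], []
    Hmat = np.empty((n, 0))
    beta = np.empty((0, T.shape[1]))
    e = T.copy()                               # current residual
    for _ in range(max_nodes):
        best, best_xi = None, -np.inf
        for _ in range(candidates):
            w = rng.uniform(-1, 1, size=d)
            b = rng.uniform(-1, 1)
            h = np.tanh(H_in @ w + b)          # candidate hidden-node output
            # supervisory inequality (summed over outputs):
            #   sum_q <e_q, h>^2 / <h, h>  >=  (1 - r) * ||e||_F^2
            xi = np.sum((e.T @ h) ** 2) / (h @ h) - (1 - r) * np.sum(e ** 2)
            if xi > best_xi:
                best, best_xi = (w, b, h), xi
        if best_xi < 0:                        # no admissible node found; stop growing
            break
        w, b, h = best
        W.append(w); B.append(b); H.append(h)
        Hmat = np.column_stack(H)
        beta, *_ = np.linalg.lstsq(Hmat, T, rcond=None)   # refit output weights
        e = T - Hmat @ beta                    # update residual
    return np.array(W), np.array(B), beta

# Toy usage: concatenate hypothetical text/audio/visual features, split them into
# low- and high-pass bands, and feed both bands to the SCN so high-frequency cues
# from the semantically poorer modalities are not discarded.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    text  = rng.normal(size=(200, 32))
    audio = rng.normal(size=(200, 16))
    vis   = rng.normal(size=(200, 16))
    fused = np.concatenate([text, audio, vis], axis=1)
    low, high = haar_split(fused)
    labels = np.eye(6)[rng.integers(0, 6, size=200)]      # 6 emotion classes, one-hot
    W, B, beta = scn_fit(np.concatenate([low, high], axis=1), labels)
    print("hidden nodes added:", len(W))
```

In the actual model these components would operate on graph-structured utterance embeddings rather than raw feature matrices; the sketch only illustrates the framelet filtering and the incremental stochastic configuration mechanics under the stated assumptions.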
Journal introduction:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.