A Review of Key Technologies for Emotion Analysis Using Multimodal Information

Cognitive Computation | IF 4.3 | Zone 3 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2024-06-01 | DOI: 10.1007/s12559-024-10287-z

Xianxun Zhu, Chaopeng Guo, Heyang Feng, Yao Huang, Yichen Feng, Xiangyang Wang, Rui Wang


Citations: 0



A Review of Key Technologies for Emotion Analysis Using Multimodal Information

Emotion analysis, an integral aspect of human–machine interactions, has witnessed significant advancements in recent years. With the rise of multimodal data sources such as speech, text, and images, there is a profound need for a comprehensive review of pivotal elements within this domain. Our paper delves deep into the realm of emotion analysis, examining multimodal data sources encompassing speech, text, images, and physiological signals. We provide a curated overview of relevant literature, academic forums, and competitions. Emphasis is laid on dissecting unimodal processing methods, including preprocessing, feature extraction, and tools across speech, text, images, and physiological signals. We further discuss the nuances of multimodal data fusion techniques, spotlighting early, late, model, and hybrid fusion strategies. Key findings indicate the essentiality of analyzing emotions across multiple modalities. Detailed discussions on emotion elicitation, expression, and representation models are presented. Moreover, we uncover challenges such as dataset creation, modality synchronization, model efficiency, limited data scenarios, cross-domain applicability, and the handling of missing modalities. Practical solutions and suggestions are provided to address these challenges. The realm of multimodal emotion analysis is vast, with numerous applications ranging from driver sentiment detection to medical evaluations. Our comprehensive review serves as a valuable resource for both scholars and industry professionals. It not only sheds light on the current state of research but also highlights potential directions for future innovations. The insights garnered from this paper are expected to pave the way for subsequent advancements in deep multimodal emotion analysis tailored for real-world deployments.
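As an illustration of two of the fusion strategies named in the abstract, the sketch below contrasts early (feature-level) fusion with late (decision-level) fusion. It is not taken from the paper: the function names, the modality keys, and the weighting scheme are hypothetical choices made only for this example.

```python
from typing import Dict, List, Sequence


def early_fusion(features: Dict[str, Sequence[float]]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    fused: List[float] = []
    for modality in sorted(features):  # fixed order keeps the vector layout stable
        fused.extend(features[modality])
    return fused


def late_fusion(
    probs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Decision-level fusion: weighted average of per-modality class probabilities.

    Only the modalities present in `probs` contribute, so a missing modality
    is skipped and the remaining weights are renormalised.
    """
    total = sum(weights[m] for m in probs)
    n_classes = len(next(iter(probs.values())))
    return [
        sum(weights[m] * probs[m][k] for m in probs) / total
        for k in range(n_classes)
    ]


if __name__ == "__main__":
    # Hypothetical toy inputs: a two-dimensional text feature and a one-dimensional speech feature.
    print(early_fusion({"text": [0.1, 0.2], "speech": [0.3]}))
    # Hypothetical per-modality classifier outputs over two emotion classes.
    print(late_fusion({"text": [1.0, 0.0], "speech": [0.0, 1.0]},
                      {"text": 1.0, "speech": 3.0}))
```

In this sketch, early fusion hands a single joint vector to one downstream model, whereas late fusion keeps a separate model per modality and combines only their outputs. A hybrid strategy would combine both steps.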


Source journal

Cognitive Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; NEUROSCIENCES)

CiteScore: 9.30
Self-citation rate: 3.70%
Articles published: 116
Review time: >12 weeks

About the journal: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.