Deep Neural Networks for Comprehensive Multimodal Emotion Recognition

Ashutosh Tiwari, Satyam Kumar, Tushar Mehrotra, Rajneesh Kumar Singh
2023 International Conference on Disruptive Technologies (ICDT), published 2023-05-11.
DOI: 10.1109/ICDT57929.2023.10150945

Abstract

Emotions can be expressed in many different ways, which makes automatic affect recognition challenging. Several applications stand to benefit from this technology, including audiovisual search and human-machine interfaces. Recently, neural networks have been used to assess emotional states with unprecedented accuracy. We present an approach to emotion recognition that uses both visual and aural signals. Isolating relevant features is crucial for accurately representing the nuanced emotions conveyed across a wide range of speech patterns. To this end, we use a Convolutional Neural Network (CNN) for feature extraction from the audio track and a 50-layer ResNet to process the visual track. Beyond extracting features, the model must also be robust to outliers and able to capture the surrounding temporal context; we address this with LSTM networks. We train the system from scratch on the RECOLA dataset from the AVEC 2016 emotion recognition research challenge, and we show that our method outperforms prior approaches that relied on hand-crafted aural and visual features for identifying genuine emotional states. The visual modality is shown to predict valence more accurately than arousal. The best results for the valence dimension on the RECOLA dataset are shown in Table III below.
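The abstract's pipeline (a CNN over audio features, a ResNet-50 over video frames, and an LSTM for temporal modeling feeding a continuous arousal/valence regressor) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: all layer sizes, the class name `AudioVisualEmotionNet`, and the linear stand-in for the ResNet-50 feature extractor are assumptions made to keep the example small.

```python
# Hypothetical sketch of the audio-visual architecture described above:
# a small 1-D CNN extracts per-frame audio features, a stand-in for the
# ResNet-50 visual branch projects frame features, and an LSTM models
# temporal context before a linear head regresses arousal and valence.
import torch
import torch.nn as nn


class AudioVisualEmotionNet(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=128, hidden=64):
        super().__init__()
        # Audio branch: 1-D convolutions over per-frame spectral features.
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(audio_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Visual branch: the paper uses a 50-layer ResNet; a linear
        # projection stands in for its pooled features here.
        self.visual_proj = nn.Linear(visual_dim, 64)
        # LSTM fuses the concatenated per-frame features over time.
        self.lstm = nn.LSTM(64 + 64, hidden, batch_first=True)
        # Regress continuous arousal and valence at each time step.
        self.head = nn.Linear(hidden, 2)

    def forward(self, audio, visual):
        # audio: (batch, time, audio_dim); visual: (batch, time, visual_dim)
        a = self.audio_cnn(audio.transpose(1, 2)).transpose(1, 2)
        v = self.visual_proj(visual)
        fused, _ = self.lstm(torch.cat([a, v], dim=-1))
        return self.head(fused)  # (batch, time, 2) -> [arousal, valence]


model = AudioVisualEmotionNet()
out = model(torch.randn(2, 25, 40), torch.randn(2, 25, 128))
print(out.shape)  # torch.Size([2, 25, 2])
```

In a real system the linear visual projection would be replaced by `torchvision.models.resnet50` applied per frame, and the per-step predictions would be trained against the RECOLA continuous annotations (e.g. with a concordance-correlation-based loss, as is common in the AVEC challenges).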