Contextual modulation of affect: Comparing humans and deep neural networks

Soomin Shin, Doo-Hyun Kim, C. Wallraven
Published in: Companion Publication of the 2022 International Conference on Multimodal Interaction, 2022-11-07
DOI: 10.1145/3536220.3558036 (https://doi.org/10.1145/3536220.3558036)
Citation count: 1

Abstract

When inferring emotions, humans rely on a number of cues, including not only facial expressions and body posture but also expressor-external, contextual information. The goal of the present study was to compare the impact of such contextual information on emotion processing in humans and in two deep neural network (DNN) models. We used results from a human experiment in which two types of pictures were rated for valence and arousal: the first type depicted people expressing an emotion in a social context that included other people; the second was a context-reduced version in which all information except for the target expressor was blurred out. Human ratings of valence and arousal were systematically lower for the context-reduced version, highlighting the importance of context. We then compared the human ratings with those of two DNN models (one trained on face images only, the other trained on contextual information as well). Analyses of both the categorical ratings and the valence/arousal ratings showed that, although there were some superficial similarities, both models failed to capture human rating patterns in both the context-rich and the context-reduced conditions. Our study emphasizes the importance of a more holistic, multi-modal training regime with richer human data for building better emotion-understanding systems in the area of affective computing.
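The abstract does not say which statistics were used, so the following is only an illustrative sketch of the kind of comparison it describes: a paired shift in ratings between the context-rich and context-reduced versions of the same images, and a correlation between human and model ratings. The function names (`mean_paired_shift`, `pearson`) and all numbers are hypothetical toy values, not data or methods from the study.

```python
from math import sqrt
from statistics import mean


def mean_paired_shift(context_rich, context_reduced):
    """Mean of (context-reduced minus context-rich) over paired images.

    A negative value means ratings dropped when context was blurred out.
    """
    if len(context_rich) != len(context_reduced):
        raise ValueError("rating lists must be paired, one entry per image")
    return mean(red - rich for rich, red in zip(context_rich, context_reduced))


def pearson(xs, ys):
    """Pearson correlation between two equally long rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy valence ratings for four images (NOT data from the paper).
human_rich = [6.0, 7.0, 3.0, 5.0]
human_reduced = [5.0, 6.0, 2.5, 4.5]
model_rich = [5.0, 5.5, 4.0, 6.0]
model_reduced = [5.0, 5.5, 4.0, 6.0]

if __name__ == "__main__":
    print("human shift:", mean_paired_shift(human_rich, human_reduced))
    print("model shift:", mean_paired_shift(model_rich, model_reduced))
    print("human vs. model (context-rich):", pearson(human_rich, model_rich))
```

In this toy setup the human ratings shift when context is removed while the model ratings do not, which is one simple way a model could fail to mirror human context sensitivity. A real analysis would need the actual per-image ratings and an appropriate significance test.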