{"title":"Contextual modulation of affect: Comparing humans and deep neural networks","authors":"Soomin Shin, Doo-Hyun Kim, C. Wallraven","doi":"10.1145/3536220.3558036","DOIUrl":null,"url":null,"abstract":"When inferring emotions, humans rely on a number of cues, including not only facial expressions, body posture, but also expressor-external, contextual information. The goal of the present study was to compare the impact of such contextual information on emotion processing in humans and two deep neural network (DNN) models. We used results from a human experiment in which two types of pictures were rated for valence and arousal: the first type depicted people expressing an emotion in a social context including other people; the second was a context-reduced version in which all information except for the target expressor was blurred out. The resulting human ratings of valence and arousal were systematically decreased in the context-reduced version, highlighting the importance of context. We then compared human ratings with those of two DNN models (one trained on face images only, and the other trained also on contextual information). Analyses of both categorical and the valence/arousal ratings showed that although there were some superficial similarities, both models failed to capture human rating patterns both in context-rich and context-reduced conditions. Our study emphasizes the importance of a more holistic, multi-modal training regime with richer human data to build better emotion-understanding systems in the area of affective computing.","PeriodicalId":186796,"journal":{"name":"Companion Publication of the 2022 International Conference on Multimodal Interaction","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Publication of the 2022 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3536220.3558036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
When inferring emotions, humans rely on a number of cues, including not only facial expressions and body posture, but also expressor-external, contextual information. The goal of the present study was to compare the impact of such contextual information on emotion processing in humans and in two deep neural network (DNN) models. We used results from a human experiment in which two types of pictures were rated for valence and arousal: the first type depicted people expressing an emotion in a social context that included other people; the second was a context-reduced version in which all information except for the target expressor was blurred out. The resulting human ratings of valence and arousal were systematically decreased in the context-reduced version, highlighting the importance of context. We then compared human ratings with those of two DNN models (one trained on face images only, and the other trained additionally on contextual information). Analyses of both the categorical and the valence/arousal ratings showed that, although there were some superficial similarities, both models failed to capture human rating patterns in both the context-rich and the context-reduced conditions. Our study emphasizes the importance of a more holistic, multi-modal training regime with richer human data for building better emotion-understanding systems in the area of affective computing.
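To illustrate the stimulus manipulation described above, the following is a minimal sketch (not the authors' code) of how a context-reduced image could be produced by blurring everything outside the target expressor's region. The file names, bounding-box coordinates, and blur kernel size are hypothetical placeholders; the actual study may have annotated and masked the target differently.

```python
# Minimal sketch of context reduction: blur the whole scene except a
# bounding box around the target expressor. Assumes OpenCV and NumPy.
import cv2


def reduce_context(image_path, target_box, blur_ksize=(51, 51)):
    """Return a copy of the image with everything outside `target_box` blurred.

    target_box: (x, y, w, h) bounding box of the target expressor,
    assumed to come from manual annotation or a person detector.
    """
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)

    # Blur the entire scene first (kernel size must be odd).
    blurred = cv2.GaussianBlur(img, blur_ksize, 0)

    # Paste the un-blurred target region back onto the blurred scene.
    x, y, w, h = target_box
    blurred[y:y + h, x:x + w] = img[y:y + h, x:x + w]
    return blurred


if __name__ == "__main__":
    # Hypothetical example usage with placeholder file names and coordinates.
    out = reduce_context("scene_with_context.jpg", target_box=(120, 80, 200, 360))
    cv2.imwrite("scene_context_reduced.jpg", out)
```

Both the original and the context-reduced versions of each picture could then be rated by humans and fed to the two DNN models, allowing valence/arousal ratings to be compared across conditions.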