Is She Truly Enjoying the Conversation?: Analysis of Physiological Signals toward Adaptive Dialogue Systems

Shun Katada, S. Okada, Yuki Hirano, Kazunori Komatani
{"title":"Is She Truly Enjoying the Conversation?: Analysis of Physiological Signals toward Adaptive Dialogue Systems","authors":"Shun Katada, S. Okada, Yuki Hirano, Kazunori Komatani","doi":"10.1145/3382507.3418844","DOIUrl":null,"url":null,"abstract":"In human-agent interactions, it is necessary for the systems to identify the current emotional state of the user to adapt their dialogue strategies. Nevertheless, this task is challenging because the current emotional states are not always expressed in a natural setting and change dynamically. Recent accumulated evidence has indicated the usefulness of physiological modalities to realize emotion recognition. However, the contribution of the time series physiological signals in human-agent interaction during a dialogue has not been extensively investigated. This paper presents a machine learning model based on physiological signals to estimate a user's sentiment at every exchange during a dialogue. Using a wearable sensing device, the time series physiological data including the electrodermal activity (EDA) and heart rate in addition to acoustic and visual information during a dialogue were collected. The sentiment labels were annotated by the participants themselves and by external human coders for each exchange consisting of a pair of system and participant utterances. The experimental results showed that a multimodal deep neural network (DNN) model combined with the EDA and visual features achieved an accuracy of 63.2%. In general, this task is challenging, as indicated by the accuracy of 63.0% attained by the external coders. The analysis of the sentiment estimation results for each individual indicated that the human coders often wrongly estimated the negative sentiment labels, and in this case, the performance of the DNN model was higher than that of the human coders. These results indicate that physiological signals can help in detecting the implicit aspects of negative sentiments, which are acoustically/visually indistinguishable.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3382507.3418844","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

In human-agent interaction, a system must identify the user's current emotional state to adapt its dialogue strategy. This task is challenging because emotional states are not always overtly expressed in natural settings and change dynamically. Accumulated evidence indicates that physiological modalities are useful for emotion recognition, but the contribution of time-series physiological signals to human-agent dialogue has not been extensively investigated. This paper presents a machine learning model based on physiological signals that estimates a user's sentiment at every exchange during a dialogue. Using a wearable sensing device, time-series physiological data, including electrodermal activity (EDA) and heart rate, were collected during dialogues, along with acoustic and visual information. Sentiment labels were annotated both by the participants themselves and by external human coders for each exchange, where an exchange consists of a pair of system and participant utterances. The experimental results showed that a multimodal deep neural network (DNN) model combining EDA and visual features achieved an accuracy of 63.2%. The task is inherently difficult, as indicated by the 63.0% accuracy attained by the external coders. A per-participant analysis of the sentiment estimation results showed that the human coders often misjudged negative sentiment labels, and in these cases the DNN model outperformed them. These results indicate that physiological signals can help detect the implicit aspects of negative sentiment that are acoustically and visually indistinguishable.
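The abstract leaves the exact network architecture unspecified, so the following is only a minimal sketch of one plausible late-fusion design: a recurrent encoder summarizes the per-exchange EDA time series, a small feed-forward encoder projects the visual features, and the concatenated representations feed a binary sentiment classifier. Everything here, including the GRU encoder, the layer sizes, and the names `MultimodalSentimentDNN`, `eda_seq`, and `visual_feat`, is an assumption for illustration rather than the authors' published model.

```python
# Hypothetical sketch of a late-fusion multimodal DNN, in the spirit of the
# paper's EDA + visual model. The GRU encoder, layer sizes, and the binary
# (positive/negative) output are assumptions, not the published architecture.
import torch
import torch.nn as nn

class MultimodalSentimentDNN(nn.Module):
    def __init__(self, eda_dim=1, visual_dim=64, hidden=32):
        super().__init__()
        # Summarize the per-exchange EDA time series with a GRU; the final
        # hidden state serves as a fixed-length representation of the signal.
        self.eda_encoder = nn.GRU(eda_dim, hidden, batch_first=True)
        # Project the per-exchange visual feature vector (e.g., facial
        # features pooled over the exchange) into the same space.
        self.visual_encoder = nn.Sequential(
            nn.Linear(visual_dim, hidden), nn.ReLU())
        # Fuse by concatenation, then classify the exchange's sentiment.
        self.classifier = nn.Sequential(
            nn.Linear(hidden * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))

    def forward(self, eda_seq, visual_feat):
        # eda_seq: (batch, time, eda_dim); visual_feat: (batch, visual_dim)
        _, h_n = self.eda_encoder(eda_seq)   # h_n: (1, batch, hidden)
        fused = torch.cat([h_n[-1], self.visual_encoder(visual_feat)], dim=-1)
        return self.classifier(fused)        # sentiment logits per exchange

# Example: score a batch of 8 exchanges, each with a 200-step EDA trace.
model = MultimodalSentimentDNN()
logits = model(torch.randn(8, 200, 1), torch.randn(8, 64))
print(logits.shape)  # torch.Size([8, 2])
```

Concatenating independently encoded modalities keeps each branch separable, which is convenient for the kind of per-modality ablation the paper's EDA-versus-visual comparison implies.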