Multimodal approach to engagement and disengagement detection with highly imbalanced in-the-wild data

D. Fedotov, O. Perepelkina, E. Kazimirova, M. Konstantinova, W. Minker
DOI: 10.1145/3279810.3279842
Published in: Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data
Publication date: 2018-10-16
Citations: 11

Abstract

Engagement/disengagement detection is a challenging task that arises in a range of human-human and human-computer interaction problems. Although important, the problem is still far from solved, and a number of studies involving in-the-wild data have been conducted to date. Ambiguity in the definition of engaged/disengaged states makes such data hard to collect, annotate and analyze. In this paper we describe different approaches to building engagement/disengagement models on highly imbalanced multimodal data from natural conversations. We set a baseline of 0.695 (unweighted average recall) by direct classification. We then try to detect disengagement via engagement regression models, since the two states are strongly negatively correlated. To deal with imbalanced data we apply class weighting and data augmentation techniques (SMOTE and mixup). We experiment with combinations of modalities to find the most contributing ones, using features from both audio (speech) and video (face, body, lips, eyes) channels. We transform the original features with Principal Component Analysis and experiment with several types of modality fusion. Finally, we combine these approaches and raise performance to 0.715 using four modalities (all channels except face). Audio and lips features appear to contribute most, which may be tightly connected with speech.
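The headline numbers (0.695 baseline, 0.715 best) are unweighted average recall (UAR): the mean of per-class recalls, with every class weighted equally so the rare disengaged class counts as much as the majority class. A minimal sketch of the metric (not code from the paper):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally, so a rare
    'disengaged' class is not drowned out by the majority class."""
    stats = defaultdict(lambda: [0, 0])  # class -> [correct, total]
    for t, p in zip(y_true, y_pred):
        stats[t][1] += 1
        if t == p:
            stats[t][0] += 1
    return sum(correct / total for correct, total in stats.values()) / len(stats)
```

A predictor that always outputs the majority class scores 0.9 accuracy on a 90/10 split but only 0.5 UAR, which is why UAR is the standard choice for imbalanced engagement data.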
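Class weighting, the first imbalance countermeasure mentioned, typically means scaling each sample's loss contribution by the inverse frequency of its class. The paper does not spell out its exact scheme; a common heuristic (scikit-learn's "balanced" weighting) looks like this:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rare classes get weights above 1 and frequent ones below 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```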
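The two augmentation techniques differ in where they interpolate: SMOTE synthesizes new minority samples between a minority point and one of its minority-class nearest neighbours, while mixup blends random pairs of samples and their labels. Minimal sketches of both ideas (real experiments would typically use a library implementation such as imbalanced-learn's SMOTE):

```python
import random

def smote_sample(minority, k=5, rng=random):
    """Draw one synthetic minority sample: pick a base point, then
    interpolate toward one of its k nearest minority-class neighbours."""
    base = rng.choice(minority)
    neighbours = sorted(
        (m for m in minority if m is not base),
        key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
    )[:k]
    nb = rng.choice(neighbours)
    gap = rng.random()  # position along the segment [base, nb]
    return [a + gap * (b - a) for a, b in zip(base, nb)]

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples and their (soft) labels with a
    Beta(alpha, alpha)-distributed mixing coefficient."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

Note the practical difference for this task: SMOTE only rebalances the minority class, whereas mixup also produces soft labels, which suits the regression-based disengagement detection described above.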
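Of the fusion types one can try, the simplest is late (decision-level) fusion: average the per-class probabilities produced by each per-modality model, optionally weighting modalities unequally (e.g. upweighting the audio and lips channels found most informative here). A hedged sketch, not necessarily the paper's exact scheme:

```python
def late_fusion(probs_by_modality, weights=None):
    """Weighted average of per-class probability vectors,
    one vector per modality (audio, lips, eyes, body, ...)."""
    n = len(probs_by_modality)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    fused = [0.0] * len(probs_by_modality[0])
    for w, probs in zip(weights, probs_by_modality):
        for i, p in enumerate(probs):
            fused[i] += (w / total) * p
    return fused
```

Early fusion (concatenating features before classification) is the usual alternative; PCA on the concatenated features, as described above, keeps its dimensionality manageable.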