Predicting User Confidence in Video Recordings with Spatio-Temporal Multimodal Analytics

Andrew Emerson, Patrick Houghton, Ke Chen, Vinay Basheerabad, Rutuja Ubale, C. W. Leong
{"title":"Predicting User Confidence in Video Recordings with Spatio-Temporal Multimodal Analytics","authors":"Andrew Emerson, Patrick Houghton, Ke Chen, Vinay Basheerabad, Rutuja Ubale, C. W. Leong","doi":"10.1145/3536220.3558007","DOIUrl":null,"url":null,"abstract":"A critical component of effective communication is the ability to project confidence. In video presentations (e.g., video interviews), there are many factors that influence perceived confidence by a listener. Advances in computer vision, speech processing, and natural language processing have enabled the automatic extraction of salient features that can be used to model a presenter’s perceived confidence. Moreover, these multimodal features can be used to automatically provide feedback to a user with ways they can improve their projected confidence. This paper introduces a multimodal approach to modeling user confidence in video presentations by leveraging features from visual cues (i.e., eye gaze) and speech patterns. We investigate the degree to which the extracted multimodal features were predictive of user confidence with a dataset of 48 2-minute videos, where the participants used a webcam and microphone to record themselves responding to a prompt. Comparative experimental results indicate that our modeling approach of using both visual and speech features are able to score 83% and 78% improvements over the random and majority label baselines, respectively. We discuss implications of using the multimodal features for modeling confidence as well as the potential for automated feedback to users who want to improve their confidence in video presentations.","PeriodicalId":186796,"journal":{"name":"Companion Publication of the 2022 International Conference on Multimodal Interaction","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Publication of the 2022 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3536220.3558007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

A critical component of effective communication is the ability to project confidence. In video presentations (e.g., video interviews), many factors influence the confidence perceived by a listener. Advances in computer vision, speech processing, and natural language processing have enabled the automatic extraction of salient features that can be used to model a presenter's perceived confidence. Moreover, these multimodal features can be used to automatically provide a user with feedback on ways to improve their projected confidence. This paper introduces a multimodal approach to modeling user confidence in video presentations by leveraging features from visual cues (i.e., eye gaze) and speech patterns. We investigate the degree to which the extracted multimodal features are predictive of user confidence with a dataset of 48 two-minute videos, in which participants used a webcam and microphone to record themselves responding to a prompt. Comparative experimental results indicate that our modeling approach, which uses both visual and speech features, achieves improvements of 83% and 78% over the random and majority-label baselines, respectively. We discuss implications of using multimodal features to model confidence, as well as the potential for automated feedback to users who want to improve their confidence in video presentations.
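To make the described pipeline concrete, the sketch below shows one way multimodal features could be fused and compared against random and majority-label baselines, as the abstract reports. It is a minimal illustration, not the authors' implementation: the feature-extraction helpers (`extract_gaze_features`, `extract_speech_features`) are hypothetical placeholders, and the fused classifier is an assumed logistic-regression model rather than the model used in the paper.

```python
# Minimal sketch (assumptions noted above): fuse per-video gaze and speech
# feature vectors, then compare a classifier against random / majority baselines.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def extract_gaze_features(video_path: str) -> np.ndarray:
    """Hypothetical placeholder: per-video gaze statistics (e.g., fixation ratios)."""
    raise NotImplementedError


def extract_speech_features(audio_path: str) -> np.ndarray:
    """Hypothetical placeholder: per-video speech statistics (e.g., pitch, pauses)."""
    raise NotImplementedError


def build_feature_matrix(recordings):
    """Concatenate visual and speech features into one fused vector per recording."""
    return np.vstack([
        np.concatenate([extract_gaze_features(video), extract_speech_features(audio)])
        for video, audio in recordings
    ])


def evaluate(X: np.ndarray, y: np.ndarray, cv: int = 5) -> dict:
    """Cross-validate the fused model and the two reference baselines."""
    models = {
        "multimodal": make_pipeline(StandardScaler(),
                                    LogisticRegression(max_iter=1000)),
        "random": DummyClassifier(strategy="uniform", random_state=0),
        "majority": DummyClassifier(strategy="most_frequent"),
    }
    return {name: cross_val_score(m, X, y, cv=cv).mean()
            for name, m in models.items()}
```

With a labeled dataset of recordings, calling `evaluate(build_feature_matrix(recordings), labels)` would yield the multimodal score alongside the two baseline scores, mirroring the comparison structure described in the abstract.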