Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Articles

Eye-Tracking to Predict User Cognitive Abilities and Performance for User-Adaptive Narrative Visualizations

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418884

Oswald Barral, Sébastien Lallé, Grigorii Guz, A. Iranpour, C. Conati

We leverage eye-tracking data to predict user performance and levels of cognitive abilities while users read magazine-style narrative visualizations (MSNVs), a widespread form of multimodal document that combines text and visualizations. Such predictions are motivated by recent interest in devising user-adaptive MSNVs that can dynamically adapt to a user's needs. Our results provide evidence for the feasibility of real-time user modeling in MSNVs, and we are the first to consider eye-tracking data for predicting task comprehension and cognitive abilities while users process multimodal documents. We follow with a discussion of the implications for the design of personalized MSNVs.

Citations: 10

Predicting Video Affect via Induced Affection in the Wild

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418838

Yi Ding, Radha Kumaran, Tianjiao Yang, Tobias Höllerer

Curating large, high-quality datasets for studying affect is a costly and time-consuming process, especially when the labels are continuous. In this paper, we examine the potential of using unlabeled public reactions, in the form of textual comments, to aid in classifying video affect. We examine two popular datasets used for affect recognition and mine public reactions to these videos. We learn a representation of these reactions by using the video ratings as a weakly supervised signal. We show that our model can learn a fine-grained prediction of comment affect when given a video alone. Furthermore, we demonstrate how predicting the affective properties of a comment can be a potentially useful modality in multimodal affect modeling.

Citations: 0

BreathEasy: Assessing Respiratory Diseases Using Mobile Multimodal Sensors

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418852

Md. Mahbubur Rahman, M. Y. Ahmed, Tousif Ahmed, Bashima Islam, Viswam Nathan, K. Vatanparvar, Ebrahim Nemati, Daniel McCaffrey, Jilong Kuang, J. Gao

Mobile respiratory assessments using commodity smartphones and smartwatches are an unmet need for patient monitoring at home. In this paper, we show the feasibility of using multimodal sensors embedded in consumer mobile devices for non-invasive, low-effort respiratory assessment. We have conducted studies with 228 chronic respiratory patients and healthy subjects, and show that our model can estimate respiratory rate with a mean absolute error (MAE) of 0.72 ± 0.62 breaths per minute and differentiate respiratory patients from healthy subjects with 90% recall and 76% precision when the user breathes normally while holding the device on the chest or the abdomen for a minute. Holding the device on the chest or abdomen requires significantly less effort than traditional spirometry, which requires a specialized device and forceful, vigorous breathing. This paper shows the feasibility of developing a low-effort respiratory assessment that can be made available anywhere, anytime through users' own mobile devices.
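
The metrics quoted in this abstract (MAE with a spread, recall, and precision) can be illustrated with a minimal sketch. The numbers below are hypothetical toy values, not data from the paper, and the sketch assumes the "±" term is a standard deviation of the absolute errors, which the abstract does not state.

```python
from statistics import mean, pstdev


def mae_with_spread(estimated, reference):
    """Return (mean, population std) of the absolute errors."""
    errors = [abs(e - r) for e, r in zip(estimated, reference)]
    return mean(errors), pstdev(errors)


def precision_recall(predicted, actual):
    """Precision and recall for the positive class (True = patient)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical respiratory-rate estimates vs. reference (breaths per minute).
est = [14.0, 16.5, 18.0, 12.0]
ref = [15.0, 16.0, 18.0, 13.0]
mae, spread = mae_with_spread(est, ref)

# Hypothetical patient/healthy predictions vs. ground truth.
predicted = [True, True, True, True, False]
actual = [True, True, True, False, False]
precision, recall = precision_recall(predicted, actual)

print(f"MAE = {mae:.3f} +/- {spread:.3f} breaths per minute")
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

High recall with lower precision, as reported in the abstract, means few patients are missed at the cost of some healthy subjects being flagged.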

Citations: 11

Spark Creativity by Speaking Enthusiastically: Communication Training using an E-Coach

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421164

Carla Viegas, Albert Lu, A. Su, Carter Strear, Yi Xu, Albert Topdjian, Daniel Limon, J. J. Xu

Enthusiasm in speech has a huge impact on listeners. Students of enthusiastic teachers show better performance. Leaders who are enthusiastic influence employees' innovative behavior and can also spark excitement in customers. We, at TalkMeUp, want to help people learn how to talk with enthusiasm in order to spark creativity among their listeners. In this work, we present a multimodal speech analysis platform. We provide feedback on enthusiasm by analyzing eye contact, facial expressions, voice prosody, and text content.

Citations: 1

Multimodal Assessment of Oral Presentations using HMMs

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418888

Everlyne Kimani, Prasanth Murali, Ameneh Shamekhi, Dhaval Parmar, Sumanth Munikoti, T. Bickmore

Audience perceptions of public speakers' performance change over time. Some speakers start strong but quickly transition to mundane delivery, while others may have a few impactful and engaging portions of their talk preceded and followed by more pedestrian delivery. In this work, we model the time-varying qualities of a presentation as perceived by the audience and use these models both to provide diagnostic information to presenters and to improve the quality of automated performance assessments. In particular, we use HMMs to model various dimensions of perceived quality and how they change over time, and use the sequence of quality states to improve feedback and predictions. We evaluate this approach on a corpus of 74 presentations given in a controlled environment. Multimodal features, spanning acoustic qualities, speech disfluencies, and nonverbal behavior, were derived both automatically and manually using crowdsourcing. Ground truth on audience perceptions was obtained using judge ratings on both overall presentations (aggregate) and portions of presentations segmented by topic. We distilled the overall presentation quality into states representing the presenter's gaze, audio, gesture, audience interaction, and proxemic behaviors. We demonstrate that an HMM with a state-based representation of presentations improves the performance assessments.
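
To make the idea of a "sequence of quality states" concrete, here is a minimal sketch of Viterbi decoding for a discrete HMM. It is not the authors' model: the two state names, the "high"/"low" observation symbols, and every probability are hypothetical values chosen for illustration, and the abstract does not say which decoding or training procedure was used.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete HMM (log-space).

    All probabilities must be non-zero, since logarithms are taken.
    """
    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-state model of perceived delivery quality.
states = ("engaging", "mundane")
start_p = {"engaging": 0.6, "mundane": 0.4}
trans_p = {
    "engaging": {"engaging": 0.7, "mundane": 0.3},
    "mundane": {"engaging": 0.2, "mundane": 0.8},
}
emit_p = {
    "engaging": {"high": 0.8, "low": 0.2},
    "mundane": {"high": 0.3, "low": 0.7},
}
# Hypothetical per-segment feature level (e.g. a binned multimodal score).
segments = ["high", "high", "low", "low"]

decoded = viterbi(segments, states, start_p, trans_p, emit_p)
print(decoded)  # ['engaging', 'engaging', 'mundane', 'mundane']
```

The decoded path mirrors the "starts strong, then becomes mundane" pattern described in the abstract; a state sequence like this could then serve as input to feedback or to a downstream rating predictor.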

Citations: 4

Detection of Micro-expression Recognition Based on Spatio-Temporal Modelling and Spatial Attention

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421160

Mengjiong Bai

My PhD project aims to contribute to affective computing applications that assist depression diagnosis through micro-expression recognition. My motivation is the similarity between the low-intensity facial expressions in micro-expressions and the low-intensity facial expressions ('frozen face') of people with psychomotor retardation caused by depression. The project will focus on, firstly, investigating spatio-temporal modelling and attention systems for micro-expression recognition (MER) and, secondly, exploring the role of micro-expressions in automated depression analysis by improving deep learning architectures to detect low-intensity facial expressions. This work will investigate different deep learning architectures (e.g. Temporal Convolutional Networks (TCNN) or Gated Recurrent Units (GRU)) and validate the results on publicly available micro-expression benchmark datasets to quantitatively analyse the robustness and accuracy of MER's contribution to improving automatic depression analysis. Moreover, video magnification, as a way to enhance small movements, will be combined with the deep learning methods to address the low-intensity issues in MER.

Citations: 5

Did the Children Behave?: Investigating the Relationship Between Attachment Condition and Child Computer Interaction

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418858

Dong-Bach Vo, S. Brewster, A. Vinciarelli

This work investigates the interplay between Child-Computer Interaction and attachment, a psychological construct that accounts for how children perceive their parents to be. In particular, the article makes use of a multimodal approach to test whether children with different attachment conditions tend to use the same interactive system differently. The experiments show that the accuracy in predicting usage behaviour changes, to a statistically significant extent, according to the attachment conditions of the 52 experiment participants (age range 5 to 9). Such a result suggests that attachment-relevant processes are actually at work when people interact with technology, at least when it comes to children.

Citations: 2

Early Prediction of Visitor Engagement in Science Museums with Multimodal Learning Analytics

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418890

Andrew Emerson, Nathan L. Henderson, Jonathan P. Rowe, Wookhee Min, Seung Y. Lee, James Minogue, James C. Lester

Modeling visitor engagement is a key challenge in informal learning environments, such as museums and science centers. Devising predictive models of visitor engagement that accurately forecast salient features of visitor behavior, such as dwell time, holds significant potential for enabling adaptive learning environments and visitor analytics for museums and science centers. In this paper, we introduce a multimodal early prediction approach to modeling visitor engagement with interactive science museum exhibits. We utilize multimodal sensor data including eye gaze, facial expression, posture, and interaction log data captured during visitor interactions with an interactive museum exhibit for environmental science education, to induce predictive models of visitor dwell time. We investigate machine learning techniques (random forest, support vector machine, Lasso regression, gradient boosting trees, and multi-layer perceptron) to induce multimodal predictive models of visitor engagement with data from 85 museum visitors. Results from a series of ablation experiments suggest that incorporating additional modalities into predictive models of visitor engagement improves model accuracy. In addition, the models show improved predictive performance over time, demonstrating that increasingly accurate predictions of visitor dwell time can be achieved as more evidence becomes available from visitor interactions with interactive science museum exhibits. These findings highlight the efficacy of multimodal data for modeling museum exhibit visitor engagement.

Citations: 16

Group Level Audio-Video Emotion Recognition Using Hybrid Networks

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417968

Chuanhe Liu, Wenqian Jiang, Minghao Wang, Tianhao Tang

This paper presents a hybrid network for audio-video group Emotion Recognition. The proposed architecture includes audio stream, facial emotion stream, environmental object statistics stream (EOS) and video stream. We adopted this method at the 8th Emotion Recognition in the Wild Challenge (EmotiW2020). According to the feedback of our submissions, the best result achieved 76.85% in the Video level Group AFfect (VGAF) Test Database, 26.89% higher than the baseline. Such improvements prove that our method is state-of-the-art.
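
The abstract does not say how the four streams are combined. One common option, shown here purely as an assumption, is late fusion: each stream produces class probabilities and a weighted average gives the final prediction. The class labels, stream outputs, and weights below are all hypothetical.

```python
def fuse(stream_probs, weights):
    """Weighted average of per-stream class probabilities (late fusion)."""
    total = sum(weights[name] for name in stream_probs)
    classes = next(iter(stream_probs.values())).keys()
    return {
        c: sum(weights[name] * probs[c] for name, probs in stream_probs.items()) / total
        for c in classes
    }


# Hypothetical per-stream outputs for one video clip.
stream_probs = {
    "audio": {"positive": 0.2, "neutral": 0.5, "negative": 0.3},
    "face": {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
    "eos": {"positive": 0.4, "neutral": 0.4, "negative": 0.2},
    "video": {"positive": 0.5, "neutral": 0.3, "negative": 0.2},
}
# Hypothetical weights; in practice these would be tuned on validation data.
weights = {"audio": 1.0, "face": 2.0, "eos": 1.0, "video": 2.0}

fused = fuse(stream_probs, weights)
label = max(fused, key=fused.get)
print(label, fused)
```

Because each stream's probabilities sum to one and the weights are normalised, the fused scores also sum to one, so they can be read as a probability distribution over the group-affect classes.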

Citations: 17

Analysis of Face-Touching Behavior in Large Scale Social Interaction Dataset

Proceedings of the 2020 International Conference on Multimodal Interaction

Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418876

Cigdem Beyan, Matteo Bustreo, Muhammad Shahid, Gianluca Bailo, N. Carissimi, Alessio Del Bue

We present the first publicly available annotations for the analysis of face-touching behavior. These annotations are for a dataset composed of audio-visual recordings of small-group social interactions with a total of 64 videos, each lasting between 12 and 30 minutes and showing a single person participating in a four-person meeting. The annotations were performed by 16 annotators in total, with almost perfect agreement on average (Cohen's Kappa = 0.89). In total, 74K and 2M video frames were labelled as face-touch and no-face-touch, respectively. Given the dataset and the collected annotations, we also present an extensive evaluation of several methods for face-touching detection: rule-based methods, supervised learning with hand-crafted features, and feature learning and inference with a Convolutional Neural Network (CNN). Our evaluation indicates that the CNN performed best, reaching an 83.76% F1-score and a 0.84 Matthews Correlation Coefficient. To foster future research on this problem, the code and dataset were made publicly available (github.com/IIT-PAVIS/Face-Touching-Behavior), providing all video frames, face-touch annotations, body pose estimations including face and hand key-point detection, and face bounding boxes, as well as the implemented baseline methods and the cross-validation splits used for training and evaluating our models.
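
With roughly 74K positive frames against 2M negative ones, plain accuracy would be dominated by the no-face-touch class, which is presumably why the abstract reports F1, the Matthews Correlation Coefficient (MCC), and Cohen's Kappa. The sketch below computes all three from scratch on hypothetical toy labels (1 = face-touch, 0 = no-face-touch); it is not the authors' evaluation code.

```python
import math


def confusion(pred, truth):
    """Return (tp, fp, fn, tn) for binary labels with 1 as the positive class."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in set(a) | set(b))
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


# Hypothetical frame labels: ground truth vs. a detector's predictions.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
counts = confusion(pred, truth)
print("F1 :", f1_score(*counts))
print("MCC:", mcc(*counts))

# Hypothetical labels from two annotators on the same frames.
annotator_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
annotator_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print("Cohen's kappa:", cohens_kappa(annotator_a, annotator_b))
```

In the toy detector example, accuracy would be 80% even though only one of two face-touch frames is found; F1 and MCC expose that gap.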

Citations: 14
