
Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

Towards a Multimodal and Context-Aware Framework for Human Navigational Intent Inference
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421156
Z. Zhang
A socially acceptable robot needs to make correct decisions and understand human intent in order to interact with and navigate around humans safely. Although research in computer vision and robotics has made huge advances in recent years, today's robotic systems still need a better understanding of human intent to be more effective and widely accepted. Currently, such inference is typically done using only one mode of perception, such as vision or human movement trajectory. In this extended abstract, I describe my PhD research plan to develop a novel multimodal and context-aware framework in which a robot infers human navigational intentions through multimodal perception comprising temporal facial, body pose, and gaze features, human motion features, and environmental context. To facilitate this framework, a data collection experiment is designed to acquire multimodal human-robot interaction data. Our initial design of the framework is based on a temporal neural network model with human motion, body pose, and head orientation features as input; we will increase the complexity of the neural network model and of the input features along the way. In the long term, this framework can benefit a variety of settings such as autonomous driving and service and household robots.
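The trajectory modality mentioned above is the simplest of these cues. As a purely hypothetical sketch (not the authors' model), the following extracts speed and heading-change features from a 2D trajectory and applies a toy distance-based rule to separate "approach" from "pass-by" intent; all function names, thresholds, and the rule itself are assumptions for illustration:

```python
import math

def intent_features(traj, dt=0.1):
    """Per-window speed and heading-change features from a 2D trajectory.

    traj: list of (x, y) positions sampled every dt seconds.
    Returns (mean_speed, total_heading_change_rad).
    """
    speeds, headings = [], []
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)
        headings.append(math.atan2(dy, dx))
    # Sum of absolute wrapped heading differences between successive steps.
    heading_change = sum(
        abs(math.atan2(math.sin(b - a), math.cos(b - a)))
        for a, b in zip(headings, headings[1:])
    )
    return sum(speeds) / len(speeds), heading_change

def classify_intent(traj, robot_pos, dt=0.1):
    """Toy rule: 'approach' if the person ends closer to the robot than they
    started and is still moving; otherwise 'pass-by'."""
    (xs, ys), (xe, ye) = traj[0], traj[-1]
    d_start = math.hypot(xs - robot_pos[0], ys - robot_pos[1])
    d_end = math.hypot(xe - robot_pos[0], ye - robot_pos[1])
    mean_speed, _ = intent_features(traj, dt)
    return "approach" if d_end < d_start and mean_speed > 0.1 else "pass-by"
```

In the planned framework such trajectory features would be only one input among facial, pose, gaze, and context features feeding a temporal neural network.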
Citations: 2
A Comparison between Laboratory and Wearable Sensors in the Context of Physiological Synchrony
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418837
Jasper J. van Beers, I. Stuldreher, Nattapong Thammasan, A. Brouwer
Measuring concurrent changes in autonomic physiological responses aggregated across individuals (Physiological Synchrony, PS) can provide insight into group-level cognitive or emotional processes. Utilizing cheap and easy-to-use wearable sensors to measure physiology, rather than their high-end laboratory counterparts, is desirable. Since it is currently ambiguous how different signal properties (arising from different types of measuring equipment) influence the detection of PS associated with mental processes, it is unclear whether, or to what extent, PS based on data from wearables compares to that from their laboratory equivalents. Existing literature has investigated PS using both types of equipment, but no study has compared them directly. In this study, we measure PS in the electrodermal activity (EDA) and inter-beat interval (IBI, the inverse of heart rate) of participants who listened to the same audio stream but were instructed to attend either to the presented narrative (n=13) or to the interspersed auditory events (n=13). Both laboratory and wearable sensors were used (ActiveTwo electrocardiogram (ECG) and EDA; Wahoo Tickr and EdaMove4). A participant's attentional condition was classified based on which attentional group they shared greater synchrony with. For both types of sensors, we found classification accuracies of 73% or higher for both EDA and IBI, with no significant difference in classification accuracies between the laboratory and wearable sensors. These findings encourage the use of wearables for PS-based research and for in-the-field measurements.
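The classification scheme described above can be sketched with a simplified synchrony measure. The abstract does not specify the exact PS metric, so this illustration uses plain Pearson correlation and assigns a participant to whichever group's signals they correlate with more on average; the data and function names are hypothetical:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length signals."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def classify_attention(signal, narrative_group, events_group):
    """Assign the participant to whichever group's signals they synchronize
    with more, measured as mean pairwise Pearson correlation."""
    sync_n = mean(pearson(signal, s) for s in narrative_group)
    sync_e = mean(pearson(signal, s) for s in events_group)
    return "narrative" if sync_n > sync_e else "events"
```

In practice PS is usually computed over sliding windows of EDA or IBI rather than whole recordings, but the leave-one-participant-out assignment logic is the same.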
Citations: 10
EmotiW 2020: Driver Gaze, Group Emotion, Student Engagement and Physiological Signal based Challenges
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417973
Abhinav Dhall, Garima Sharma, R. Goecke, Tom Gedeon
This paper introduces the Eighth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW is a benchmarking effort run as a grand challenge of the 22nd ACM International Conference on Multimodal Interaction 2020. It comprises four tasks related to automatic human behavior analysis: a) driver gaze prediction; b) audio-visual group-level emotion recognition; c) engagement prediction in the wild; and d) physiological signal based emotion recognition. The motivation of EmotiW is to bring researchers in affective computing, computer vision, speech processing and machine learning together on a common platform for evaluating techniques on test data. We discuss the challenge protocols, databases and their associated baselines.
Citations: 65
Spark Creativity by Speaking Enthusiastically: Communication Training using an E-Coach
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421164
Carla Viegas, Albert Lu, A. Su, Carter Strear, Yi Xu, Albert Topdjian, Daniel Limon, J. J. Xu
Enthusiasm in speech has a huge impact on listeners. Students of enthusiastic teachers show better performance. Leaders who are enthusiastic influence employees' innovative behavior and can also spark excitement in customers. We, at TalkMeUp, want to help people learn how to talk with enthusiasm in order to spark creativity among their listeners. In this work we present a multimodal speech analysis platform that provides feedback on enthusiasm by analyzing eye contact, facial expressions, voice prosody, and text content.
Citations: 1
Multimodal Assessment of Oral Presentations using HMMs
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418888
Everlyne Kimani, Prasanth Murali, Ameneh Shamekhi, Dhaval Parmar, Sumanth Munikoti, T. Bickmore
Audience perceptions of public speakers' performance change over time. Some speakers start strong but quickly transition to mundane delivery, while others may have a few impactful and engaging portions of their talk preceded and followed by more pedestrian delivery. In this work, we model the time-varying qualities of a presentation as perceived by the audience, and use these models both to provide diagnostic information to presenters and to improve the quality of automated performance assessments. In particular, we use HMMs to model various dimensions of perceived quality and how they change over time, and use the sequence of quality states to improve feedback and predictions. We evaluate this approach on a corpus of 74 presentations given in a controlled environment. Multimodal features, spanning acoustic qualities, speech disfluencies, and nonverbal behavior, were derived both automatically and manually using crowdsourcing. Ground truth on audience perceptions was obtained from judge ratings of both overall presentations (aggregate) and portions of presentations segmented by topic. We distilled the overall presentation quality into states representing the presenter's gaze, audio, gesture, audience interaction, and proxemic behaviors. We demonstrate that an HMM-based state representation of presentations improves the performance assessments.
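A minimal, self-contained sketch of the state-sequence idea: a log-space Viterbi decoder recovering the most likely sequence of quality states from coarse observations. The two states and the emission/transition probabilities below are invented for illustration and are not the paper's fitted model:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM (log-space)."""
    # Initialize with the first observation.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor for state s given observation o.
            prob, prev = max(
                (V[-2][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][o]), p)
                for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Hypothetical two-state model: presentation segments in a "high" or "low"
# quality state, emitting coarse judge ratings ("good"/"poor").
states = ("high", "low")
start_p = {"high": 0.5, "low": 0.5}
trans_p = {"high": {"high": 0.8, "low": 0.2}, "low": {"high": 0.2, "low": 0.8}}
emit_p = {"high": {"good": 0.9, "poor": 0.1}, "low": {"good": 0.2, "poor": 0.8}}

# viterbi(["good", "good", "poor", "poor"], states, start_p, trans_p, emit_p)
# → ['high', 'high', 'low', 'low']
```

The decoded state sequence is what a presenter-facing tool could surface as diagnostic feedback ("your delivery dipped in the second half").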
Citations: 4
Detection of Micro-expression Recognition Based on Spatio-Temporal Modelling and Spatial Attention
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421160
Mengjiong Bai
My PhD project aims to contribute to affective computing applications that assist depression diagnosis through micro-expression recognition. My motivation is the similarity between the low-intensity facial expressions in micro-expressions and the low-intensity facial expressions ('frozen face') of people with psychomotor retardation caused by depression. The project will focus, firstly, on investigating spatio-temporal modelling and attention systems for micro-expression recognition (MER) and, secondly, on exploring the role of micro-expressions in automated depression analysis by improving deep learning architectures to detect low-intensity facial expressions. This work will investigate different deep learning architectures (e.g., Temporal Convolutional Networks (TCNN) or Gated Recurrent Units (GRU)) and validate the results on publicly available micro-expression benchmark datasets to quantitatively analyse the robustness and accuracy of MER's contribution to improving automatic depression analysis. Moreover, video magnification, as a way to enhance small movements, will be combined with the deep learning methods to address the low-intensity issues in MER.
Citations: 5
Did the Children Behave?: Investigating the Relationship Between Attachment Condition and Child Computer Interaction
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418858
Dong-Bach Vo, S. Brewster, A. Vinciarelli
This work investigates the interplay between Child-Computer Interaction and attachment, a psychological construct that accounts for how children perceive their parents. In particular, the article uses a multimodal approach to test whether children with different attachment conditions tend to use the same interactive system differently. The experiments show that the accuracy in predicting usage behaviour changes, to a statistically significant extent, according to the attachment conditions of the 52 experiment participants (ages 5 to 9). This result suggests that attachment-relevant processes are actually at work when people interact with technology, at least when it comes to children.
Citations: 2
Early Prediction of Visitor Engagement in Science Museums with Multimodal Learning Analytics
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418890
Andrew Emerson, Nathan L. Henderson, Jonathan P. Rowe, Wookhee Min, Seung Y. Lee, James Minogue, James C. Lester
Modeling visitor engagement is a key challenge in informal learning environments, such as museums and science centers. Devising predictive models of visitor engagement that accurately forecast salient features of visitor behavior, such as dwell time, holds significant potential for enabling adaptive learning environments and visitor analytics for museums and science centers. In this paper, we introduce a multimodal early prediction approach to modeling visitor engagement with interactive science museum exhibits. We utilize multimodal sensor data including eye gaze, facial expression, posture, and interaction log data captured during visitor interactions with an interactive museum exhibit for environmental science education, to induce predictive models of visitor dwell time. We investigate machine learning techniques (random forest, support vector machine, Lasso regression, gradient boosting trees, and multi-layer perceptron) to induce multimodal predictive models of visitor engagement with data from 85 museum visitors. Results from a series of ablation experiments suggest that incorporating additional modalities into predictive models of visitor engagement improves model accuracy. In addition, the models show improved predictive performance over time, demonstrating that increasingly accurate predictions of visitor dwell time can be achieved as more evidence becomes available from visitor interactions with interactive science museum exhibits. These findings highlight the efficacy of multimodal data for modeling museum exhibit visitor engagement.
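The ablation idea (dropping modalities and re-measuring accuracy) can be sketched independently of the specific learners. The snippet below substitutes a toy nearest-centroid classifier for the paper's models and synthetic per-modality features for the sensor data, showing only the ablation scaffolding; every name and value here is an illustrative assumption:

```python
def nearest_centroid_fit(X, y):
    """Per-class mean feature vector (a toy stand-in for the real models)."""
    cents = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def nearest_centroid_predict(cents, x):
    def sqdist(a, b):
        return sum((i - j) ** 2 for i, j in zip(a, b))
    return min(cents, key=lambda label: sqdist(cents[label], x))

def ablation_accuracy(features_by_modality, y, modalities):
    """Accuracy (on the training set, for illustration only) using just the
    selected modalities, concatenated into one feature vector per visitor."""
    X = [sum((features_by_modality[m][i] for m in modalities), [])
         for i in range(len(y))]
    cents = nearest_centroid_fit(X, y)
    preds = [nearest_centroid_predict(cents, x) for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)
```

Running `ablation_accuracy` over every subset of modalities reproduces the shape of an ablation table: if accuracy drops when a modality is removed, that modality carries predictive signal.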
Citations: 16
Group Level Audio-Video Emotion Recognition Using Hybrid Networks
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417968
Chuanhe Liu, Wenqian Jiang, Minghao Wang, Tianhao Tang
This paper presents a hybrid network for audio-video group emotion recognition. The proposed architecture includes an audio stream, a facial emotion stream, an environmental object statistics stream (EOS), and a video stream. We adopted this method at the 8th Emotion Recognition in the Wild Challenge (EmotiW 2020). According to the feedback on our submissions, the best result achieved 76.85% on the Video level Group AFfect (VGAF) test database, 26.89% higher than the baseline. Such improvements show that our method is state-of-the-art.
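One common way to combine such per-modality streams, shown here as a generic sketch rather than the paper's actual fusion layer, is late fusion: convert each stream's logits to probabilities and average them (optionally with per-stream weights) before picking the winning emotion class. The logits and weights below are made up for illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over one stream's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def late_fusion(stream_logits, weights=None):
    """Average per-stream class probabilities (optionally weighted) and
    return (winning class index, fused probability vector)."""
    probs = [softmax(l) for l in stream_logits]
    weights = weights or [1.0] * len(probs)
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs)) / sum(weights)
        for c in range(len(probs[0]))
    ]
    return max(range(len(fused)), key=fused.__getitem__), fused
```

Weighting the streams (e.g., trusting the facial stream more than the EOS stream) is a typical tuning knob when hybridizing audio and video models.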
Citations: 17
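The hybrid architecture described in the abstract above combines predictions from audio, facial emotion, EOS, and video streams into a single group-level label. A minimal late-fusion sketch of that idea — the paper does not publish its fusion scheme or weights, so the uniform weighting and the three-class setup below are illustrative assumptions:

```python
import numpy as np

def fuse_streams(stream_probs, weights=None):
    """Weighted late fusion of per-stream class probabilities.

    stream_probs: list of (n_classes,) probability arrays, one per
    stream (e.g. audio, facial emotion, EOS, video). The uniform
    default weights are a hypothetical choice, not the paper's.
    """
    probs = np.stack(stream_probs)  # shape: (n_streams, n_classes)
    if weights is None:
        weights = np.ones(len(stream_probs)) / len(stream_probs)
    fused = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(fused)), fused

# Toy example with three hypothetical group-affect classes
# (negative / neutral / positive); values are made up.
audio = np.array([0.2, 0.5, 0.3])
face  = np.array([0.1, 0.3, 0.6])
eos   = np.array([0.3, 0.4, 0.3])
video = np.array([0.1, 0.2, 0.7])
label, fused = fuse_streams([audio, face, eos, video])
# label is the class index with the highest fused probability
```

In a real system the per-stream weights would typically be tuned on a validation set rather than fixed uniformly.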
Analysis of Face-Touching Behavior in Large Scale Social Interaction Dataset
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418876
Cigdem Beyan, Matteo Bustreo, Muhammad Shahid, Gianluca Bailo, N. Carissimi, Alessio Del Bue
We present the first publicly available annotations for the analysis of face-touching behavior. These annotations cover a dataset of audio-visual recordings of small-group social interactions: 64 videos in total, each lasting between 12 and 30 minutes and showing a single person participating in four-person meetings. The annotations were performed by 16 annotators in total, with almost perfect agreement (Cohen's Kappa = 0.89) on average. In total, 74K and 2M video frames were labelled as face-touch and no-face-touch, respectively. Given the dataset and the collected annotations, we also present an extensive evaluation of several methods for face-touching detection: rule-based, supervised learning with hand-crafted features, and feature learning and inference with a Convolutional Neural Network (CNN). Our evaluation indicates that among all of them, the CNN performed best, reaching an 83.76% F1-score and a 0.84 Matthews Correlation Coefficient. To foster future research on this problem, the code and dataset were made publicly available (github.com/IIT-PAVIS/Face-Touching-Behavior), providing all video frames, face-touch annotations, body pose estimations including face and hand key-point detections, face bounding boxes, the baseline method implementations, and the cross-validation splits used for training and evaluating our models.
Citations: 14
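The face-touching abstract above reports inter-annotator agreement as Cohen's Kappa and detection quality as an F1-score and a Matthews Correlation Coefficient. A minimal sketch of how the binary-case versions of these two agreement/correlation metrics are computed from confusion counts — the counts below are toy values for illustration, not the paper's:

```python
import math

def cohens_kappa(tp, fp, fn, tn):
    # Observed agreement corrected for agreement expected by chance
    n = tp + fp + fn + tn
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (po - pe) / (1 - pe)

def matthews_corrcoef(tp, fp, fn, tn):
    # Correlation between predicted and true binary labels;
    # robust to class imbalance such as 74K vs. 2M frames
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Toy confusion counts for a face-touch / no-face-touch classifier
tp, fp, fn, tn = 40, 10, 10, 40
kappa = cohens_kappa(tp, fp, fn, tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
```

For these symmetric toy counts both metrics come out to 0.6; on skewed data such as the paper's (no-face-touch frames vastly outnumber face-touch ones), MCC is usually a more informative summary than accuracy.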