A system's ability to understand and model a human's engagement during an interactive task is important both for adapting its behavior to the moment and for achieving a coherent interaction over time. Standard practice for creating such a capability requires uncovering and modeling the multimodal cues that predict engagement in a given task environment. The first step in this methodology is to have human coders produce "gold standard" judgments of sample behavior. In this paper we report results from applying this first step to the complex and varied behavior of children playing a fast-paced, speech-controlled, side-scrolling game called Mole Madness. We introduce a concrete metric for engagement (willingness to continue the interaction) that leads to better inter-coder judgments for children playing in pairs, explore how coders perceive the relative contribution of audio and visual cues, and describe engagement trends and patterns in our population. We also examine how the measures change when the same children play Mole Madness with a robot instead of a peer. We conclude by discussing the implications of the differences within and across play conditions for the automatic estimation of engagement and for the extension of our autonomous robot player into a "buddy" that can individualize interaction for each player and game.
{"title":"Toward Better Understanding of Engagement in Multiparty Spoken Interaction with Children","authors":"S. Moubayed, J. Lehman","doi":"10.1145/2818346.2820733","DOIUrl":"https://doi.org/10.1145/2818346.2820733","url":null,"abstract":"A system's ability to understand and model a human's engagement during an interactive task is important for both adapting its behavior to the moment and achieving a coherent interaction over time. Standard practice for creating such a capability requires uncovering and modeling the multimodal cues that predict engagement in a given task environment. The first step in this methodology is to have human coders produce \"gold standard\" judgments of sample behavior. In this paper we report results from applying this first step to the complex and varied behavior of children playing a fast-paced, speech-controlled, side-scrolling game called Mole Madness. We introduce a concrete metric for engagement-willingness to continue the interaction--that leads to better inter-coder judgments for children playing in pairs, explore how coders perceive the relative contribution of audio and visual cues, and describe engagement trends and patterns in our population. We also examine how the measures change when the same children play Mole Madness with a robot instead of a peer. We conclude by discussing the implications of the differences within and across play conditions for the automatic estimation of engagement and the extension of our autonomous robot player into a \"buddy\" that can individualize interaction for each player and game.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"143 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75350975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonverbal behaviors such as facial expressions, eye contact, gestures, and body movements in general have a strong impact on the process of communicative interaction. Gestures play an important role in interpersonal communication in the classroom between student and teacher. To assist teachers with exhibiting open and positive nonverbal signals in their actual classrooms, we have designed a multimodal teaching application with provisions for real-time feedback, in coordination with our TeachLivE test-bed environment and its reflective application, ReflectLivE. Individuals walk into this virtual environment and interact with five virtual students shown on a large-screen display. The study is designed with two settings (each seven minutes long). In each setting, the participants are provided with lesson plans from which they teach. All participants take part in both settings, with half receiving automated real-time feedback about their body poses in the first session (group 1) and the other half receiving such feedback in the second session (group 2). Feedback takes the form of a visual indication each time the participant exhibits a closed stance. To create this automated feedback application, a closed-posture corpus was collected from existing TeachLivE teaching records and used to train the posture classifier. After each session, the participants complete a post-questionnaire about their experience. We hypothesize that visual feedback improves positive body gestures for both groups during their feedback session, and that for group 1 this improvement persists into their second, unaided session, whereas for group 2 improvements occur only during the second session.
{"title":"Multimodal Assessment of Teaching Behavior in Immersive Rehearsal Environment-TeachLivE","authors":"R. Barmaki","doi":"10.1145/2818346.2823306","DOIUrl":"https://doi.org/10.1145/2818346.2823306","url":null,"abstract":"Nonverbal behaviors such as facial expressions, eye contact, gestures, and body movements in general have strong impacts on the process of communicative interactions. Gestures play an important role in interpersonal communication in the classroom between student and teacher. To assist teachers with exhibiting open and positive nonverbal signals in their actual classroom, we have designed a multimodal teaching application with provisions for real-time feedback in coordination with our TeachLivE test-bed environment and its reflective application; ReflectLivE. Individuals walk into this virtual environment and interact with five virtual students shown on a large screen display. The recent research study is designed to have two settings (7-minute long each). In each of the settings, the participants are provided lesson plans from which they teach. All the participants are asked to take part in both settings, with half receiving automated real-time feedback about their body poses in the first session (group 1) and the other half receiving such feedback in the second session (group 2). Feedback is in the form of a visual indication each time the participant exhibits a closed stance. To create this automated feedback application, a closed posture corpus was collected and trained based on the existing TeachLivE teaching records. After each session, the participants take a post-questionnaire about their experience. We hypothesize that visual feedback improves positive body gestures for both groups during the feedback session, and that, for group 2, this persists into their second unaided session but, for group 1, improvements occur only during the second session.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72720552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Touch is a primary nonverbal communication channel used to convey emotions and other social messages. Despite its importance, this channel remains little explored in affective computing, where much more focus has been placed on the visual and aural channels. In this paper, we investigate the possibility of automatically discriminating between different social touch types. We propose five distinct feature sets for describing touch behaviours captured by a grid of pressure sensors. These features are then combined using the Random Forest and Boosting methods to categorize the touch gesture type. The proposed methods were evaluated on both the HAART (7 gesture types over different surfaces) and the CoST (14 gesture types over the same surface) datasets made available by the Social Touch Gesture Challenge 2015. Performance well above chance level was achieved, with accuracies of 67% on the HAART and 59% on the CoST test sets.
{"title":"Social Touch Gesture Recognition using Random Forest and Boosting on Distinct Feature Sets","authors":"Y. F. A. Gaus, Temitayo A. Olugbade, Asim Jan, R. Qin, Jingxin Liu, Fan Zhang, H. Meng, N. Bianchi-Berthouze","doi":"10.1145/2818346.2830599","DOIUrl":"https://doi.org/10.1145/2818346.2830599","url":null,"abstract":"Touch is a primary nonverbal communication channel used to communicate emotions or other social messages. Despite its importance, this channel is still very little explored in the affective computing field, as much more focus has been placed on visual and aural channels. In this paper, we investigate the possibility to automatically discriminate between different social touch types. We propose five distinct feature sets for describing touch behaviours captured by a grid of pressure sensors. These features are then combined together by using the Random Forest and Boosting methods for categorizing the touch gesture type. The proposed methods were evaluated on both the HAART (7 gesture types over different surfaces) and the CoST (14 gesture types over the same surface) datasets made available by the Social Touch Gesture Challenge 2015. Well above chance level performances were achieved with a 67% accuracy for the HAART and 59% for the CoST testing datasets respectively.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"93 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83207166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This consortium paper outlines a research plan for investigating deep learning techniques as applied to multimodal multi-task learning and multimodal fusion. We discuss our prior research results in this area and how these results motivate us to explore this direction further. We also define concrete steps of enquiry we wish to undertake as a short-term goal, and outline further challenges of multimodal learning with deep neural networks, such as inter- and intra-modality synchronization, robustness to noise in modality data acquisition, and data insufficiency.
{"title":"Challenges in Deep Learning for Multimodal Applications","authors":"Sayan Ghosh","doi":"10.1145/2818346.2823313","DOIUrl":"https://doi.org/10.1145/2818346.2823313","url":null,"abstract":"This consortium paper outlines a research plan for investigating deep learning techniques as applied to multimodal multi-task learning and multimodal fusion. We discuss our prior research results in this area, and how these results motivate us to explore more in this direction. We also define concrete steps of enquiry we wish to undertake as a short-term goal, and further outline some other challenges of multimodal learning using deep neural networks, such as inter and intra-modality synchronization, robustness to noise in modality data acquisition, and data insufficiency.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"36 6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77677328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Emotion Recognition in the Wild challenge poses significant problems for state-of-the-art auditory and visual affect quantification systems. To overcome these challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique is referred to as mise en scene in the film industry and involves strict, intentional control of the color palette, light source color, and arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed. Rule-of-thirds motion history histograms detect motion along rule-of-thirds guidelines. Rule-of-thirds color layout descriptors compactly describe a scene at the rule-of-thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features achieve a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.
{"title":"Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge","authors":"Albert C. Cruz","doi":"10.1145/2818346.2830592","DOIUrl":"https://doi.org/10.1145/2818346.2830592","url":null,"abstract":"The Emotion Recognition in the Wild challenge poses significant problems to state of the art auditory and visual affect quantification systems. To overcome the challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique is referred to as mise en scene in the film industry and involves strict and intentional control of color palette, light source color, and arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed. Rule of thirds based motion history histograms detect motion along rule of thirds guidelines. Rule of thirds color layout descriptors compactly describe a scene at rule of thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features have a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84877247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We report on an investigation of acted and non-acted emotional speech and the resulting Non-/acted LAST MINUTE corpus (NaLMC) database. The database consists of newly recorded acted emotional speech samples that were designed to allow direct comparison of acted and non-acted emotional speech. The non-acted samples are taken from the LAST MINUTE corpus (LMC) [1]. Furthermore, emotional labels were added to selected passages of the LMC, and a self-rating of the LMC recordings was performed. Although the main objective of the NaLMC database is to enable comparative analysis of acted and non-acted emotional speech, both audio and video signals were recorded to allow multimodal investigations.
{"title":"NaLMC: A Database on Non-acted and Acted Emotional Sequences in HCI","authors":"Kim Hartmann, J. Krüger, J. Frommer, A. Wendemuth","doi":"10.1145/2818346.2820772","DOIUrl":"https://doi.org/10.1145/2818346.2820772","url":null,"abstract":"We report on the investigation on acted and non-acted emotional speech and the resulting Non-/acted LAST MINUTE corpus (NaLMC) database. The database consists of newly recorded acted emotional speech samples which were designed to allow the direct comparison of acted and non-acted emotional speech. The non-acted samples are taken from the LAST MINUTE corpus (LMC) [1]. Furthermore, emotional labels were added to selected passages of the LMC and a self-rating of the LMC recordings was performed. Although the main objective of the NaLMC database is to allow the comparative analysis of acted and non-acted emotional speech, both audio and video signals were recorded to allow multimodal investigations.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87975619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Task analysis using eye activity has previously been used to estimate cognitive load on a per-task basis. However, since pupil size is a continuous physiological signal, eye-based classification of cognitive load can be made more accurate by considering cognitive load at a higher temporal resolution and by incorporating models of the interactions between the task-evoked pupillary response (TEPR) and other pupillary responses, such as the pupillary light reflex, into the classification model. In this work, methods of using eye activity as a measure of continuous mental load will be investigated. Subsequently, pupillary light reflex models will be incorporated into task analysis to investigate the possibility of enhancing the reliability of cognitive load estimation under varied lighting conditions. This will culminate in the development and evaluation of a classification system that measures rapidly changing cognitive load. Task analysis of this calibre will enable interfaces in wearable optical devices to be constantly aware of the user's mental state and to control information flow to prevent information overload and interruptions.
{"title":"Instantaneous and Robust Eye-Activity Based Task Analysis","authors":"Hoe Kin Wong","doi":"10.1145/2818346.2823312","DOIUrl":"https://doi.org/10.1145/2818346.2823312","url":null,"abstract":"Task analysis using eye-activity has previously been used for estimating cognitive load on a per-task basis. However, since pupil size is a continuous physiological signal, eye-based classification accuracy of cognitive load can be improved by considering cognitive load at a higher temporal resolution and incorporating models of the interactions between the task-evoked pupillary response (TEPR) and other pupillary responses such as the Pupillary Light Reflex into the classification model. In this work, methods of using eye-activity as a measure of continuous mental load will be investigated. Subsequently pupil light reflex models will be incorporated into task analysis to investigate the possibility of enhancing the reliability of cognitive load estimation in varied lighting conditions. This will culminate in the development and evaluation of a classification system which measures rapidly changing cognitive load. Task analysis of this calibre will enable interfaces in wearable optical devices to be constantly aware of the user's mental state and control information flow to prevent information overload and interruptions.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"44 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91497943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of highlights in movies is a challenge for the affective understanding and implicit tagging of films. Under the hypothesis that synchronization of spectators' reactions indicates such highlights, we define a synchronization measure between spectators that is capable of extracting movie highlights. The intuitive idea of our approach is to define (a) a parameterization of one spectator's physiological data on a manifold, and (b) the synchronization measure between spectators as the Kolmogorov-Smirnov distance between local shape distributions of the underlying manifolds. We evaluate our approach using data collected in an experiment where the electro-dermal activity of spectators was recorded during the entire projection of a movie in a cinema. We compare our methodology with baseline synchronization measures such as correlation, Spearman's rank correlation, mutual information, and the Kolmogorov-Smirnov distance. Results indicate that the proposed approach accurately distinguishes highlight from non-highlight scenes.
{"title":"Spectators' Synchronization Detection based on Manifold Representation of Physiological Signals: Application to Movie Highlights Detection","authors":"Michal Muszynski, Theodoros Kostoulas, G. Chanel, Patrizia Lombardo, T. Pun","doi":"10.1145/2818346.2820773","DOIUrl":"https://doi.org/10.1145/2818346.2820773","url":null,"abstract":"Detection of highlights in movies is a challenge for the affective understanding and implicit tagging of films. Under the hypothesis that synchronization of the reaction of spectators indicates such highlights, we define a synchronization measure between spectators that is capable of extracting movie highlights. The intuitive idea of our approach is to define (a) a parameterization of one spectator's physiological data on a manifold; (b) the synchronization measure between spectators as the Kolmogorov-Smirnov distance between local shape distributions of the underlying manifolds. We evaluate our approach using data collected in an experiment where the electro-dermal activity of spectators was recorded during the entire projection of a movie in a cinema. We compare our methodology with baseline synchronization measures, such as correlation, Spearman's rank correlation, mutual information, Kolmogorov-Smirnov distance. Results indicate that the proposed approach allows to accurately distinguish highlight from non-highlight scenes.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"400 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80275527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Body language plays an important role in learning processes and communication. For example, communication research has produced evidence that mathematical knowledge can be embodied in the gestures made by teachers and students. Likewise, body postures and gestures are utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect of multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement in hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component based on OpenNI is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for multimodal feature extraction and for creating automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, and then summarizes our work using multimodal sensors to develop assessments of communication skills, with attention to the use of depth sensors. Specifically, we focus on the task of public speaking assessment using the Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
{"title":"Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits","authors":"C. W. Leong, L. Chen, G. Feng, Chong Min Lee, Matthew David Mulholland","doi":"10.1145/2818346.2830605","DOIUrl":"https://doi.org/10.1145/2818346.2830605","url":null,"abstract":"Body language plays an important role in learning processes and communication. For example, communication research produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are also utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect in multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement in hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component based on OpenNI is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for purposes of multimodal feature extraction and creating automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, and proceeds to summarize our work using multimodal sensors in developing assessments of communication skills, with attention on the use of depth sensors. Specifically, we focus on the task of public speaking assessment using Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83451695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hearings of witnesses and defendants play a crucial role in reaching court trial decisions. Given the high-stakes nature of trial outcomes, implementing accurate and effective computational methods to evaluate the honesty of court testimonies can offer valuable support during the decision-making process. In this paper, we address the identification of deception in real-life trial data. We introduce a novel dataset consisting of videos collected from public court trials. We explore the use of verbal and non-verbal modalities to build a multimodal deception detection system that aims to discriminate between truthful and deceptive statements provided by defendants and witnesses. We achieve classification accuracies in the range of 60-75% when using a model that extracts and fuses features from the linguistic and gesture modalities. In addition, we present a human deception detection study in which we evaluate the human capability to detect deception in trial hearings. The results show that our system outperforms humans at identifying deceit.
{"title":"Deception Detection using Real-life Trial Data","authors":"Verónica Pérez-Rosas, M. Abouelenien, Rada Mihalcea, Mihai Burzo","doi":"10.1145/2818346.2820758","DOIUrl":"https://doi.org/10.1145/2818346.2820758","url":null,"abstract":"Hearings of witnesses and defendants play a crucial role when reaching court trial decisions. Given the high-stake nature of trial outcomes, implementing accurate and effective computational methods to evaluate the honesty of court testimonies can offer valuable support during the decision making process. In this paper, we address the identification of deception in real-life trial data. We introduce a novel dataset consisting of videos collected from public court trials. We explore the use of verbal and non-verbal modalities to build a multimodal deception detection system that aims to discriminate between truthful and deceptive statements provided by defendants and witnesses. We achieve classification accuracies in the range of 60-75% when using a model that extracts and fuses features from the linguistic and gesture modalities. In addition, we present a human deception detection study where we evaluate the human capability of detecting deception in trial hearings. The results show that our system outperforms the human capability of identifying deceit.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83785615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}