
Latest publications: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

Accuracy vs. Availability Heuristic in Multimodal Affect Detection in the Wild
Nigel Bosch, Huili Chen, S. D’Mello, R. Baker, V. Shute
This paper discusses multimodal affect detection from a fusion of facial expressions and interaction features derived from students' interactions with an educational game in the noisy real-world context of a computer-enabled classroom. Log data of students' interactions with the game and face videos from 133 students were recorded in a computer-enabled classroom over a two-day period. Human observers live-annotated learning-centered affective states such as engagement, confusion, and frustration. The face-only detectors were more accurate than interaction-only detectors. Multimodal affect detectors did not show any substantial improvement in accuracy over the face-only detectors. However, the face-only detectors were only applicable to 65% of the cases due to face registration errors caused by excessive movement, occlusion, poor lighting, and other factors. Multimodal fusion techniques were able to improve the applicability of detectors to 98% of cases without sacrificing classification accuracy. Balancing the accuracy vs. applicability tradeoff appears to be an important feature of multimodal affect detection.
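The fallback logic implied by the abstract (use the face channel when face registration succeeds, otherwise rely on interaction features alone) can be sketched as follows. This is a minimal illustration assuming hypothetical per-channel probabilities and an arbitrary fusion weight, not the detectors or values used in the paper.

# Minimal sketch of the accuracy-vs-availability tradeoff described above.
# face_prob is None when face registration failed (movement, occlusion, lighting);
# the fusion weight and example probabilities are illustrative, not the paper's values.

def fused_affect_estimate(face_prob, interaction_prob, face_weight=0.7):
    """Return an estimated probability of the target affective state."""
    if face_prob is None:
        # Face channel unavailable: fall back to interaction features alone,
        # which raises applicability without touching face-only accuracy.
        return interaction_prob
    # Both channels available: simple weighted decision-level fusion.
    return face_weight * face_prob + (1.0 - face_weight) * interaction_prob

print(fused_affect_estimate(None, 0.62))   # face registration failed -> 0.62
print(fused_affect_estimate(0.80, 0.62))   # both channels available  -> 0.746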
{"title":"Accuracy vs. Availability Heuristic in Multimodal Affect Detection in the Wild","authors":"Nigel Bosch, Huili Chen, S. D’Mello, R. Baker, V. Shute","doi":"10.1145/2818346.2820739","DOIUrl":"https://doi.org/10.1145/2818346.2820739","url":null,"abstract":"This paper discusses multimodal affect detection from a fusion of facial expressions and interaction features derived from students' interactions with an educational game in the noisy real-world context of a computer-enabled classroom. Log data of students' interactions with the game and face videos from 133 students were recorded in a computer-enabled classroom over a two day period. Human observers live annotated learning-centered affective states such as engagement, confusion, and frustration. The face-only detectors were more accurate than interaction-only detectors. Multimodal affect detectors did not show any substantial improvement in accuracy over the face-only detectors. However, the face-only detectors were only applicable to 65% of the cases due to face registration errors caused by excessive movement, occlusion, poor lighting, and other factors. Multimodal fusion techniques were able to improve the applicability of detectors to 98% of cases without sacrificing classification accuracy. Balancing the accuracy vs. applicability tradeoff appears to be an important feature of multimodal affect detection.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88503240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Toward Better Understanding of Engagement in Multiparty Spoken Interaction with Children
S. Moubayed, J. Lehman
A system's ability to understand and model a human's engagement during an interactive task is important for both adapting its behavior to the moment and achieving a coherent interaction over time. Standard practice for creating such a capability requires uncovering and modeling the multimodal cues that predict engagement in a given task environment. The first step in this methodology is to have human coders produce "gold standard" judgments of sample behavior. In this paper we report results from applying this first step to the complex and varied behavior of children playing a fast-paced, speech-controlled, side-scrolling game called Mole Madness. We introduce a concrete metric for engagement (willingness to continue the interaction) that leads to better inter-coder judgments for children playing in pairs, explore how coders perceive the relative contribution of audio and visual cues, and describe engagement trends and patterns in our population. We also examine how the measures change when the same children play Mole Madness with a robot instead of a peer. We conclude by discussing the implications of the differences within and across play conditions for the automatic estimation of engagement and the extension of our autonomous robot player into a "buddy" that can individualize interaction for each player and game.
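Where the abstract refers to "gold standard" judgments and better inter-coder judgments, agreement between coders is commonly quantified with a chance-corrected statistic. The sketch below, assuming scikit-learn is available, computes Cohen's kappa on two invented per-segment label sequences; it is not the coding scheme or data from the study.

# Cohen's kappa on two hypothetical coders' per-segment engagement labels
# (1 = "willing to continue the interaction", 0 = not); invented data.
from sklearn.metrics import cohen_kappa_score

coder_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
coder_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(cohen_kappa_score(coder_a, coder_b))  # agreement corrected for chance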
{"title":"Toward Better Understanding of Engagement in Multiparty Spoken Interaction with Children","authors":"S. Moubayed, J. Lehman","doi":"10.1145/2818346.2820733","DOIUrl":"https://doi.org/10.1145/2818346.2820733","url":null,"abstract":"A system's ability to understand and model a human's engagement during an interactive task is important for both adapting its behavior to the moment and achieving a coherent interaction over time. Standard practice for creating such a capability requires uncovering and modeling the multimodal cues that predict engagement in a given task environment. The first step in this methodology is to have human coders produce \"gold standard\" judgments of sample behavior. In this paper we report results from applying this first step to the complex and varied behavior of children playing a fast-paced, speech-controlled, side-scrolling game called Mole Madness. We introduce a concrete metric for engagement-willingness to continue the interaction--that leads to better inter-coder judgments for children playing in pairs, explore how coders perceive the relative contribution of audio and visual cues, and describe engagement trends and patterns in our population. We also examine how the measures change when the same children play Mole Madness with a robot instead of a peer. We conclude by discussing the implications of the differences within and across play conditions for the automatic estimation of engagement and the extension of our autonomous robot player into a \"buddy\" that can individualize interaction for each player and game.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"143 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75350975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Multimodal Fusion using Respiration and Gaze for Predicting Next Speaker in Multi-Party Meetings
Ryo Ishii, Shiro Kumano, K. Otsuka
Techniques that use nonverbal behaviors to predict turn-taking situations, such as who will be the next speaker and the next utterance timing in multi-party meetings, have recently been receiving a lot of attention. It has long been known that gaze is a physical behavior that plays an important role in transferring the speaking turn between humans. Recently, a line of research has focused on the relationship between turn-taking and respiration, a biological signal that conveys information about the intention or preliminary action to start to speak. It has been demonstrated that respiration and gaze behavior separately have the potential to allow predicting the next speaker and the next utterance timing in multi-party meetings. As a multimodal fusion to create models for predicting the next speaker in multi-party meetings, we integrated respiration and gaze behavior, which were extracted from different modalities and are completely different in quality, and implemented a model that uses information about them to predict the next speaker at the end of an utterance. The model performs two-step processing. The first step is to predict whether turn-keeping or turn-taking happens; the second is to predict the next speaker in turn-taking. We constructed prediction models with either respiration or gaze behavior and with both respiration and gaze behaviors as features and compared their performance. The results suggest that the model with both respiration and gaze behaviors performs better than the ones using only respiration or gaze behavior. It is revealed that multimodal fusion using respiration and gaze behavior is effective for predicting the next speaker in multi-party meetings. It was found that gaze behavior is more useful for predicting turn-keeping/turn-taking than respiration and that respiration is more useful for predicting the next speaker in turn-taking.
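A minimal sketch of the two-step scheme described above: a first classifier decides turn-keeping vs. turn-taking, and a second classifier, trained only on turn-taking samples, predicts which listener takes the turn. The random features, integer speaker ids, and random forests below are placeholders, not the authors' feature set or models.

# Two-stage sketch: stage 1 predicts turn-keeping vs. turn-taking, stage 2
# predicts the next speaker only when turn-taking is predicted.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))              # respiration + gaze features at utterance ends
keep_or_take = rng.integers(0, 2, 200)     # 0 = turn-keeping, 1 = turn-taking
next_speaker = rng.integers(0, 3, 200)     # listener id, meaningful only for turn-taking

stage1 = RandomForestClassifier(random_state=0).fit(X, keep_or_take)
take = keep_or_take == 1
stage2 = RandomForestClassifier(random_state=0).fit(X[take], next_speaker[take])

def predict_next_speaker(x, current_speaker):
    x = x.reshape(1, -1)
    if stage1.predict(x)[0] == 0:          # turn-keeping: current speaker continues
        return current_speaker
    return int(stage2.predict(x)[0])       # turn-taking: which listener takes the turn

print(predict_next_speaker(X[0], current_speaker=0))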
{"title":"Multimodal Fusion using Respiration and Gaze for Predicting Next Speaker in Multi-Party Meetings","authors":"Ryo Ishii, Shiro Kumano, K. Otsuka","doi":"10.1145/2818346.2820755","DOIUrl":"https://doi.org/10.1145/2818346.2820755","url":null,"abstract":"Techniques that use nonverbal behaviors to predict turn-taking situations, such as who will be the next speaker and the next utterance timing in multi-party meetings are receiving a lot of attention recently. It has long been known that gaze is a physical behavior that plays an important role in transferring the speaking turn between humans. Recently, a line of research has focused on the relationship between turn-taking and respiration, a biological signal that conveys information about the intention or preliminary action to start to speak. It has been demonstrated that respiration and gaze behavior separately have the potential to allow predicting the next speaker and the next utterance timing in multi-party meetings. As a multimodal fusion to create models for predicting the next speaker in multi-party meetings, we integrated respiration and gaze behavior, which were extracted from different modalities and are completely different in quality, and implemented a model uses information about them to predict the next speaker at the end of an utterance. The model has a two-step processing. The first is to predict whether turn-keeping or turn-taking happens; the second is to predict the next speaker in turn-taking. We constructed prediction models with either respiration or gaze behavior and with both respiration and gaze behaviors as features and compared their performance. The results suggest that the model with both respiration and gaze behaviors performs better than the one using only respiration or gaze behavior. It is revealed that multimodal fusion using respiration and gaze behavior is effective for predicting the next speaker in multi-party meetings. It was found that gaze behavior is more useful for predicting turn-keeping/turn-taking than respiration and that respiration is more useful for predicting the next speaker in turn-taking.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91535789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
I Would Hire You in a Minute: Thin Slices of Nonverbal Behavior in Job Interviews
L. Nguyen, D. Gática-Pérez
In everyday life, judgments people make about others are based on brief excerpts of interactions, known as thin slices. Inferences stemming from such minimal information can be quite accurate, and nonverbal behavior plays an important role in impression formation. Because the protagonists are strangers, employment interviews are a case where both nonverbal behavior and thin slices can be predictive of outcomes. In this work, we analyze the predictive validity of thin slices of real job interviews, where slices are defined by the sequence of questions in a structured interview format. We approach this problem from an audio-visual, dyadic, and nonverbal perspective, where sensing, cue extraction, and inference are automated. Our study shows that although nonverbal behavioral cues extracted from thin slices were not as predictive as when extracted from the full interaction, they were still predictive of hirability impressions with R^2 values up to 0.34, which was comparable to the predictive validity of human observers on thin slices. Applicant audio cues were found to yield the most accurate results.
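The quoted predictive validity (R^2 up to 0.34) comes from regressing hirability impressions on nonverbal cues extracted from each slice. The sketch below shows that style of evaluation with invented slice features and ratings and a ridge regressor standing in for the study's models.

# Slice-level regression of hirability impressions on nonverbal cues, scored with R^2.
# Features, ratings, and the ridge model are invented placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
slice_feats = rng.normal(size=(62, 20))               # audio-visual cues per question slice
hirability = 0.5 * slice_feats[:, 0] + rng.normal(scale=0.8, size=62)

X_tr, X_te, y_tr, y_te = train_test_split(slice_feats, hirability, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print(r2_score(y_te, model.predict(X_te)))            # predictive validity of the slice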
{"title":"I Would Hire You in a Minute: Thin Slices of Nonverbal Behavior in Job Interviews","authors":"L. Nguyen, D. Gática-Pérez","doi":"10.1145/2818346.2820760","DOIUrl":"https://doi.org/10.1145/2818346.2820760","url":null,"abstract":"In everyday life, judgments people make about others are based on brief excerpts of interactions, known as thin slices. Inferences stemming from such minimal information can be quite accurate, and nonverbal behavior plays an important role in the impression formation. Because protagonists are strangers, employment interviews are a case where both nonverbal behavior and thin slices can be predictive of outcomes. In this work, we analyze the predictive validity of thin slices of real job interviews, where slices are defined by the sequence of questions in a structured interview format. We approach this problem from an audio-visual, dyadic, and nonverbal perspective, where sensing, cue extraction, and inference are automated. Our study shows that although nonverbal behavioral cues extracted from thin slices were not as predictive as when extracted from the full interaction, they were still predictive of hirability impressions with $R^2$ values up to $0.34$, which was comparable to the predictive validity of human observers on thin slices. Applicant audio cues were found to yield the most accurate results.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90852149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge
Albert C. Cruz
The Emotion Recognition in the Wild challenge poses significant problems to state-of-the-art auditory and visual affect quantification systems. To overcome the challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique is referred to as mise en scene in the film industry and involves strict and intentional control of the color palette, light source color, and arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed. Rule-of-thirds motion history histograms detect motion along rule-of-thirds guidelines. Rule-of-thirds color layout descriptors compactly describe a scene at rule-of-thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features have a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.
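To make the rule-of-thirds descriptors concrete, the sketch below locates the four rule-of-thirds grid intersections of a frame and concatenates the mean color of a small patch around each one. The patch size and the use of mean RGB are assumptions for illustration, not the paper's exact descriptor.

# Rule-of-thirds color layout sketch: mean RGB of a patch around each of the
# four grid intersections; patch size and mean color are assumptions.
import numpy as np

def rule_of_thirds_points(height, width):
    """The four intersections of the rule-of-thirds grid lines."""
    return [(r, c) for r in (height // 3, 2 * height // 3)
                   for c in (width // 3, 2 * width // 3)]

def color_layout_descriptor(frame, patch=16):
    """Concatenate mean RGB of a small patch around each intersection."""
    feats = []
    for r, c in rule_of_thirds_points(*frame.shape[:2]):
        region = frame[max(r - patch, 0):r + patch, max(c - patch, 0):c + patch]
        feats.extend(region.reshape(-1, 3).mean(axis=0))
    return np.array(feats)                 # 4 intersections x 3 channels = 12 values

frame = np.random.default_rng(4).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(color_layout_descriptor(frame).shape)   # (12,)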
{"title":"Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge","authors":"Albert C. Cruz","doi":"10.1145/2818346.2830592","DOIUrl":"https://doi.org/10.1145/2818346.2830592","url":null,"abstract":"The Emotion Recognition in the Wild challenge poses significant problems to state of the art auditory and visual affect quantification systems. To overcome the challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique is referred to as mise en scene in the film industry and involves strict and intentional control of color palette, light source color, and arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed. Rule of thirds based motion history histograms detect motion along rule of thirds guidelines. Rule of thirds color layout descriptors compactly describe a scene at rule of thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features have a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84877247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
NaLMC: A Database on Non-acted and Acted Emotional Sequences in HCI
Kim Hartmann, J. Krüger, J. Frommer, A. Wendemuth
We report on an investigation of acted and non-acted emotional speech and the resulting Non-/acted LAST MINUTE corpus (NaLMC) database. The database consists of newly recorded acted emotional speech samples that were designed to allow the direct comparison of acted and non-acted emotional speech. The non-acted samples are taken from the LAST MINUTE corpus (LMC) [1]. Furthermore, emotional labels were added to selected passages of the LMC and a self-rating of the LMC recordings was performed. Although the main objective of the NaLMC database is to allow the comparative analysis of acted and non-acted emotional speech, both audio and video signals were recorded to allow multimodal investigations.
{"title":"NaLMC: A Database on Non-acted and Acted Emotional Sequences in HCI","authors":"Kim Hartmann, J. Krüger, J. Frommer, A. Wendemuth","doi":"10.1145/2818346.2820772","DOIUrl":"https://doi.org/10.1145/2818346.2820772","url":null,"abstract":"We report on the investigation on acted and non-acted emotional speech and the resulting Non-/acted LAST MINUTE corpus (NaLMC) database. The database consists of newly recorded acted emotional speech samples which were designed to allow the direct comparison of acted and non-acted emotional speech. The non-acted samples are taken from the LAST MINUTE corpus (LMC) [1]. Furthermore, emotional labels were added to selected passages of the LMC and a self-rating of the LMC recordings was performed. Although the main objective of the NaLMC database is to allow the comparative analysis of acted and non-acted emotional speech, both audio and video signals were recorded to allow multimodal investigations.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87975619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Instantaneous and Robust Eye-Activity Based Task Analysis
Hoe Kin Wong
Task analysis using eye-activity has previously been used for estimating cognitive load on a per-task basis. However, since pupil size is a continuous physiological signal, eye-based classification accuracy of cognitive load can be improved by considering cognitive load at a higher temporal resolution and incorporating models of the interactions between the task-evoked pupillary response (TEPR) and other pupillary responses such as the Pupillary Light Reflex into the classification model. In this work, methods of using eye-activity as a measure of continuous mental load will be investigated. Subsequently pupil light reflex models will be incorporated into task analysis to investigate the possibility of enhancing the reliability of cognitive load estimation in varied lighting conditions. This will culminate in the development and evaluation of a classification system which measures rapidly changing cognitive load. Task analysis of this calibre will enable interfaces in wearable optical devices to be constantly aware of the user's mental state and control information flow to prevent information overload and interruptions.
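The proposal to incorporate the pupillary light reflex (PLR) into cognitive-load estimation can be illustrated by regressing pupil diameter on luminance and keeping the residual as a rough task-evoked component. The linear luminance model and the synthetic signals below are assumptions for illustration only, not the author's proposed method.

# Regress pupil diameter on luminance and keep the residual as a rough
# task-evoked component; linear PLR model and synthetic signals are assumptions.
import numpy as np

def plr_corrected_pupil(pupil_mm, luminance):
    """Fit pupil diameter as a linear function of luminance; return the residual."""
    A = np.column_stack([luminance, np.ones_like(luminance)])
    coef, *_ = np.linalg.lstsq(A, pupil_mm, rcond=None)
    return pupil_mm - A @ coef

t = np.linspace(0, 60, 600)
luminance = 100 + 20 * np.sin(0.3 * t)            # varying lighting
pupil = 4.0 - 0.01 * luminance + 0.2 * (t > 30)   # simulated load step at t = 30 s
print(plr_corrected_pupil(pupil, luminance)[:5])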
{"title":"Instantaneous and Robust Eye-Activity Based Task Analysis","authors":"Hoe Kin Wong","doi":"10.1145/2818346.2823312","DOIUrl":"https://doi.org/10.1145/2818346.2823312","url":null,"abstract":"Task analysis using eye-activity has previously been used for estimating cognitive load on a per-task basis. However, since pupil size is a continuous physiological signal, eye-based classification accuracy of cognitive load can be improved by considering cognitive load at a higher temporal resolution and incorporating models of the interactions between the task-evoked pupillary response (TEPR) and other pupillary responses such as the Pupillary Light Reflex into the classification model. In this work, methods of using eye-activity as a measure of continuous mental load will be investigated. Subsequently pupil light reflex models will be incorporated into task analysis to investigate the possibility of enhancing the reliability of cognitive load estimation in varied lighting conditions. This will culminate in the development and evaluation of a classification system which measures rapidly changing cognitive load. Task analysis of this calibre will enable interfaces in wearable optical devices to be constantly aware of the user's mental state and control information flow to prevent information overload and interruptions.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"44 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91497943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Spectators' Synchronization Detection based on Manifold Representation of Physiological Signals: Application to Movie Highlights Detection
Michal Muszynski, Theodoros Kostoulas, G. Chanel, Patrizia Lombardo, T. Pun
Detection of highlights in movies is a challenge for the affective understanding and implicit tagging of films. Under the hypothesis that synchronization of the spectators' reactions indicates such highlights, we define a synchronization measure between spectators that is capable of extracting movie highlights. The intuitive idea of our approach is to define (a) a parameterization of one spectator's physiological data on a manifold; (b) the synchronization measure between spectators as the Kolmogorov-Smirnov distance between local shape distributions of the underlying manifolds. We evaluate our approach using data collected in an experiment where the electro-dermal activity of spectators was recorded during the entire projection of a movie in a cinema. We compare our methodology with baseline synchronization measures, such as correlation, Spearman's rank correlation, mutual information, and the Kolmogorov-Smirnov distance. Results indicate that the proposed approach can accurately distinguish highlight from non-highlight scenes.
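The central quantity in the measure above is a Kolmogorov-Smirnov distance between distributions derived from two spectators' signals. The sketch below, assuming SciPy is available, compares plain windowed electro-dermal samples rather than the manifold-based local shape distributions of the paper, so it illustrates only the KS part of the pipeline.

# Windowed Kolmogorov-Smirnov distance between two spectators' electro-dermal
# samples (lower distance = higher synchronization); window/hop sizes are assumptions.
import numpy as np
from scipy.stats import ks_2samp

def ks_synchronization(eda_a, eda_b, win=128, hop=64):
    scores = []
    for start in range(0, min(len(eda_a), len(eda_b)) - win + 1, hop):
        result = ks_2samp(eda_a[start:start + win], eda_b[start:start + win])
        scores.append(result.statistic)
    return np.array(scores)

rng = np.random.default_rng(1)
a, b = rng.normal(size=2000), rng.normal(size=2000)
print(ks_synchronization(a, b)[:5])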
{"title":"Spectators' Synchronization Detection based on Manifold Representation of Physiological Signals: Application to Movie Highlights Detection","authors":"Michal Muszynski, Theodoros Kostoulas, G. Chanel, Patrizia Lombardo, T. Pun","doi":"10.1145/2818346.2820773","DOIUrl":"https://doi.org/10.1145/2818346.2820773","url":null,"abstract":"Detection of highlights in movies is a challenge for the affective understanding and implicit tagging of films. Under the hypothesis that synchronization of the reaction of spectators indicates such highlights, we define a synchronization measure between spectators that is capable of extracting movie highlights. The intuitive idea of our approach is to define (a) a parameterization of one spectator's physiological data on a manifold; (b) the synchronization measure between spectators as the Kolmogorov-Smirnov distance between local shape distributions of the underlying manifolds. We evaluate our approach using data collected in an experiment where the electro-dermal activity of spectators was recorded during the entire projection of a movie in a cinema. We compare our methodology with baseline synchronization measures, such as correlation, Spearman's rank correlation, mutual information, Kolmogorov-Smirnov distance. Results indicate that the proposed approach allows to accurately distinguish highlight from non-highlight scenes.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"400 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80275527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits
C. W. Leong, L. Chen, G. Feng, Chong Min Lee, Matthew David Mulholland
Body language plays an important role in learning processes and communication. For example, communication research produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are also utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect in multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement in hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component based on OpenNI is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for purposes of multimodal feature extraction and creating automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, and proceeds to summarize our work using multimodal sensors in developing assessments of communication skills, with attention on the use of depth sensors. Specifically, we focus on the task of public speaking assessment using Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
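As a hint of what expressive body language features computed from Kinect skeletal data can look like, the sketch below derives average hand speed and average hand-to-spine expansion from joint trajectories. The joint names, the two features, and the random trajectories are illustrative assumptions; this is not the authors' released package.

# Two simple expressiveness features from Kinect-style joint trajectories:
# average hand speed and average hand-to-spine expansion.
import numpy as np

def presentation_features(joints, fps=30.0):
    """joints: dict mapping joint names to (T, 3) arrays of 3-D positions."""
    left, right, spine = joints["hand_left"], joints["hand_right"], joints["spine_mid"]
    speed = lambda p: np.linalg.norm(np.diff(p, axis=0), axis=1).mean() * fps
    expansion = (np.linalg.norm(left - spine, axis=1).mean()
                 + np.linalg.norm(right - spine, axis=1).mean()) / 2.0
    return {"mean_hand_speed": (speed(left) + speed(right)) / 2.0,
            "mean_hand_expansion": expansion}

rng = np.random.default_rng(2)
joints = {name: rng.normal(size=(300, 3)) for name in ("hand_left", "hand_right", "spine_mid")}
print(presentation_features(joints))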
{"title":"Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits","authors":"C. W. Leong, L. Chen, G. Feng, Chong Min Lee, Matthew David Mulholland","doi":"10.1145/2818346.2830605","DOIUrl":"https://doi.org/10.1145/2818346.2830605","url":null,"abstract":"Body language plays an important role in learning processes and communication. For example, communication research produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are also utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect in multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement in hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component based on OpenNI is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for purposes of multimodal feature extraction and creating automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, and proceeds to summarize our work using multimodal sensors in developing assessments of communication skills, with attention on the use of depth sensors. Specifically, we focus on the task of public speaking assessment using Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83451695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Deception Detection using Real-life Trial Data
Verónica Pérez-Rosas, M. Abouelenien, Rada Mihalcea, Mihai Burzo
Hearings of witnesses and defendants play a crucial role when reaching court trial decisions. Given the high-stake nature of trial outcomes, implementing accurate and effective computational methods to evaluate the honesty of court testimonies can offer valuable support during the decision making process. In this paper, we address the identification of deception in real-life trial data. We introduce a novel dataset consisting of videos collected from public court trials. We explore the use of verbal and non-verbal modalities to build a multimodal deception detection system that aims to discriminate between truthful and deceptive statements provided by defendants and witnesses. We achieve classification accuracies in the range of 60-75% when using a model that extracts and fuses features from the linguistic and gesture modalities. In addition, we present a human deception detection study where we evaluate the human capability of detecting deception in trial hearings. The results show that our system outperforms the human capability of identifying deceit.
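A minimal early-fusion sketch in the spirit of the abstract: bag-of-words transcript features are concatenated with per-clip gesture annotation counts and fed to a linear classifier. The transcripts, gesture counts, labels, and choice of logistic regression are placeholders, not the trial corpus or the authors' model.

# Early fusion of bag-of-words transcript features and per-clip gesture counts,
# fed to a logistic-regression classifier; all data below are invented examples.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

transcripts = ["i did not see anything that night", "we left before it happened",
               "i was at home the whole evening", "he told me to say that"]
gesture_counts = np.array([[2, 0, 1], [0, 1, 0], [3, 1, 2], [1, 2, 0]])  # e.g. head shakes
labels = np.array([0, 1, 0, 1])        # 0 = truthful, 1 = deceptive (toy labels)

text_feats = CountVectorizer().fit_transform(transcripts)
X = hstack([text_feats, csr_matrix(gesture_counts)])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))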
{"title":"Deception Detection using Real-life Trial Data","authors":"Verónica Pérez-Rosas, M. Abouelenien, Rada Mihalcea, Mihai Burzo","doi":"10.1145/2818346.2820758","DOIUrl":"https://doi.org/10.1145/2818346.2820758","url":null,"abstract":"Hearings of witnesses and defendants play a crucial role when reaching court trial decisions. Given the high-stake nature of trial outcomes, implementing accurate and effective computational methods to evaluate the honesty of court testimonies can offer valuable support during the decision making process. In this paper, we address the identification of deception in real-life trial data. We introduce a novel dataset consisting of videos collected from public court trials. We explore the use of verbal and non-verbal modalities to build a multimodal deception detection system that aims to discriminate between truthful and deceptive statements provided by defendants and witnesses. We achieve classification accuracies in the range of 60-75% when using a model that extracts and fuses features from the linguistic and gesture modalities. In addition, we present a human deception detection study where we evaluate the human capability of detecting deception in trial hearings. The results show that our system outperforms the human capability of identifying deceit.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83785615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 167