
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction: Latest Publications

Interactive Web-based Image Sonification for the Blind
T. Wörtwein, Boris Schauerte, Karin Müller, R. Stiefelhagen
In this demonstration, we show a web-based sonification platform that allows blind users to interactively experience various kinds of information using two now-widespread technologies: modern web browsers that implement high-level JavaScript APIs and touch-sensitive displays. This way, blind users can easily access information such as maps or graphs. Our current prototype provides various sonifications that can be switched depending on the image type and user preference. The prototype runs in Chrome and Firefox on PCs, smart phones, and tablets.
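As a rough illustration of the kind of mapping such a platform might use, the sketch below converts the brightness of the image pixel under a touch position into a tone frequency. The toy image, frequency range, and linear mapping are assumptions made for illustration; they are not the authors' implementation.

```python
# Minimal sketch (not the authors' system): map the brightness of the image
# pixel under a touch position to a tone frequency, the basic idea behind
# touch-driven image sonification.

GRAYSCALE_IMAGE = [          # toy 3x3 image, values in 0..255 (assumed input)
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 96],
]

LOW_HZ, HIGH_HZ = 220.0, 880.0   # assumed audible range for the mapping


def brightness_to_frequency(brightness: int) -> float:
    """Linearly map a 0..255 brightness value onto [LOW_HZ, HIGH_HZ]."""
    return LOW_HZ + (brightness / 255.0) * (HIGH_HZ - LOW_HZ)


def sonify_touch(x: int, y: int) -> float:
    """Return the tone frequency for the pixel under a touch at (x, y)."""
    brightness = GRAYSCALE_IMAGE[y][x]
    return brightness_to_frequency(brightness)


if __name__ == "__main__":
    # Simulate a finger sliding across the top row of the image.
    for x in range(3):
        print(f"touch at ({x}, 0) -> {sonify_touch(x, 0):.1f} Hz")
```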
{"title":"Interactive Web-based Image Sonification for the Blind","authors":"T. Wörtwein, Boris Schauerte, Karin Müller, R. Stiefelhagen","doi":"10.1145/2818346.2823298","DOIUrl":"https://doi.org/10.1145/2818346.2823298","url":null,"abstract":"In this demonstration, we show a web-based sonification platform that allows blind users to interactively experience various information using two nowadays widespread technologies: modern web browsers that implement high-level JavaScript APIs and touch-sensitive displays. This way, blind users can easily access information such as, for example, maps or graphs. Our current prototype provides various sonifications that can be switched depending on the image type and user preference. The prototype runs in Chrome and Firefox on PCs, smart phones, and tablets.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82066242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Presentation Trainer, your Public Speaking Multimodal Coach
J. Schneider, D. Börner, P. V. Rosmalen, M. Specht
The Presentation Trainer is a multimodal tool designed to support the practice of public speaking skills by giving the user real-time feedback about different aspects of her nonverbal communication. It tracks the user's voice and body to interpret her current performance. Based on this performance, the Presentation Trainer selects the type of intervention that will be presented as feedback to the user. This feedback mechanism has been designed taking into consideration the results from previous studies that show how difficult it is for learners to perceive and correctly interpret real-time feedback while practicing their speeches. In this paper we present the user experience evaluation of participants who used the Presentation Trainer to practice for an elevator pitch, showing that the feedback provided by the Presentation Trainer has a significant influence on learning.
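To make the "select one intervention" idea concrete, here is a small sketch that picks at most one corrective instruction from tracked voice and body features, so the speaker is not overloaded with simultaneous cues. The feature names, thresholds, and rule priorities are invented for illustration and are not the tool's actual logic.

```python
# Illustrative sketch only: choose a single feedback intervention from tracked
# voice/body features; thresholds and priorities are assumptions.

from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerState:
    words_per_minute: float
    volume_db: float
    arms_crossed: bool


# Ordered by priority: the first matching rule wins.
RULES = [
    (lambda s: s.arms_crossed, "Open up your posture"),
    (lambda s: s.volume_db < 50, "Speak louder"),
    (lambda s: s.words_per_minute > 170, "Slow down"),
]


def select_intervention(state: SpeakerState) -> Optional[str]:
    """Return at most one corrective instruction for the current performance."""
    for condition, instruction in RULES:
        if condition(state):
            return instruction
    return None  # nothing to correct right now


if __name__ == "__main__":
    print(select_intervention(SpeakerState(180.0, 62.0, arms_crossed=False)))
```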
{"title":"Presentation Trainer, your Public Speaking Multimodal Coach","authors":"J. Schneider, D. Börner, P. V. Rosmalen, M. Specht","doi":"10.1145/2818346.2830603","DOIUrl":"https://doi.org/10.1145/2818346.2830603","url":null,"abstract":"The Presentation Trainer is a multimodal tool designed to support the practice of public speaking skills, by giving the user real-time feedback about different aspects of her nonverbal communication. It tracks the user's voice and body to interpret her current performance. Based on this performance the Presentation Trainer selects the type of intervention that will be presented as feedback to the user. This feedback mechanism has been designed taking in consideration the results from previous studies that show how difficult it is for learners to perceive and correctly interpret real-time feedback while practicing their speeches. In this paper we present the user experience evaluation of participants who used the Presentation Trainer to practice for an elevator pitch, showing that the feedback provided by the Presentation Trainer has a significant influence on learning.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82342851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
Gaze+Gesture: Expressive, Precise and Targeted Free-Space Interactions
Ishan Chatterjee, R. Xiao, Chris Harrison
Humans rely on eye gaze and hand manipulations extensively in their everyday activities. Most often, users gaze at an object to perceive it and then use their hands to manipulate it. We propose applying a multimodal, gaze plus free-space gesture approach to enable rapid, precise and expressive touch-free interactions. We show the input methods are highly complementary, mitigating issues of imprecision and limited expressivity in gaze-alone systems, and issues of targeting speed in gesture-alone systems. We extend an existing interaction taxonomy that naturally divides the gaze+gesture interaction space, which we then populate with a series of example interaction techniques to illustrate the character and utility of each method. We contextualize these interaction techniques in three example scenarios. In our user study, we pit our approach against five contemporary approaches; results show that gaze+gesture can outperform systems using gaze or gesture alone, and in general, approach the performance of "gold standard" input systems, such as the mouse and trackpad.
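The division of labor described above (gaze resolves the target, the free-space gesture supplies the action) can be sketched in a few lines. The object layout, gesture vocabulary, and nearest-object targeting rule below are assumptions for illustration, not the paper's techniques.

```python
# A minimal sketch of the gaze-for-targeting, gesture-for-manipulation split:
# gaze picks the nearest on-screen object, a free-space gesture then acts on it.

import math

OBJECTS = {"photo": (100, 120), "slider": (400, 300), "button": (520, 80)}


def gaze_target(gaze_xy, objects=OBJECTS):
    """Return the object whose centre is closest to the current gaze point."""
    return min(objects, key=lambda name: math.dist(gaze_xy, objects[name]))


def handle_gesture(gesture: str, gaze_xy) -> str:
    """Combine the two modalities: gaze resolves *what*, gesture resolves *how*."""
    target = gaze_target(gaze_xy)
    actions = {"pinch": "grab", "rotate": "rotate", "flick": "dismiss"}
    return f"{actions.get(gesture, 'ignore')} {target}"


if __name__ == "__main__":
    print(handle_gesture("pinch", (410, 290)))   # -> "grab slider"
```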
{"title":"Gaze+Gesture: Expressive, Precise and Targeted Free-Space Interactions","authors":"Ishan Chatterjee, R. Xiao, Chris Harrison","doi":"10.1145/2818346.2820752","DOIUrl":"https://doi.org/10.1145/2818346.2820752","url":null,"abstract":"Humans rely on eye gaze and hand manipulations extensively in their everyday activities. Most often, users gaze at an object to perceive it and then use their hands to manipulate it. We propose applying a multimodal, gaze plus free-space gesture approach to enable rapid, precise and expressive touch-free interactions. We show the input methods are highly complementary, mitigating issues of imprecision and limited expressivity in gaze-alone systems, and issues of targeting speed in gesture-alone systems. We extend an existing interaction taxonomy that naturally divides the gaze+gesture interaction space, which we then populate with a series of example interaction techniques to illustrate the character and utility of each method. We contextualize these interaction techniques in three example scenarios. In our user study, we pit our approach against five contemporary approaches; results show that gaze+gesture can outperform systems using gaze or gesture alone, and in general, approach the performance of \"gold standard\" input systems, such as the mouse and trackpad.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82494451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 113
A Visual Analytics Approach to Finding Factors Improving Automatic Speaker Identifications
P. Bruneau, M. Stefas, H. Bredin, Johann Poignant, T. Tamisier, C. Barras
Classification quality criteria such as precision, recall, and F-measure are generally the basis for evaluating contributions in automatic speaker recognition. Specifically, comparisons are carried out mostly via mean values estimated on a set of media. Whilst this approach is relevant for assessing improvement w.r.t. the state of the art, or for ranking participants in the context of an automatic annotation challenge, it gives little insight to system designers in terms of cues for improving algorithms, hypothesis formulation, and evidence display. This paper presents a design study of a visual and interactive approach to analyze errors made by automatic annotation algorithms. A timeline-based tool emerged from prior steps of this study. A critical review, driven by user interviews, exposes caveats and refines user objectives. The next step of the study is then initiated by sketching designs combining elements of the current prototype with principles newly identified as relevant.
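The evaluation criteria named above have standard definitions; the short sketch below computes precision, recall, and F-measure per media file and then their mean, which illustrates how a single averaged score can hide large per-file differences (the motivation for a per-error visual analysis). The per-file counts are made up for the example.

```python
# Precision / recall / F-measure per media file, then the mean across files.

def prf(tp: int, fp: int, fn: int):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical speaker-identification results on three media files.
PER_FILE_COUNTS = {"news_01": (45, 5, 10), "debate_02": (12, 20, 8), "talk_03": (30, 2, 1)}

if __name__ == "__main__":
    f1_scores = []
    for name, (tp, fp, fn) in PER_FILE_COUNTS.items():
        p, r, f1 = prf(tp, fp, fn)
        f1_scores.append(f1)
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
    print(f"mean F1 = {sum(f1_scores) / len(f1_scores):.2f}")
```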
{"title":"A Visual Analytics Approach to Finding Factors Improving Automatic Speaker Identifications","authors":"P. Bruneau, M. Stefas, H. Bredin, Johann Poignant, T. Tamisier, C. Barras","doi":"10.1145/2818346.2820769","DOIUrl":"https://doi.org/10.1145/2818346.2820769","url":null,"abstract":"Classification quality criteria such as precision, recall, and F-measure are generally the basis for evaluating contributions in automatic speaker recognition. Specifically, comparisons are carried out mostly via mean values estimated on a set of media. Whilst this approach is relevant to assess improvement w.r.t. the state-of-the-art, or ranking participants in the context of an automatic annotation challenge, it gives little insight to system designers in terms of cues for improving algorithms, hypothesis formulation, and evidence display. This paper presents a design study of a visual and interactive approach to analyze errors made by automatic annotation algorithms. A timeline-based tool emerged from prior steps of this study. A critical review, driven by user interviews, exposes caveats and refines user objectives. The next step of the study is then initiated by sketching designs combining elements of the current prototype to principles newly identified as relevant.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82363880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Digital Flavor: Towards Digitally Simulating Virtual Flavors
Nimesha Ranasinghe, Gajan Suthokumar, Kuan-Yi Lee, E. Do
Flavor is often a pleasurable sensory perception we experience daily while eating and drinking. However, the sensation of flavor is rarely considered in the age of digital communication, mainly due to the unavailability of flavors as a digitally controllable medium. This paper introduces a digital instrument (Digital Flavor Synthesizing device), which actuates taste (electrical and thermal stimulation) and smell sensations (controlled scent emitting) together to simulate different flavors digitally. A preliminary user experiment is conducted to study the effectiveness of this method with five predefined flavor stimuli. Experimental results show that the users were effectively able to identify different flavors such as minty, spicy, and lemony. Moreover, we outline several challenges ahead along with future possibilities of this technology. In summary, our work demonstrates a novel controllable instrument for flavor simulation, which will be valuable in multimodal interactive systems for rendering virtual flavors digitally.
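Purely as an illustration of how such an instrument could be driven, the sketch below defines a flavor "preset" table pairing taste actuation (electrical and thermal stimulation) with a scent channel. Every parameter name and value here is invented and does not reflect the actual device's settings.

```python
# Hypothetical flavor presets: taste actuation plus a scent channel.

from dataclasses import dataclass


@dataclass(frozen=True)
class FlavorPreset:
    current_ma: float      # electrical stimulation level (invented value)
    tongue_temp_c: float   # thermal stimulation target (invented value)
    scent_cartridge: str   # which scent to emit
    scent_intensity: int   # 0..10 emission level


PRESETS = {
    "minty":  FlavorPreset(0.04, 22.0, "menthol", 7),
    "spicy":  FlavorPreset(0.10, 39.0, "capsaicin", 5),
    "lemony": FlavorPreset(0.08, 25.0, "citrus", 8),
}


def render_flavor(name: str) -> FlavorPreset:
    """Look up the actuator settings that simulate the requested flavor."""
    return PRESETS[name]


if __name__ == "__main__":
    print(render_flavor("lemony"))
```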
{"title":"Digital Flavor: Towards Digitally Simulating Virtual Flavors","authors":"Nimesha Ranasinghe, Gajan Suthokumar, Kuan-Yi Lee, E. Do","doi":"10.1145/2818346.2820761","DOIUrl":"https://doi.org/10.1145/2818346.2820761","url":null,"abstract":"Flavor is often a pleasurable sensory perception we experience daily while eating and drinking. However, the sensation of flavor is rarely considered in the age of digital communication mainly due to the unavailability of flavors as a digitally controllable media. This paper introduces a digital instrument (Digital Flavor Synthesizing device), which actuates taste (electrical and thermal stimulation) and smell sensations (controlled scent emitting) together to simulate different flavors digitally. A preliminary user experiment is conducted to study the effectiveness of this method with predefined five different flavor stimuli. Experimental results show that the users were effectively able to identify different flavors such as minty, spicy, and lemony. Moreover, we outline several challenges ahead along with future possibilities of this technology. In summary, our work demonstrates a novel controllable instrument for flavor simulation, which will be valuable in multimodal interactive systems for rendering virtual flavors digitally.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"70 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89518836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Exploring Behavior Representation for Learning Analytics
M. Worsley, Stefan Scherer, Louis-Philippe Morency, Paulo Blikstein
Multimodal analysis has long been an integral part of studying learning. Historically, multimodal analyses of learning have been extremely laborious and time-intensive. However, researchers have recently been exploring ways to use multimodal computational analysis in the service of studying how people learn in complex learning environments. In an effort to advance this research agenda, we present a comparative analysis of four different data segmentation techniques. In particular, we propose affect- and pose-based data segmentation as alternatives to human-based and fixed-window segmentation. In a study of ten dyads working on an open-ended engineering design task, we find that affect- and pose-based segmentation are more effective than traditional approaches for drawing correlations between learning-relevant constructs and multimodal behaviors. We also find that pose-based segmentation outperforms the two more traditional segmentation strategies for predicting student success on the hands-on task. In this paper we discuss the algorithms used, our results, and the implications that this work may have in non-education-related contexts.
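To make the contrast between segmentation families concrete, here is a toy comparison of fixed-window segmentation against segmentation at points where a tracked signal changes abruptly (a stand-in for pose-based segmentation). The 1-D signal and jump threshold are invented; the paper's actual affect and pose features are much richer.

```python
# Toy contrast: fixed-window segmentation vs. change-point ("pose-based") cuts.

POSE_SIGNAL = [0.1, 0.1, 0.2, 0.9, 0.9, 1.0, 0.2, 0.1, 0.1, 0.8, 0.9]


def fixed_window_segments(n: int, window: int):
    """Cut the first n frames into consecutive windows of equal length."""
    return [(start, min(start + window, n)) for start in range(0, n, window)]


def change_point_segments(signal, threshold: float = 0.4):
    """Start a new segment whenever the signal jumps by more than `threshold`."""
    boundaries = [0]
    for i in range(1, len(signal)):
        if abs(signal[i] - signal[i - 1]) > threshold:
            boundaries.append(i)
    boundaries.append(len(signal))
    return list(zip(boundaries[:-1], boundaries[1:]))


if __name__ == "__main__":
    print("fixed windows:", fixed_window_segments(len(POSE_SIGNAL), window=4))
    print("pose-based   :", change_point_segments(POSE_SIGNAL))
```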
{"title":"Exploring Behavior Representation for Learning Analytics","authors":"M. Worsley, Stefan Scherer, Louis-Philippe Morency, Paulo Blikstein","doi":"10.1145/2818346.2820737","DOIUrl":"https://doi.org/10.1145/2818346.2820737","url":null,"abstract":"Multimodal analysis has long been an integral part of studying learning. Historically multimodal analyses of learning have been extremely laborious and time intensive. However, researchers have recently been exploring ways to use multimodal computational analysis in the service of studying how people learn in complex learning environments. In an effort to advance this research agenda, we present a comparative analysis of four different data segmentation techniques. In particular, we propose affect- and pose-based data segmentation, as alternatives to human-based segmentation, and fixed-window segmentation. In a study of ten dyads working on an open-ended engineering design task, we find that affect- and pose-based segmentation are more effective, than traditional approaches, for drawing correlations between learning-relevant constructs, and multimodal behaviors. We also find that pose-based segmentation outperforms the two more traditional segmentation strategies for predicting student success on the hands-on task. In this paper we discuss the algorithms used, our results, and the implications that this work may have in non-education-related contexts.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88539425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Adjacent Vehicle Collision Warning System using Image Sensor and Inertial Measurement Unit
Asif Iqbal, C. Busso, N. Gans
Advanced driver assistance systems are the newest addition to vehicular technology. Such systems use a wide array of sensors to provide a superior driving experience. Vehicle safety and driver alerts are important parts of these systems. This paper proposes a driver alert system to prevent and mitigate adjacent vehicle collisions by providing warning information about on-road vehicles and possible collisions. A dynamic Bayesian network (DBN) is utilized to fuse multiple sensors to provide driver awareness. It detects oncoming adjacent vehicles and gathers ego vehicle motion characteristics using an on-board camera and inertial measurement unit (IMU). A histogram-of-oriented-gradients (HOG) feature-based classifier is used to detect any adjacent vehicles. Vehicle front, rear, and side faces were considered in training the classifier. Ego vehicle heading, speed, and acceleration are captured from the IMU and fed into the DBN. The network parameters were learned from data via the expectation maximization (EM) algorithm. The DBN is designed to provide two types of warning to the driver: a cautionary warning and a brake alert for possible collision with other vehicles. Experiments were completed on multiple public databases, demonstrating successful warnings and brake alerts in most situations.
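The sketch below is a greatly simplified stand-in for the fusion step described above: it combines a camera-based detection of an adjacent vehicle with ego-vehicle motion from the IMU and emits either a cautionary warning or a brake alert. The real system uses a dynamic Bayesian network; the time-to-collision rule and all thresholds here are invented for illustration.

```python
# Simplified two-level warning from a detection score plus ego motion
# (not the paper's DBN; thresholds are assumptions).

from dataclasses import dataclass


@dataclass
class AdjacentVehicle:
    detection_confidence: float  # output of the HOG-based classifier, 0..1
    gap_m: float                 # estimated distance to the detected vehicle


@dataclass
class EgoState:
    speed_mps: float             # from the IMU / odometry
    closing_speed_mps: float     # positive when the gap is shrinking


def warning_level(vehicle: AdjacentVehicle, ego: EgoState) -> str:
    """Return 'none', 'caution', or 'brake' for one detected adjacent vehicle."""
    if vehicle.detection_confidence < 0.5:
        return "none"                       # likely a false detection
    time_to_collision = (vehicle.gap_m / ego.closing_speed_mps
                         if ego.closing_speed_mps > 0 else float("inf"))
    if time_to_collision < 1.5:
        return "brake"                      # imminent: brake alert
    if time_to_collision < 3.0 or vehicle.gap_m < 5.0:
        return "caution"                    # cautionary warning
    return "none"


if __name__ == "__main__":
    v = AdjacentVehicle(detection_confidence=0.9, gap_m=8.0)
    print(warning_level(v, EgoState(speed_mps=20.0, closing_speed_mps=3.0)))  # -> caution
```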
{"title":"Adjacent Vehicle Collision Warning System using Image Sensor and Inertial Measurement Unit","authors":"Asif Iqbal, C. Busso, N. Gans","doi":"10.1145/2818346.2820741","DOIUrl":"https://doi.org/10.1145/2818346.2820741","url":null,"abstract":"Advanced driver assistance systems are the newest addition to vehicular technology. Such systems use a wide array of sensors to provide a superior driving experience. Vehicle safety and driver alert are important parts of these system. This paper proposes a driver alert system to prevent and mitigate adjacent vehicle collisions by proving warning information of on-road vehicles and possible collisions. A dynamic Bayesian network (DBN) is utilized to fuse multiple sensors to provide driver awareness. It detects oncoming adjacent vehicles and gathers ego vehicle motion characteristics using an on-board camera and inertial measurement unit (IMU). A histogram of oriented gradient feature based classifier is used to detect any adjacent vehicles. Vehicles front-rear end and side faces were considered in training the classifier. Ego vehicles heading, speed and acceleration are captured from the IMU and feed into the DBN. The network parameters were learned from data via expectation maximization(EM) algorithm. The DBN is designed to provide two type of warning to the driver, a cautionary warning and a brake alert for possible collision with other vehicles. Experiments were completed on multiple public databases, demonstrating successful warnings and brake alerts in most situations.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91189600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Multimodal System for Real-Time Action Instruction in Motor Skill Learning
I. D. Kok, J. Hough, Felix Hülsmann, M. Botsch, David Schlangen, S. Kopp
We present a multimodal coaching system that supports online motor skill learning. In this domain, closed-loop interaction between the movements of the user and the action instructions by the system is an essential requirement. To achieve this, the actions of the user need to be measured and evaluated and the system must be able to give corrective instructions on the ongoing performance. Timely delivery of these instructions, particularly during execution of the motor skill by the user, is thus of the highest importance. Based on the results of an empirical study on motor skill coaching, we analyze the requirements for an interactive coaching system and present an architecture that combines motion analysis, dialogue management, and virtual human animation in a motion tracking and 3D virtual reality hardware setup. In a preliminary study we demonstrate that the current system is capable of delivering the closed-loop interaction that is required in the motor skill learning domain.
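A bare-bones sketch of the closed loop described above: the user's measured movement is compared to a target pose and a corrective instruction is issued while the movement is still in progress. The joint names, target values, and tolerance are assumptions for illustration, not the system's actual motion-analysis pipeline.

```python
# One iteration of a closed coaching loop: measure, compare, instruct.

TARGET_POSE = {"knee_angle_deg": 90.0, "back_angle_deg": 170.0}  # e.g. a squat
TOLERANCE_DEG = 15.0


def corrective_instructions(measured_pose: dict) -> list:
    """Compare the tracked pose to the target and produce instructions."""
    instructions = []
    for joint, target in TARGET_POSE.items():
        error = measured_pose.get(joint, target) - target
        if abs(error) > TOLERANCE_DEG:
            direction = "more" if error < 0 else "less"
            instructions.append(f"Bend {joint.replace('_angle_deg', '')} {direction}")
    return instructions


if __name__ == "__main__":
    # A real system would run this per tracked frame, during execution.
    frame = {"knee_angle_deg": 120.0, "back_angle_deg": 168.0}
    print(corrective_instructions(frame) or ["Keep going"])
```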
{"title":"A Multimodal System for Real-Time Action Instruction in Motor Skill Learning","authors":"I. D. Kok, J. Hough, Felix Hülsmann, M. Botsch, David Schlangen, S. Kopp","doi":"10.1145/2818346.2820746","DOIUrl":"https://doi.org/10.1145/2818346.2820746","url":null,"abstract":"We present a multimodal coaching system that supports online motor skill learning. In this domain, closed-loop interaction between the movements of the user and the action instructions by the system is an essential requirement. To achieve this, the actions of the user need to be measured and evaluated and the system must be able to give corrective instructions on the ongoing performance. Timely delivery of these instructions, particularly during execution of the motor skill by the user, is thus of the highest importance. Based on the results of an empirical study on motor skill coaching, we analyze the requirements for an interactive coaching system and present an architecture that combines motion analysis, dialogue management, and virtual human animation in a motion tracking and 3D virtual reality hardware setup. In a preliminary study we demonstrate that the current system is capable of delivering the closed-loop interaction that is required in the motor skill learning domain.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90107278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits
Moitreya Chatterjee, Sunghyun Park, Louis-Philippe Morency, Stefan Scherer
Human communication involves conveying messages through both verbal and non-verbal channels (facial expression, gestures, prosody, etc.). Nonetheless, the task of learning these patterns for a computer by combining cues from multiple modalities is challenging because it requires effective representation of the signals and also taking into consideration the complex interactions between them. From the machine learning perspective this presents a two-fold challenge: a) Modeling the intermodal variations and dependencies; b) Representing the data using an apt number of features, such that the necessary patterns are captured but at the same time allaying concerns such as over-fitting. In this work we attempt to address these aspects of multimodal recognition, in the context of recognizing two essential speaker traits, namely passion and credibility of online movie reviewers. We propose a novel ensemble classification approach that combines two different perspectives on classifying multimodal data. Each of these perspectives attempts to independently address the two-fold challenge. In the first, we combine the features from multiple modalities but assume inter-modality conditional independence. In the second, we explicitly capture the correlation between the modalities, but in a space of a few dimensions, and explore a novel clustering-based kernel similarity approach for recognition. Additionally, this work investigates a recent technique for encoding text data that captures semantic similarity of verbal content and preserves word-ordering. The experimental results on a recent public dataset show significant improvement of our approach over multiple baselines. Finally, we also analyze the most discriminative elements of a speaker's non-verbal behavior that contribute to his/her perceived credibility/passionateness.
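As a sketch of the first of the two perspectives above, the code below treats each modality's classifier output as conditionally independent given the trait and fuses the scores by multiplying per-modality probabilities (a naive-Bayes-style late fusion). The modality scores are invented; the paper's actual features and its second, correlation-based perspective are not reproduced here.

```python
# Naive-Bayes-style late fusion under an inter-modality independence assumption.

import math


def fuse_independent(per_modality_probs: dict) -> float:
    """Fuse per-modality P(trait) scores assuming conditional independence.

    Works in log space for numerical stability and renormalizes against the
    complementary class.
    """
    log_pos = sum(math.log(p) for p in per_modality_probs.values())
    log_neg = sum(math.log(1.0 - p) for p in per_modality_probs.values())
    # Convert the two unnormalized log scores back to a probability.
    return 1.0 / (1.0 + math.exp(log_neg - log_pos))


if __name__ == "__main__":
    # Hypothetical "is this reviewer passionate?" scores from three modalities.
    scores = {"verbal": 0.65, "acoustic": 0.80, "visual": 0.55}
    print(f"fused probability: {fuse_independent(scores):.2f}")
```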
{"title":"Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits","authors":"Moitreya Chatterjee, Sunghyun Park, Louis-Philippe Morency, Stefan Scherer","doi":"10.1145/2818346.2820747","DOIUrl":"https://doi.org/10.1145/2818346.2820747","url":null,"abstract":"Human communication involves conveying messages both through verbal and non-verbal channels (facial expression, gestures, prosody, etc.). Nonetheless, the task of learning these patterns for a computer by combining cues from multiple modalities is challenging because it requires effective representation of the signals and also taking into consideration the complex interactions between them. From the machine learning perspective this presents a two-fold challenge: a) Modeling the intermodal variations and dependencies; b) Representing the data using an apt number of features, such that the necessary patterns are captured but at the same time allaying concerns such as over-fitting. In this work we attempt to address these aspects of multimodal recognition, in the context of recognizing two essential speaker traits, namely passion and credibility of online movie reviewers. We propose a novel ensemble classification approach that combines two different perspectives on classifying multimodal data. Each of these perspectives attempts to independently address the two-fold challenge. In the first, we combine the features from multiple modalities but assume inter-modality conditional independence. In the other one, we explicitly capture the correlation between the modalities but in a space of few dimensions and explore a novel clustering based kernel similarity approach for recognition. Additionally, this work investigates a recent technique for encoding text data that captures semantic similarity of verbal content and preserves word-ordering. The experimental results on a recent public dataset shows significant improvement of our approach over multiple baselines. Finally, we also analyze the most discriminative elements of a speaker's non-verbal behavior that contribute to his/her perceived credibility/passionateness.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74390640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Model of Personality-Based, Nonverbal Behavior in Affective Virtual Humanoid Character
M. Saberi, Ulysses Bernardet, S. DiPaola
In this demonstration, a human user interacts with a virtual humanoid character in real time. Our goal is to create a character that is perceived as imbued with a distinct personality while responding dynamically to inputs from the environment [4] [1]. A hybrid model that comprises continuous and discrete components, firstly, drives the logical behavior of the virtual character as it moves through states of the interaction and, secondly, continuously updates the emotional expressions of the virtual character depending on feedback from interactions with the environment. A Rock-Paper-Scissors game is used as the framework for the interaction scenario and provides an easy-to-learn and engaging demo environment with minimal conversation.
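Here is a tiny sketch of a hybrid model in the sense described above: a discrete state machine drives the Rock-Paper-Scissors rounds, while a continuous valence value is updated each round and selects the character's expression. Personality is reduced to a single reactivity parameter; the actual model is richer and all the numbers below are invented.

```python
# Hybrid discrete/continuous toy: game states plus a continuous valence signal.

import random

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


class AffectiveCharacter:
    def __init__(self, reactivity: float = 0.3):
        self.state = "waiting"    # discrete component: waiting -> playing -> reacting
        self.valence = 0.0        # continuous component in [-1, 1]
        self.reactivity = reactivity  # stand-in for a personality parameter

    def play_round(self, user_move: str) -> str:
        self.state = "playing"
        own_move = random.choice(MOVES)
        if BEATS[own_move] == user_move:
            outcome, delta = "character wins", +1.0
        elif BEATS[user_move] == own_move:
            outcome, delta = "user wins", -1.0
        else:
            outcome, delta = "draw", 0.0
        # Continuous update: move valence toward the round outcome.
        self.valence += self.reactivity * (delta - self.valence)
        self.state = "reacting"
        expression = ("smile" if self.valence > 0.2
                      else "frown" if self.valence < -0.2 else "neutral")
        return f"{own_move} ({outcome}) -> valence {self.valence:+.2f}, {expression}"


if __name__ == "__main__":
    character = AffectiveCharacter()
    for move in ("rock", "rock", "paper"):
        print(character.play_round(move))
```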
{"title":"Model of Personality-Based, Nonverbal Behavior in Affective Virtual Humanoid Character","authors":"M. Saberi, Ulysses Bernardet, S. DiPaola","doi":"10.1145/2818346.2823296","DOIUrl":"https://doi.org/10.1145/2818346.2823296","url":null,"abstract":"In this demonstration a human user interacts with a virtual humanoid character in real-time. Our goal is to create a character that is perceived as imbued with a distinct personality while responding dynamically to inputs from the environment [4] [1]. A hybrid model that comprises continuous and discrete components, firstly, drives the logical behavior of the virtual character moving through states of the interaction, and secondly, continuously updates of the emotional expressions of the virtual character depending on feedback from interactions with the environment. A Rock-Paper-Scissors game scenario is used as framework for the interaction scenario and provides an easy-to-learn and engaging demo environment with minimum conversation.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"103 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74827844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6