
Latest Publications: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

A Visual Analytics Approach to Finding Factors Improving Automatic Speaker Identifications
P. Bruneau, M. Stefas, H. Bredin, Johann Poignant, T. Tamisier, C. Barras
Classification quality criteria such as precision, recall, and F-measure are generally the basis for evaluating contributions in automatic speaker recognition. Specifically, comparisons are carried out mostly via mean values estimated on a set of media. Whilst this approach is relevant to assess improvement w.r.t. the state-of-the-art, or ranking participants in the context of an automatic annotation challenge, it gives little insight to system designers in terms of cues for improving algorithms, hypothesis formulation, and evidence display. This paper presents a design study of a visual and interactive approach to analyze errors made by automatic annotation algorithms. A timeline-based tool emerged from prior steps of this study. A critical review, driven by user interviews, exposes caveats and refines user objectives. The next step of the study is then initiated by sketching designs combining elements of the current prototype to principles newly identified as relevant.
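The evaluation the abstract refers to is the standard corpus-level one: precision, recall, and F-measure computed per media file and then averaged over the set. The sketch below is a minimal illustration of that computation, not the authors' tooling; the frame-level speaker labels and file names are invented for the example.

```python
from statistics import mean

def precision_recall_f1(reference, hypothesis):
    """Frame-level precision, recall, and F-measure for one media file.

    `reference` and `hypothesis` are sequences of speaker labels,
    with None meaning no speaker for that frame.
    """
    # tp: frames where the hypothesis names the correct speaker
    tp = sum(1 for r, h in zip(reference, hypothesis) if h is not None and h == r)
    # fp: frames where the hypothesis names a speaker that is wrong or spurious
    fp = sum(1 for r, h in zip(reference, hypothesis) if h is not None and h != r)
    # fn: reference speech frames left unlabelled by the hypothesis
    fn = sum(1 for r, h in zip(reference, hypothesis) if h is None and r is not None)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Corpus-level comparison as described in the abstract: mean values over a set of media.
corpus = {
    "news_01": (["A", "A", "B", None], ["A", "B", "B", None]),
    "news_02": (["C", "C", "C", "A"], ["C", "C", None, "A"]),
}
scores = [precision_recall_f1(ref, hyp) for ref, hyp in corpus.values()]
print("mean P/R/F over the corpus:", [round(mean(col), 3) for col in zip(*scores)])
```

The paper's point is precisely that these averages hide where and why a system fails, which is what the visual, timeline-based analysis is meant to expose.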
Citations: 1
Gaze+Gesture: Expressive, Precise and Targeted Free-Space Interactions
Ishan Chatterjee, R. Xiao, Chris Harrison
Humans rely on eye gaze and hand manipulations extensively in their everyday activities. Most often, users gaze at an object to perceive it and then use their hands to manipulate it. We propose applying a multimodal, gaze plus free-space gesture approach to enable rapid, precise and expressive touch-free interactions. We show the input methods are highly complementary, mitigating issues of imprecision and limited expressivity in gaze-alone systems, and issues of targeting speed in gesture-alone systems. We extend an existing interaction taxonomy that naturally divides the gaze+gesture interaction space, which we then populate with a series of example interaction techniques to illustrate the character and utility of each method. We contextualize these interaction techniques in three example scenarios. In our user study, we pit our approach against five contemporary approaches; results show that gaze+gesture can outperform systems using gaze or gesture alone, and in general, approach the performance of "gold standard" input systems, such as the mouse and trackpad.
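A rough sketch of the division of labour the abstract describes: gaze provides fast, coarse target acquisition, while free-space gestures provide expressive manipulation of the acquired target. The widget set, gesture vocabulary, and tracker inputs below are invented for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Widget:
    name: str
    x: float
    y: float
    scale: float = 1.0

def nearest_widget(widgets, gaze_x, gaze_y):
    """Gaze handles targeting: pick the widget closest to the fixation point."""
    return min(widgets, key=lambda w: (w.x - gaze_x) ** 2 + (w.y - gaze_y) ** 2)

def handle_frame(widgets, gaze, gesture):
    """One step of the gaze-for-selection / gesture-for-manipulation loop.

    `gaze`    : (x, y) fixation point from a hypothetical eye tracker.
    `gesture` : dict from a hypothetical hand tracker, e.g.
                {"type": "pinch_drag", "dx": 0.1, "dy": 0.0} or
                {"type": "spread", "factor": 1.2}.
    """
    target = nearest_widget(widgets, *gaze)    # gaze acquires the target
    if gesture["type"] == "pinch_drag":        # free-space gesture manipulates it
        target.x += gesture["dx"]
        target.y += gesture["dy"]
    elif gesture["type"] == "spread":
        target.scale *= gesture["factor"]
    return target

widgets = [Widget("photo", 0.2, 0.3), Widget("map", 0.7, 0.6)]
print(handle_frame(widgets, gaze=(0.68, 0.62), gesture={"type": "spread", "factor": 1.2}))
```

The complementarity claimed in the paper comes from this split: gaze alone is imprecise for manipulation, and gesture alone is slow for targeting.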
Citations: 113
A Multimodal System for Real-Time Action Instruction in Motor Skill Learning
I. D. Kok, J. Hough, Felix Hülsmann, M. Botsch, David Schlangen, S. Kopp
We present a multimodal coaching system that supports online motor skill learning. In this domain, closed-loop interaction between the movements of the user and the action instructions by the system is an essential requirement. To achieve this, the actions of the user need to be measured and evaluated and the system must be able to give corrective instructions on the ongoing performance. Timely delivery of these instructions, particularly during execution of the motor skill by the user, is thus of the highest importance. Based on the results of an empirical study on motor skill coaching, we analyze the requirements for an interactive coaching system and present an architecture that combines motion analysis, dialogue management, and virtual human animation in a motion tracking and 3D virtual reality hardware setup. In a preliminary study we demonstrate that the current system is capable of delivering the closed-loop interaction that is required in the motor skill learning domain.
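The closed loop the abstract requires (measure the movement, evaluate it against the target performance, issue a timely corrective instruction) can be pictured with a small sketch. The joint-angle template, tolerance, and instruction wording below are assumptions for illustration, not the system's actual motion analysis or dialogue management.

```python
import time

# Hypothetical joint-angle template for one repetition of an exercise (degrees).
TEMPLATE = {"knee": 90.0, "hip": 95.0, "back": 170.0}
TOLERANCE = 10.0  # acceptable deviation before a correction is issued

def evaluate(pose):
    """Compare the tracked pose against the template; return only large deviations."""
    return {joint: pose[joint] - TEMPLATE[joint]
            for joint in TEMPLATE
            if abs(pose[joint] - TEMPLATE[joint]) > TOLERANCE}

def instruct(corrections):
    """Turn deviations into spoken-style instructions for the virtual coach."""
    for joint, delta in corrections.items():
        direction = "less" if delta > 0 else "more"
        yield f"Bend your {joint} {direction} (off by {abs(delta):.0f} degrees)"

def coaching_loop(pose_stream):
    """Closed loop: sense -> evaluate -> instruct, once per tracked frame."""
    for pose in pose_stream:
        for sentence in instruct(evaluate(pose)):
            print(time.strftime("%H:%M:%S"), sentence)

coaching_loop([{"knee": 110.0, "hip": 96.0, "back": 168.0},
               {"knee": 92.0, "hip": 94.0, "back": 150.0}])
```

The hard part the paper addresses is timing: the instruction must arrive while the movement is still being executed, which is why the architecture couples motion analysis directly to dialogue management and animation.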
Citations: 28
A Computational Model of Culture-Specific Emotion Detection for Artificial Agents in the Learning Domain
Ganapreeta Renunathan Naidu
Nowadays, intelligent agents are expected to be affect-sensitive as agents are becoming essential entities that support computer-mediated tasks, especially in teaching and training. These agents use common natural modalities, such as facial expressions, gestures and eye gaze, in order to recognize a user's affective state and respond accordingly. However, these nonverbal cues may not be universal, as emotion recognition and expression differ from culture to culture. It is important that intelligent interfaces are equipped with the ability to meet the challenge of cultural diversity to facilitate human-machine interaction, particularly in Asia. Asians are known to be more passive and to possess certain traits such as indirectness and non-confrontationalism, which lead to emotions such as (culture-specific forms of) shyness and timidity. Therefore, a model based on another culture may not be applicable in an Asian setting, ruling out a one-size-fits-all approach. This study is initiated to identify the discriminative markers of culture-specific emotions based on multimodal interactions.
Citations: 1
Providing Real-time Feedback for Student Teachers in a Virtual Rehearsal Environment
R. Barmaki, C. Hughes
Research in learning analytics and educational data mining has recently become prominent in the fields of computer science and education. Most scholars in the field emphasize student learning and student data analytics; however, it is also important to focus on teaching analytics and teacher preparation because of their key roles in student learning, especially in K-12 learning environments. Nonverbal communication strategies play an important role in successful interpersonal communication of teachers with their students. In order to assist novice or practicing teachers with exhibiting open and affirmative nonverbal cues in their classrooms, we have designed a multimodal teaching platform with provisions for online feedback. We used an interactive teaching rehearsal software, TeachLivE, as our basic research environment. TeachLivE employs a digital puppetry paradigm as its core technology. Individuals walk into this virtual environment and interact with virtual students displayed on a large screen. They can practice classroom management, pedagogy and content delivery skills with a teaching plan in the TeachLivE environment. We have designed an experiment to evaluate the impact of an online nonverbal feedback application. In this experiment, different types of multimodal data have been collected during two experimental settings. These data include talk-time and nonverbal behaviors of the virtual students, captured in log files; talk time and full body tracking data of the participant; and video recording of the virtual classroom with the participant. 34 student teachers participated in this 30-minute experiment. In each of the settings, the participants were provided with teaching plans from which they taught. All the participants took part in both of the experimental settings. In order to have a balanced experiment design, half of the participants received nonverbal online feedback in their first session and the other half received this feedback in the second session. A visual indication was used for feedback each time the participant exhibited a closed, defensive posture. Based on recorded full-body tracking data, we observed that only those who received feedback in their first session demonstrated a significant number of open postures in the session containing no feedback. However, the post-questionnaire information indicated that all participants were more mindful of their body postures while teaching after they had participated in the study.
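To make the feedback mechanism concrete, here is a minimal sketch of how a closed, defensive posture might be flagged from body-tracking data and turned into an online visual indicator. The joint names, distance heuristic, and threshold are assumptions for illustration and stand in for the study's actual full-body tracking and posture coding.

```python
from dataclasses import dataclass

@dataclass
class Joint:
    x: float  # metres, in the coordinate frame of a hypothetical body tracker
    y: float
    z: float

def is_closed_posture(skeleton, threshold=0.25):
    """Rough heuristic: posture counts as 'closed' when both wrists are pulled
    in close to the mid-spine (e.g., crossed arms), within `threshold` metres."""
    spine = skeleton["spine_mid"]
    def dist(j):
        return ((j.x - spine.x) ** 2 + (j.y - spine.y) ** 2 + (j.z - spine.z) ** 2) ** 0.5
    return dist(skeleton["wrist_left"]) < threshold and dist(skeleton["wrist_right"]) < threshold

def feedback_indicator(skeleton):
    """Online feedback: show a warning only while a closed posture is held."""
    return "WARNING: open up your posture" if is_closed_posture(skeleton) else "OK"

crossed_arms = {"spine_mid": Joint(0.0, 1.2, 2.0),
                "wrist_left": Joint(0.1, 1.15, 1.95),
                "wrist_right": Joint(-0.1, 1.18, 1.95)}
print(feedback_indicator(crossed_arms))
```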
Citations: 57
Spoken Interruptions Signal Productive Problem Solving and Domain Expertise in Mathematics
S. Oviatt, Kevin Hang, Jianlong Zhou, Fang Chen
Prevailing social norms prohibit interrupting another person when they are speaking. In this research, simultaneous speech was investigated in groups of students as they jointly solved math problems and peer tutored one another. Analyses were based on the Math Data Corpus, which includes ground-truth performance coding and speech transcriptions. Simultaneous speech was elevated 120-143% during the most productive phase of problem solving, compared with matched intervals. It also was elevated 18-37% in students who were domain experts, compared with non-experts. Qualitative analyses revealed that experts differed from non-experts in the function of their interruptions. Analysis of these functional asymmetries produced nine key behaviors that were used to identify the dominant math expert in a group with 95-100% accuracy in three minutes. This research demonstrates that overlapped speech is a marker of group problem-solving progress and domain expertise. It provides valuable information for the emerging field of learning analytics.
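The core measurement, how much simultaneous speech occurs in a given interval, can be computed from time-aligned transcripts with a simple sweep over segment boundaries. The sketch below is illustrative only; the speaker names and timings are invented and it is not the Math Data Corpus analysis pipeline.

```python
def overlap_seconds(segments):
    """Total time during which two or more speakers talk at once.

    `segments` is a list of (speaker, start, end) tuples in seconds,
    the kind of information a time-aligned transcript provides.
    """
    events = []
    for _, start, end in segments:
        events.append((start, +1))
        events.append((end, -1))
    events.sort()
    active, last_t, overlap = 0, 0.0, 0.0
    for t, delta in events:
        if active >= 2:                # at least two speakers were active until t
            overlap += t - last_t
        active += delta
        last_t = t
    return overlap

session = [("student_A", 0.0, 6.0),
           ("student_B", 4.0, 9.0),    # interrupts A for 2 s
           ("student_C", 8.5, 12.0)]   # overlaps B for 0.5 s
print(f"simultaneous speech: {overlap_seconds(session):.1f} s")
```

Comparing this quantity across problem-solving phases and across speakers is what lets overlapped speech serve as a marker of productive phases and of domain expertise.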
Citations: 14
Multimodal Selfies: Designing a Multimodal Recording Device for Students in Traditional Classrooms
Federico Domínguez, K. Chiluiza, Vanessa Echeverría, X. Ochoa
The traditional recording of student interaction in classrooms has raised privacy concerns among both students and academics. However, the same students are happy to share their daily lives through social media. Perception of data ownership is the key factor in this paradox. This article proposes the design of a personal Multimodal Recording Device (MRD) that could capture the actions of its owner during lectures. The MRD would be able to capture close-range video, audio, writing, and other environmental signals. Unlike traditional centralized recording systems, students would have control over their own recorded data. They could decide to share their information in exchange for access to the recordings of the instructor, notes from their classmates, and analysis of, for example, their attention performance. By sharing their data, students participate in the co-creation of enhanced and synchronized course notes that will benefit all the participating students. This work presents details about how such a device could be built from available components. It also discusses and evaluates the design of such a device, including its foreseeable costs, scalability, flexibility, intrusiveness and recording quality.
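A minimal sketch of the data-ownership idea: each recording stays with the student, and only the modalities they explicitly opt in to are contributed to the shared course notes. The class names, fields, and file paths are hypothetical and are not the MRD's actual software.

```python
from dataclasses import dataclass, field

@dataclass
class SharingPolicy:
    """Student-owned consent flags: a stream leaves the device only if shared."""
    video: bool = False
    audio: bool = False
    writing: bool = False
    environment: bool = False

@dataclass
class LectureRecording:
    student_id: str
    policy: SharingPolicy = field(default_factory=SharingPolicy)
    streams: dict = field(default_factory=dict)   # modality -> local file path

    def shareable_streams(self):
        """Only modalities the owner opted in to are contributed to the
        co-created, synchronized course notes."""
        return {m: p for m, p in self.streams.items() if getattr(self.policy, m)}

rec = LectureRecording("student_42",
                       SharingPolicy(audio=True, writing=True),
                       {"video": "video.mp4", "audio": "audio.wav", "writing": "notes.svg"})
print(rec.shareable_streams())   # video stays private; audio and writing are shared
```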
Citations: 16
Digital Flavor: Towards Digitally Simulating Virtual Flavors
Nimesha Ranasinghe, Gajan Suthokumar, Kuan-Yi Lee, E. Do
Flavor is often a pleasurable sensory perception we experience daily while eating and drinking. However, the sensation of flavor is rarely considered in the age of digital communication, mainly because flavor is not available as a digitally controllable medium. This paper introduces a digital instrument (Digital Flavor Synthesizing device), which actuates taste (electrical and thermal stimulation) and smell sensations (controlled scent emission) together to simulate different flavors digitally. A preliminary user experiment is conducted to study the effectiveness of this method with five predefined flavor stimuli. Experimental results show that the users were effectively able to identify different flavors such as minty, spicy, and lemony. Moreover, we outline several challenges ahead along with future possibilities of this technology. In summary, our work demonstrates a novel controllable instrument for flavor simulation, which will be valuable in multimodal interactive systems for rendering virtual flavors digitally.
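One way to picture such an instrument is as a set of presets that drive the three actuation channels together: electrical taste stimulation, thermal stimulation, and controlled scent release. All parameter values, channel names, and driver functions below are invented for illustration; they are not the device's actual calibration or API.

```python
# Hypothetical actuation parameters for the three channels combined per flavor.
FLAVOR_PRESETS = {
    "minty":  {"current_ua": 40,  "temp_c": 22, "scent": "menthol",  "scent_duty": 0.6},
    "spicy":  {"current_ua": 120, "temp_c": 38, "scent": "capsicum", "scent_duty": 0.4},
    "lemony": {"current_ua": 90,  "temp_c": 25, "scent": "citrus",   "scent_duty": 0.7},
}

def render_flavor(name, duration_s=5):
    """Send one flavor preset to the (hypothetical) actuator drivers."""
    p = FLAVOR_PRESETS[name]
    set_tongue_current(p["current_ua"])                   # electrical taste stimulation (µA)
    set_thermal_target(p["temp_c"])                       # thermal element target (°C)
    emit_scent(p["scent"], p["scent_duty"], duration_s)   # scent emitter duty cycle

# Stub drivers so the sketch runs without hardware.
def set_tongue_current(ua): print(f"current -> {ua} µA")
def set_thermal_target(c): print(f"thermal -> {c} °C")
def emit_scent(channel, duty, duration): print(f"scent   -> {channel}, duty {duty}, {duration} s")

render_flavor("lemony")
```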
Citations: 26
Session details: Demonstrations
Stefan Scherer
{"title":"Session details: Demonstrations","authors":"Stefan Scherer","doi":"10.1145/3252453","DOIUrl":"https://doi.org/10.1145/3252453","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74037600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits
Moitreya Chatterjee, Sunghyun Park, Louis-Philippe Morency, Stefan Scherer
Human communication involves conveying messages through both verbal and non-verbal channels (facial expression, gestures, prosody, etc.). Nonetheless, the task of learning these patterns for a computer by combining cues from multiple modalities is challenging because it requires an effective representation of the signals while also taking into consideration the complex interactions between them. From the machine learning perspective this presents a two-fold challenge: a) modeling the intermodal variations and dependencies; b) representing the data using an apt number of features, such that the necessary patterns are captured while at the same time allaying concerns such as over-fitting. In this work we attempt to address these aspects of multimodal recognition, in the context of recognizing two essential speaker traits, namely passion and credibility of online movie reviewers. We propose a novel ensemble classification approach that combines two different perspectives on classifying multimodal data. Each of these perspectives attempts to independently address the two-fold challenge. In the first, we combine the features from multiple modalities but assume inter-modality conditional independence. In the second, we explicitly capture the correlation between the modalities, but in a low-dimensional space, and explore a novel clustering-based kernel similarity approach for recognition. Additionally, this work investigates a recent technique for encoding text data that captures semantic similarity of verbal content and preserves word ordering. The experimental results on a recent public dataset show significant improvement of our approach over multiple baselines. Finally, we also analyze the most discriminative elements of a speaker's non-verbal behavior that contribute to his/her perceived credibility/passionateness.
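A compact way to see the two perspectives is as two classifiers fused late: one assumes inter-modality conditional independence, the other models the modalities jointly in a low-dimensional space before a kernel classifier. The scikit-learn sketch below uses synthetic data and substitutes PCA plus an RBF-SVM for the paper's clustering-based kernel, so it is only an analogy to the proposed ensemble, not a reproduction of it.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_audio, X_visual = rng.normal(size=(200, 20)), rng.normal(size=(200, 30))
y = rng.integers(0, 2, size=200)          # e.g., passionate vs. not passionate

# Perspective 1: per-modality classifiers, fused under an independence assumption
# (posteriors multiplied, then renormalized).
nb_audio = GaussianNB().fit(X_audio, y)
nb_visual = GaussianNB().fit(X_visual, y)
p_view1 = nb_audio.predict_proba(X_audio) * nb_visual.predict_proba(X_visual)
p_view1 /= p_view1.sum(axis=1, keepdims=True)

# Perspective 2: joint low-dimensional representation, then a kernel classifier.
joint = np.hstack([X_audio, X_visual])
view2 = make_pipeline(PCA(n_components=5), SVC(kernel="rbf", probability=True)).fit(joint, y)
p_view2 = view2.predict_proba(joint)

# Ensemble: average the two perspectives' posteriors.
prediction = (0.5 * p_view1 + 0.5 * p_view2).argmax(axis=1)
print("training accuracy of the fused ensemble:", (prediction == y).mean())
```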
Citations: 12