
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction — Latest Publications

Retrieving Target Gestures Toward Speech Driven Animation with Meaningful Behaviors
Najmeh Sadoughi, C. Busso
Creating believable behaviors for conversational agents (CAs) is a challenging task, given the complex relationship between speech and various nonverbal behaviors. The two main approaches are rule-based systems, which tend to produce behaviors with limited variation compared to natural interactions, and data-driven systems, which tend to ignore the underlying semantic meaning of the message (e.g., gestures without meaning). We envision a hybrid system, acting as the behavior realization layer in rule-based systems, while exploiting the rich variation in natural interactions. Constrained by a given target gesture (e.g., head nod) and speech signal, the system will generate novel realizations learned from the data, capturing the temporal relationship between speech and gestures. An important task in this research is identifying multiple examples of the target gestures in the corpus. This paper proposes a data mining framework for detecting gestures of interest in a motion capture database. First, we train one-class support vector machines (SVMs) to detect candidate segments conveying the target gesture. Second, we use the dynamic time alignment kernel (DTAK) to compare the similarity between the examples (i.e., target gesture) and the given segments. We evaluate the approach on five prototypical hand and head gestures, showing reasonable performance. These retrieved gestures are then used to train a speech-driven framework based on dynamic Bayesian networks (DBNs) to synthesize these target behaviors.
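A minimal sketch of the two retrieval steps this abstract describes, run on synthetic motion-capture windows: a one-class SVM on simple window statistics filters candidate segments, and a plain dynamic-time-warping distance stands in for the DTAK similarity used in the paper. The feature dimensions, window length, and thresholds are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def dtw_distance(a, b):
    """Plain dynamic-time-warping distance between two feature sequences
    (a simple stand-in for the DTAK similarity used in the paper)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Hypothetical data: a few annotated examples of the target gesture and
# candidate windows cut from the motion-capture stream (frames x features).
rng = np.random.default_rng(0)
target_examples = [rng.normal(size=(30, 6)) for _ in range(5)]
candidate_windows = [rng.normal(size=(30, 6)) for _ in range(50)]

# Step 1: a one-class SVM on per-window summary statistics flags windows
# that roughly resemble the annotated examples.
summarize = lambda seq: np.concatenate([seq.mean(axis=0), seq.std(axis=0)])
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
ocsvm.fit(np.stack([summarize(s) for s in target_examples]))
candidates = [w for w in candidate_windows
              if ocsvm.predict(summarize(w).reshape(1, -1))[0] == 1]

# Step 2: rank the surviving candidates by sequence similarity to the examples.
scored = sorted(candidates,
                key=lambda w: min(dtw_distance(w, ex) for ex in target_examples))
print(f"{len(candidates)} of {len(candidate_windows)} windows kept by the one-class SVM")
if scored:
    best = min(dtw_distance(scored[0], ex) for ex in target_examples)
    print(f"best DTW distance to an annotated example: {best:.2f}")
```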
{"title":"Retrieving Target Gestures Toward Speech Driven Animation with Meaningful Behaviors","authors":"Najmeh Sadoughi, C. Busso","doi":"10.1145/2818346.2820750","DOIUrl":"https://doi.org/10.1145/2818346.2820750","url":null,"abstract":"Creating believable behaviors for conversational agents (CAs) is a challenging task, given the complex relationship between speech and various nonverbal behaviors. The two main approaches are rule-based systems, which tend to produce behaviors with limited variations compared to natural interactions, and data-driven systems, which tend to ignore the underlying semantic meaning of the message (e.g., gestures without meaning). We envision a hybrid system, acting as the behavior realization layer in rule-based systems, while exploiting the rich variation in natural interactions. Constrained on a given target gesture (e.g., head nod) and speech signal, the system will generate novel realizations learned from the data, capturing the timely relationship between speech and gestures. An important task in this research is identifying multiple examples of the target gestures in the corpus. This paper proposes a data mining framework for detecting gestures of interest in a motion capture database. First, we train One-class support vector machines (SVMs) to detect candidate segments conveying the target gesture. Second, we use dynamic time alignment kernel (DTAK) to compare the similarity between the examples (i.e., target gesture) and the given segments. We evaluate the approach for five prototypical hand and head gestures showing reasonable performance. These retrieved gestures are then used to train a speech-driven framework based on dynamic Bayesian networks (DBNs) to synthesize these target behaviors.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73517836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild
Bo Sun, Liandong Li, Guoyan Zhou, Xuewen Wu, Jun He, Lejun Yu, Dongxue Li, Qinglan Wei
In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For static facial expression recognition based on video frames, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for these feature types on the AFEW and SFEW datasets, and we propose a novel fusion network to combine all the extracted features at the decision level. Our final recognition rates are 51.02% on the AFEW test set and 51.08% on the SFEW test set, well above the baseline rates of 39.33% and 39.13%, respectively.
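A hedged sketch of the decision-level fusion idea: one linear SVM per modality produces decision scores, and a small learned model combines them. A multinomial logistic regression stands in for the paper's fusion network; all feature matrices, dimensions, and labels below are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the per-modality feature matrices (e.g. LBP-TOP,
# HOG, acoustic); the dimensions and labels are placeholders.
rng = np.random.default_rng(1)
n_train, n_test, n_classes = 200, 50, 7
dims = {"lbp_top": 100, "hog": 80, "acoustic": 40}
train = {name: rng.normal(size=(n_train, d)) for name, d in dims.items()}
test = {name: rng.normal(size=(n_test, d)) for name, d in dims.items()}
y_train = rng.integers(0, n_classes, size=n_train)

# One linear SVM per modality, as in the paper; its per-class decision scores
# feed a small learned fusion stage (a multinomial logistic regression here,
# standing in for the fusion network).
svms = {name: LinearSVC(C=1.0, max_iter=5000).fit(X, y_train) for name, X in train.items()}
train_scores = np.hstack([svms[n].decision_function(train[n]) for n in svms])
test_scores = np.hstack([svms[n].decision_function(test[n]) for n in svms])

fusion = LogisticRegression(max_iter=1000).fit(train_scores, y_train)
print("fused predictions for the first ten test clips:", fusion.predict(test_scores)[:10])
```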
{"title":"Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild","authors":"Bo Sun, Liandong Li, Guoyan Zhou, Xuewen Wu, Jun He, Lejun Yu, Dongxue Li, Qinglan Wei","doi":"10.1145/2818346.2830586","DOIUrl":"https://doi.org/10.1145/2818346.2830586","url":null,"abstract":"In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For the static facial expression recognition based on video frame, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for these kinds of features on the AFEW and SFEW dataset, and we propose a novel fusion network to combine all the extracted features at decision level. The final achievement we gained is 51.02% on the AFEW testing set and 51.08% on the SFEW testing set, which are much better than the baseline recognition rate of 39.33% and 39.13%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75425783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
Dynamic Active Learning Based on Agreement and Applied to Emotion Recognition in Spoken Interactions
Yue Zhang, E. Coutinho, Zixing Zhang, C. Quan, Björn Schuller
In this contribution, we propose a novel method for Active Learning (AL) - Dynamic Active Learning (DAL) - which targets the reduction of the costly human labelling work necessary for modelling subjective tasks such as emotion recognition in spoken interactions. The method implements an adaptive query strategy that minimises the amount of human labelling work by deciding for each instance whether it should automatically be labelled by machine or manually by human, as well as how many human annotators are required. Extensive experiments on standardised test-beds show that DAL significantly improves the efficiency of conventional AL. In particular, DAL achieves the same classification accuracy obtained with AL with up to 79.17% less human annotation effort.
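A hedged illustration of the kind of adaptive query routing the abstract describes: the model's confidence on each instance decides whether the label is taken from the machine or requested from one or several human annotators. The thresholds and annotator counts are illustrative, not the paper's.

```python
import numpy as np

def route_instance(class_probs, auto_threshold=0.9, easy_threshold=0.7):
    """Decide how to label one instance from a model's class posterior.

    Thresholds are illustrative, not values from the paper.
    Returns (label_source, n_annotators).
    """
    confidence = float(np.max(class_probs))
    if confidence >= auto_threshold:
        return "machine", 0          # trust the model's own label
    if confidence >= easy_threshold:
        return "human", 1            # one annotator is enough
    return "human", 3                # ambiguous: ask several annotators

# Example: posteriors for three instances from a hypothetical emotion classifier.
for probs in ([0.95, 0.03, 0.02], [0.75, 0.20, 0.05], [0.40, 0.35, 0.25]):
    print(probs, "->", route_instance(np.asarray(probs)))
```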
{"title":"Dynamic Active Learning Based on Agreement and Applied to Emotion Recognition in Spoken Interactions","authors":"Yue Zhang, E. Coutinho, Zixing Zhang, C. Quan, Björn Schuller","doi":"10.1145/2818346.2820774","DOIUrl":"https://doi.org/10.1145/2818346.2820774","url":null,"abstract":"In this contribution, we propose a novel method for Active Learning (AL) - Dynamic Active Learning (DAL) - which targets the reduction of the costly human labelling work necessary for modelling subjective tasks such as emotion recognition in spoken interactions. The method implements an adaptive query strategy that minimises the amount of human labelling work by deciding for each instance whether it should automatically be labelled by machine or manually by human, as well as how many human annotators are required. Extensive experiments on standardised test-beds show that DAL significantly improves the efficiency of conventional AL. In particular, DAL achieves the same classification accuracy obtained with AL with up to 79.17% less human annotation effort.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77852769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Sharing Touch Interfaces: Proximity-Sensitive Touch Targets for Tablet-Mediated Collaboration
Ilhan Aslan, Thomas Meneweger, Verena Fuchsberger, M. Tscheligi
During conversational practices, such as a tablet-mediated sales conversation between a salesperson and a customer, tablets are often used by two users, who adopt specific bodily formations in order to easily face each other and the surface of the touchscreen. In a series of studies, we investigated the bodily formations preferred during tablet-mediated sales conversations, and explored the effect of these formations on performance in acquiring touch targets (e.g., buttons) on a tablet device. We found that these bodily formations reduce the viewing angle to the shared screen, which degrades target-acquisition performance. To address this issue, we present a multimodal design consideration that combines mid-air finger movement and touch into a unified input modality, allowing the design of proximity-sensitive touch targets. We conclude that the proposed embodied interaction design not only has the potential to improve targeting performance, but also adapts the 'agency' of touch targets for multi-user settings.
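A small sketch of what a proximity-sensitive touch target could look like: the effective hit radius of a button grows as the tracked mid-air fingertip approaches the screen. The units, growth curve, and constants are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Target:
    x: float            # centre x, in screen millimetres (hypothetical units)
    y: float            # centre y
    base_radius: float  # hit radius when no finger is nearby

def effective_radius(target, finger_height_mm, max_boost=1.8, range_mm=60.0):
    """Grow the hit radius as the mid-air finger gets closer to the screen.

    The growth curve and constants are illustrative only.
    """
    closeness = max(0.0, 1.0 - finger_height_mm / range_mm)  # 0 = far, 1 = touching
    return target.base_radius * (1.0 + (max_boost - 1.0) * closeness)

def hit(target, touch_x, touch_y, finger_height_mm):
    r = effective_radius(target, finger_height_mm)
    return (touch_x - target.x) ** 2 + (touch_y - target.y) ** 2 <= r ** 2

button = Target(x=50.0, y=30.0, base_radius=4.0)
print(hit(button, 56.0, 30.0, finger_height_mm=5.0))   # close approach: expanded target, hit
print(hit(button, 56.0, 30.0, finger_height_mm=55.0))  # distant approach: near base size, miss
```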
{"title":"Sharing Touch Interfaces: Proximity-Sensitive Touch Targets for Tablet-Mediated Collaboration","authors":"Ilhan Aslan, Thomas Meneweger, Verena Fuchsberger, M. Tscheligi","doi":"10.1145/2818346.2820740","DOIUrl":"https://doi.org/10.1145/2818346.2820740","url":null,"abstract":"During conversational practices, such as a tablet-mediated sales conversation between a salesperson and a customer, tablets are often used by two users who prefer specific bodily formations in order to easily face each other and the surface of the touchscreen. In a series of studies, we investigated bodily formations that are preferred during tablet-mediated sales conversations, and explored the effect of these formations on performance in acquiring touch targets (e.g., buttons) on a tablet device. We found that bodily formations cause decreased viewing angles to the shared screen, which results in a decreased performance in target acquisition. In order to address this issue, a multi-modal design consideration is presented, which combines mid-air finger movement and touch into a unified input modality, allowing the design of proximity sensitive touch targets. We conclude that the proposed embodied interaction design not only has potential to improve targeting performance, but also adapts the ``agency' of touch targets for multi-user settings.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85040437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
CuddleBits: Friendly, Low-cost Furballs that Respond to Touch
Laura Cang, Paul Bucci, Karon E Maclean
We present a real-time touch gesture recognition system using a low-cost fabric pressure sensor mounted on a small zoomorphic robot, affectionately called the 'CuddleBit'. We explore the relationship between gesture recognition and affect through the lens of human-robot interaction. We demonstrate our real-time gesture recognition system, including both software and hardware, and a haptic display that brings the CuddleBit to life.
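A rough sketch of real-time touch gesture recognition from a fabric pressure sensor: window statistics over the pressure frames feed a standard classifier. The sensor layout, window length, gesture labels, and the choice of a random forest are all assumptions for illustration, not the CuddleBit's actual recognizer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

WINDOW = 50  # frames per classification window (hypothetical sampling setup)

def window_features(frames):
    """Summary statistics over one window of pressure frames (frames x taxels)."""
    return np.concatenate([frames.mean(axis=0), frames.max(axis=0),
                           np.diff(frames, axis=0).std(axis=0)])

# Synthetic training data standing in for labelled sensor recordings.
rng = np.random.default_rng(2)
gestures = ["stroke", "poke", "squeeze"]
X = np.stack([window_features(rng.normal(loc=i, size=(WINDOW, 16)))
              for i, _ in enumerate(gestures) for _ in range(30)])
y = np.repeat(gestures, 30)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
live_window = rng.normal(loc=1, size=(WINDOW, 16))   # stand-in for real-time frames
print(clf.predict(window_features(live_window).reshape(1, -1)))
```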
{"title":"CuddleBits: Friendly, Low-cost Furballs that Respond to Touch","authors":"Laura Cang, Paul Bucci, Karon E Maclean","doi":"10.1145/2818346.2823293","DOIUrl":"https://doi.org/10.1145/2818346.2823293","url":null,"abstract":"We present a real-time touch gesture recognition system using a low-cost fabric pressure sensor mounted on a small zoomorphic robot, affectionately called the `CuddleBit'. We explore the relationship between gesture recognition and affect through the lens of human-robot interaction. We demonstrate our real-time gesture recognition system, including both software and hardware, and a haptic display that brings the CuddleBit to life.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83589054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Session details: Oral Session 3: Language, Speech and Dialog
J. Lehman
{"title":"Session details: Oral Session 3: Language, Speech and Dialog","authors":"J. Lehman","doi":"10.1145/3252448","DOIUrl":"https://doi.org/10.1145/3252448","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85435502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analyzing Multimodality of Video for User Engagement Assessment
F. Salim, F. Haider, Owen Conlan, S. Luz, N. Campbell
These days, several hours of new video content are uploaded to the internet every second. It is simply impossible for anyone to see every piece of video that could be engaging or even useful to them. It is therefore desirable to automatically identify videos likely to be regarded as engaging, for a variety of applications such as recommendation and personalized video segmentation. This paper explores how multimodal characteristics of video, such as prosodic, visual and paralinguistic features, can help in assessing user engagement with videos. The approach proposed in this paper achieved good accuracy (a maximum F score of 96.93%) through a novel combination of features extracted directly from video recordings, demonstrating the potential of this method for identifying engaging content.
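A hedged sketch of multimodal engagement classification: per-video prosodic, visual, and paralinguistic feature blocks are concatenated (simple early fusion, standing in for the paper's feature combination) and a classifier is scored with the F measure. All data and dimensions below are synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

# Synthetic stand-ins for per-video feature blocks; real features would come
# from prosodic, visual and paralinguistic extractors.
rng = np.random.default_rng(3)
n_videos = 300
prosodic = rng.normal(size=(n_videos, 20))
visual = rng.normal(size=(n_videos, 30))
paraling = rng.normal(size=(n_videos, 10))
engaging = rng.integers(0, 2, size=n_videos)   # 1 = engaging, 0 = not

# Early fusion: concatenate the modality features, then train one classifier.
X = np.hstack([prosodic, visual, paraling])
X_tr, X_te, y_tr, y_te = train_test_split(X, engaging, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```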
{"title":"Analyzing Multimodality of Video for User Engagement Assessment","authors":"F. Salim, F. Haider, Owen Conlan, S. Luz, N. Campbell","doi":"10.1145/2818346.2820775","DOIUrl":"https://doi.org/10.1145/2818346.2820775","url":null,"abstract":"These days, several hours of new video content is uploaded to the internet every second. It is simply impossible for anyone to see every piece of video which could be engaging or even useful to them. Therefore it is desirable to identify videos that might be regarded as engaging automatically, for a variety of applications such as recommendation and personalized video segmentation etc. This paper explores how multimodal characteristics of video, such as prosodic, visual and paralinguistic features, can help in assessing user engagement with videos. The approach proposed in this paper achieved good accuracy (maximum F score of 96.93 %) through a novel combination of features extracted directly from video recordings, demonstrating the potential of this method in identifying engaging content.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78485996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
The Application of Word Processor UI paradigms to Audio and Animation Editing
A. D. Milota
This demonstration showcases Quixotic, an audio editor, and Quintessence, an animation editor. Both appropriate many of the interaction techniques found in word processors, and allow users to more quickly create time-variant media. Our different approach to the interface aims to make recorded speech and simple animation into media that can be efficiently used for one-to-one asynchronous communications, quick note taking and documentation, as well as for idea refinement.
{"title":"The Application of Word Processor UI paradigms to Audio and Animation Editing","authors":"A. D. Milota","doi":"10.1145/2818346.2823292","DOIUrl":"https://doi.org/10.1145/2818346.2823292","url":null,"abstract":"This demonstration showcases Quixotic, an audio editor, and Quintessence, an animation editor. Both appropriate many of the interaction techniques found in word processors, and allow users to more quickly create time-variant media. Our different approach to the interface aims to make recorded speech and simple animation into media that can be efficiently used for one-to-one asynchronous communications, quick note taking and documentation, as well as for idea refinement.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80101536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Public Speaking Performance Assessment
T. Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, R. Stiefelhagen, Stefan Scherer
The ability to speak proficiently in public is essential for many professions and in everyday life. Public speaking skills are difficult to master and require extensive training. Recent developments in technology enable new approaches for public speaking training that allow users to practice in engaging and interactive environments. Here, we focus on the automatic assessment of nonverbal behavior and multimodal modeling of public speaking behavior. We automatically identify audiovisual nonverbal behaviors that are correlated to expert judges' opinions of key performance aspects. These automatic assessments enable a virtual audience to provide feedback that is essential for training during a public speaking performance. We utilize multimodal ensemble tree learners to automatically approximate expert judges' evaluations to provide post-hoc performance assessments to the speakers. Our automatic performance evaluation is highly correlated with the experts' opinions with r = 0.745 for the overall performance assessments. We compare multimodal approaches with single modalities and find that the multimodal ensembles consistently outperform single modalities.
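A minimal sketch of approximating expert ratings with an ensemble tree learner on multimodal features: audio and visual descriptors are concatenated, a random forest regresses the expert score, and Pearson's r measures agreement with the held-out ratings. The features, target scores, and the specific ensemble are placeholders, not the paper's setup.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for per-talk audio and visual descriptors and the
# expert judges' mean overall-performance rating.
rng = np.random.default_rng(4)
n_talks = 200
audio = rng.normal(size=(n_talks, 15))
visual = rng.normal(size=(n_talks, 25))
expert_score = audio[:, 0] * 0.5 + visual[:, 0] * 0.3 + rng.normal(scale=0.5, size=n_talks)

X = np.hstack([audio, visual])                  # multimodal feature ensemble
X_tr, X_te, y_tr, y_te = train_test_split(X, expert_score, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r, _ = pearsonr(y_te, model.predict(X_te))
print(f"Pearson r between predicted and expert scores: {r:.3f}")
```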
{"title":"Multimodal Public Speaking Performance Assessment","authors":"T. Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, R. Stiefelhagen, Stefan Scherer","doi":"10.1145/2818346.2820762","DOIUrl":"https://doi.org/10.1145/2818346.2820762","url":null,"abstract":"The ability to speak proficiently in public is essential for many professions and in everyday life. Public speaking skills are difficult to master and require extensive training. Recent developments in technology enable new approaches for public speaking training that allow users to practice in engaging and interactive environments. Here, we focus on the automatic assessment of nonverbal behavior and multimodal modeling of public speaking behavior. We automatically identify audiovisual nonverbal behaviors that are correlated to expert judges' opinions of key performance aspects. These automatic assessments enable a virtual audience to provide feedback that is essential for training during a public speaking performance. We utilize multimodal ensemble tree learners to automatically approximate expert judges' evaluations to provide post-hoc performance assessments to the speakers. Our automatic performance evaluation is highly correlated with the experts' opinions with r = 0.745 for the overall performance assessments. We compare multimodal approaches with single modalities and find that the multimodal ensembles consistently outperform single modalities.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80239130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
AttentiveLearner: Adaptive Mobile MOOC Learning via Implicit Cognitive States Inference
Xiang Xiao, Phuong Pham, Jingtao Wang
This demo presents AttentiveLearner, a mobile learning system optimized for consuming lecture videos in Massive Open Online Courses (MOOCs) and flipped classrooms. AttentiveLearner uses on-lens finger gestures for video control and captures learners' physiological states through implicit heart rate tracking on unmodified mobile phones. In three user studies to date, we found AttentiveLearner easy to learn and intuitive to use. The heartbeat waveforms captured by AttentiveLearner can be used to infer learners' cognitive states and attention. AttentiveLearner may serve as a promising supplemental feedback channel orthogonal to today's learning analytics technologies.
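A rough sketch of the implicit heart rate tracking idea: with a fingertip over the camera lens, the per-frame mean brightness forms a photoplethysmography-like signal, and peak counting yields beats per minute. The frame rate, detrending window, and peak spacing below are assumptions, not AttentiveLearner's actual pipeline.

```python
import numpy as np
from scipy.signal import find_peaks

FPS = 30.0  # assumed camera frame rate

def heart_rate_bpm(brightness, fps=FPS):
    """Estimate heart rate from a per-frame mean-brightness signal (camera PPG).

    A simple detrend-and-count-peaks approach; constants are illustrative.
    """
    trend = np.convolve(brightness, np.ones(15) / 15, mode="same")
    signal = brightness - trend
    peaks, _ = find_peaks(signal, distance=fps * 0.4)   # >= ~0.4 s between beats
    duration_s = len(brightness) / fps
    return 60.0 * len(peaks) / duration_s

# Synthetic 10-second signal with a ~72 bpm pulse plus noise.
t = np.arange(0, 10, 1 / FPS)
brightness = 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.default_rng(5).normal(size=t.size)
print(f"Estimated heart rate: {heart_rate_bpm(brightness):.0f} bpm")
```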
{"title":"AttentiveLearner: Adaptive Mobile MOOC Learning via Implicit Cognitive States Inference","authors":"Xiang Xiao, Phuong Pham, Jingtao Wang","doi":"10.1145/2818346.2823297","DOIUrl":"https://doi.org/10.1145/2818346.2823297","url":null,"abstract":"This demo presents AttentiveLearner, a mobile learning system optimized for consuming lecture videos in Massive Open Online Courses (MOOCs) and flipped classrooms. AttentiveLearner uses on-lens finger gestures for video control and captures learners' physiological states through implicit heart rate tracking on unmodified mobile phones. Through three user studies to date, we found AttentiveLearner easy to learn, and intuitive to use. The heart beat waveforms captured by AttentiveLearner can be used to infer learners' cognitive states and attention. AttentiveLearner may serve as a promising supplemental feedback channel orthogonal to today's learning analytics technologies.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80329291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7