Recently, a number of TV manufacturers have introduced TV remotes with a touchpad used for indirect control of the TV UI. Users can navigate the UI by moving a finger across the touchpad. However, due to the latency in visual feedback, there is a disconnect between the finger movement on the touchpad and the visual perception in the TV UI, which often causes overshooting. In this paper, we investigate how haptic feedback affects the user experience of a touchpad-based TV remote. We describe two haptic prototypes, built on a smartphone and on the Samsung 2013 TV remote, respectively. We conducted two user studies with the two prototypes to evaluate how user preference and user performance are affected. The results show overwhelming support for haptic feedback in terms of subjective user preference, though we did not find a significant difference in performance between the conditions with and without haptic feedback.
{"title":"Active Haptic Feedback for Touch Enabled TV Remote","authors":"Anton Treskunov, Mike Darnell, Rongrong Wang","doi":"10.1145/2818346.2820768","DOIUrl":"https://doi.org/10.1145/2818346.2820768","url":null,"abstract":"Recently a number of TV manufacturers introduced TV remotes with a touchpad which is used for indirect control of TV UI. Users can navigate the UI by moving a finger across the touch pad. However, due to the latency in visual feedback, there is a disconnection between the finger movement on the touchpad and the visual perception in the TV UI, which often causes overshooting. In this paper, we investigate how haptic feedback affects the user experience of the touchpad-based TV remote. We described two haptic prototypes built on the smartphone and Samsung 2013 TV remote respectively. We conducted two user studies with two prototypes to evaluate how the user preference and the user performance been affected. The results show that there is overwhelming support of haptic feedback in terms of subjective user preference, though we didn't find significant difference in performance between with and without haptic feedback conditions.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86952788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alina Roitberg, N. Somani, A. Perzylo, Markus Rickert, A. Knoll
We present an approach for monitoring and interpreting human activities based on a novel multimodal vision-based interface, aiming to improve the efficiency of human-robot interaction (HRI) in industrial environments. Multimodality is an important aspect of this design: we combine inputs from several state-of-the-art sensors to provide a variety of information, e.g., skeleton and fingertip poses. Based on typical industrial workflows, we derived multiple levels of human activity labels, including large-scale activities (e.g., assembly) and simpler sub-activities (e.g., hand gestures), creating a duration- and complexity-based hierarchy. We train supervised generative classifiers for each activity level and combine the output of this stage with a trained Hierarchical Hidden Markov Model (HHMM), which models not only the temporal relations between activities on the same level, but also the hierarchical relationships between the levels.
{"title":"Multimodal Human Activity Recognition for Industrial Manufacturing Processes in Robotic Workcells","authors":"Alina Roitberg, N. Somani, A. Perzylo, Markus Rickert, A. Knoll","doi":"10.1145/2818346.2820738","DOIUrl":"https://doi.org/10.1145/2818346.2820738","url":null,"abstract":"We present an approach for monitoring and interpreting human activities based on a novel multimodal vision-based interface, aiming at improving the efficiency of human-robot interaction (HRI) in industrial environments. Multi-modality is an important concept in this design, where we combine inputs from several state-of-the-art sensors to provide a variety of information, e.g. skeleton and fingertip poses. Based on typical industrial workflows, we derived multiple levels of human activity labels, including large-scale activities (e.g. assembly) and simpler sub-activities (e.g. hand gestures), creating a duration- and complexity-based hierarchy. We train supervised generative classifiers for each activity level and combine the output of this stage with a trained Hierarchical Hidden Markov Model (HHMM), which models not only the temporal aspects between the activities on the same level, but also the hierarchical relationships between the levels.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88563683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present the first step of a methodology for automatically deducing the sequences of signals expressed by humans during an interaction. The aim is to link interpersonal stances with arrangements of social signals, such as modulations of Action Units and prosody, during a face-to-face exchange. The long-term goal is to infer association rules of signals, which we plan to use as input to the animation of an Embodied Conversational Agent (ECA). In this paper, we illustrate the proposed methodology on the SEMAINE-DB corpus, from which we automatically extracted Action Units (AUs), head positions, turn-taking, and prosody information. We applied a data mining algorithm to find the sequences of social signals that characterize different social stances. We finally discuss our preliminary results, focusing on particular AUs (smiles and eyebrows), and the perspectives of this method.
{"title":"Temporal Association Rules for Modelling Multimodal Social Signals","authors":"Thomas Janssoone","doi":"10.1145/2818346.2823305","DOIUrl":"https://doi.org/10.1145/2818346.2823305","url":null,"abstract":"In this paper, we present the first step of a methodology dedicated to deduce automatically sequences of signals expressed by humans during an interaction. The aim is to link interpersonal stances with arrangements of social signals such as modulations of Action Units and prosody during a face-to-face exchange. The long-term goal is to infer association rules of signals. We plan to use them as an input to the animation of an Embodied Conversational Agent (ECA). In this paper, we illustrate the proposed methodology to the SEMAINE-DB corpus from which we automatically extracted Action Units (AUs), head positions, turn-taking and prosody information. We have applied the data mining algorithm that is used to find the sequences of social signals featuring different social stances. We finally discuss our primary results focusing on given AUs (smiles and eyebrows) and the perspectives of this method.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85943784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interact is a mobile virtual assistant that uses multimodal dialog to enable an interactive concierge experience over multiple application domains, including hotel, restaurant, event, and TV search. Interact demonstrates how multimodal interaction combined with conversational dialog enables a richer and more natural user experience. This demonstration will highlight incremental recognition and understanding, multimodal speech and gesture input, context tracking over multiple simultaneous domains, and the use of multimodal interface techniques to enable disambiguation of errors and online personalization.
{"title":"Interact: Tightly-coupling Multimodal Dialog with an Interactive Virtual Assistant","authors":"Ethan Selfridge, Michael Johnston","doi":"10.1145/2818346.2823301","DOIUrl":"https://doi.org/10.1145/2818346.2823301","url":null,"abstract":"Interact is a mobile virtual assistant that uses multimodal dialog to enable an interactive concierge experience over multiple application domains including hotel, restaurants, events, and TV search. Interact demonstrates how multi- modal interaction combined with conversational dialog en- ables a richer and more natural user experience. This demonstration will highlight incremental recognition and under- standing, multimodal speech and gesture input, context track- ing over multiple simultaneous domains, and the use of multimodal interface techniques to enable disambiguation of erors and online personalization.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"93 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80453205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present the methods used for the Bahcesehir University team's submissions to the 2015 Emotion Recognition in the Wild Challenge. The challenge consists of categorical emotion recognition in short video clips extracted from movies based on emotional keywords in the subtitles. The video clips mostly contain expressive faces (single or multiple) as well as audio that includes the speech of the person in the clip along with other human voices or background sounds/music. We use an audio-visual method based on video summarization by key frame selection. The key frame selection uses a minimum sparse reconstruction approach with the goal of representing the original video as well as possible. We extract the LPQ features of the key frames and average them to obtain a single feature vector that represents the video component of the clip. In order to represent the temporal variations of the facial expression, we also use LBP-TOP features extracted from the whole video. The audio features are extracted using the openSMILE or RASTA-PLP methods. Video and audio features are classified using SVM classifiers and fused at the score level. We tested eight different combinations of audio and visual features on the AFEW 5.0 (Acted Facial Expressions in the Wild) database provided by the challenge organizers. The best visual and audio-visual accuracies obtained on the test set are 45.1% and 49.9%, respectively, whereas the video-based baseline for the challenge is given as 39.3%.
{"title":"Affect Recognition using Key Frame Selection based on Minimum Sparse Reconstruction","authors":"M. Kayaoglu, Ç. Erdem","doi":"10.1145/2818346.2830594","DOIUrl":"https://doi.org/10.1145/2818346.2830594","url":null,"abstract":"In this paper, we present the methods used for Bahcesehir University team's submissions to the 2015 Emotion Recognition in the Wild Challenge. The challenge consists of categorical emotion recognition in short video clips extracted from movies based on emotional keywords in the subtitles. The video clips mostly contain expressive faces (single or multiple) and also audio which contains the speech of the person in the clip as well as other human voices or background sounds/music. We use an audio-visual method based on video summarization by key frame selection. The key frame selection uses a minimum sparse reconstruction approach with the goal of representing the original video in the best possible way. We extract the LPQ features of the key frames and average them to determine a single feature vector that will represent the video component of the clip. In order to represent the temporal variations of the facial expression, we also use the LBP-TOP features extracted from the whole video. The audio features are extracted using OpenSMILE or RASTA-PLP methods. Video and audio features are classified using SVM classifiers and fused at the score level. We tested eight different combinations of audio and visual features on the AFEW 5.0 (Acted Facial Expressions in the Wild) database provided by the challenge organizers. The best visual and audio-visual accuracies obtained on the test set are 45.1% and 49.9% respectively, whereas the video-based baseline for the challenge is given as 39.3%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89616053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elderly people often need support in everyday situations -- e.g., common daily life activities like taking care of house and garden, or caring for an animal, are often not possible without a larger support circle. However, especially in larger western cities, local social networks may not be very tight, friends may have moved away or died, and the traditional support structures found in so-called multi-generational families no longer exist. As a result, the quality of life of elderly people suffers considerably. On the other hand, people from the broader neighborhood would often gladly help and respond quickly. With the project Wir im Kiez we developed and tested a multimodal social network app equipped with a conversational interface that addresses these issues. In the demonstration, we especially focus on the needs and constraints of seniors, with respect to both their physical and psychological limitations.
{"title":"Wir im Kiez: Multimodal App for Mutual Help Among Elderly Neighbours","authors":"S. Schmeier, Aaron Ruß, Norbert Reithinger","doi":"10.1145/2818346.2823300","DOIUrl":"https://doi.org/10.1145/2818346.2823300","url":null,"abstract":"Elderly people often need support in everyday situations -- e.g. common daily life activities like taking care of house and garden, or caring for an animal are often not possible without a larger support circle. However, especially in larger western cities, local social networks may not be very tight, friends may have moved away or died, and the traditional support structures found in so-called multi-generational families do not exist anymore. As a result, the quality of life for elderly people suffers crucially. On the other hand, people from the broader neighborhood would often gladly help and respond quickly. With the project Wir im Kiez we developed and tested a multimodal social network app equipped with a conversational interface that addresses these issues. In the demonstration, we especially focus on the needs and restrictions of seniors, both in their physical and psychological limitations.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80955341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Julia Wache, Subramanian Ramanathan, M. K. Abadi, R. Vieriu, N. Sebe, Stefan Winkler
We present a novel framework for recognizing personality traits based on users' physiological responses to affective movie clips. Extending studies that have correlated explicit/implicit affective user responses with the Extraversion and Neuroticism traits, we perform single-trial recognition of the big-five traits from Electrocardiogram (ECG), Galvanic Skin Response (GSR), Electroencephalogram (EEG) and facial emotional responses compiled from 36 users with off-the-shelf sensors. Firstly, we examine relationships among personality scales and (explicit) affective user ratings acquired in the context of prior observations. Secondly, we isolate physiological correlates of personality traits. Finally, unimodal and multimodal personality recognition results are presented. Personality differences are better revealed when analyzing responses to emotionally homogeneous (e.g., high valence, high arousal) clips, and significantly above-chance recognition is achieved for all five traits.
{"title":"Implicit User-centric Personality Recognition Based on Physiological Responses to Emotional Videos","authors":"Julia Wache, Subramanian Ramanathan, M. K. Abadi, R. Vieriu, N. Sebe, Stefan Winkler","doi":"10.1145/2818346.2820736","DOIUrl":"https://doi.org/10.1145/2818346.2820736","url":null,"abstract":"We present a novel framework for recognizing personality traits based on users' physiological responses to affective movie clips. Extending studies that have correlated explicit/implicit affective user responses with Extraversion and Neuroticism traits, we perform single-trial recognition of the big-five traits from Electrocardiogram (ECG), Galvanic Skin Response (GSR), Electroencephalogram (EEG) and facial emotional responses compiled from 36 users using off-the-shelf sensors. Firstly, we examine relationships among personality scales and (explicit) affective user ratings acquired in the context of prior observations. Secondly, we isolate physiological correlates of personality traits. Finally, unimodal and multimodal personality recognition results are presented. Personality differences are better revealed while analyzing responses to emotionally homogeneous (e.g., high valence, high arousal) clips, and significantly above-chance recognition is achieved for all five traits.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"95 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89095858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote Address 2","authors":"P. Cohen","doi":"10.1145/3252444","DOIUrl":"https://doi.org/10.1145/3252444","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81073990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a speech synthesis method for people with articulation disorders. Because the movements of such speakers are limited by their athetoid symptoms, their prosody is often unstable and their speech rate differs from that of a physically unimpaired person, which makes their speech less intelligible and, consequently, makes communication with physically unimpaired persons difficult. In order to deal with these problems, this paper describes a Hidden Markov Model (HMM)-based text-to-speech synthesis approach that preserves the individuality of a person with an articulation disorder and aids them in their communication. In our method, a duration model of a physically unimpaired person is used for the HMM synthesis system, and the F0 model in the system is trained using the F0 patterns of the physically unimpaired person, with the average F0 converted to the target F0 in advance. In order to preserve the target speaker's individuality, a spectral model is built from the target's spectra. Through experimental evaluations, we have confirmed that the proposed method successfully synthesizes intelligible speech while maintaining the target speaker's individuality.
{"title":"Individuality-Preserving Voice Reconstruction for Articulation Disorders Using Text-to-Speech Synthesis","authors":"Reina Ueda, T. Takiguchi, Y. Ariki","doi":"10.1145/2818346.2820770","DOIUrl":"https://doi.org/10.1145/2818346.2820770","url":null,"abstract":"This paper presents a speech synthesis method for people with articulation disorders. Because the movements of such speakers are limited by their athetoid symptoms, their prosody is often unstable and their speech rate differs from that of a physically unimpaired person, which causes their speech to be less intelligible and, consequently, makes communication with physically unimpaired persons difficult. In order to deal with these problems, this paper describes a Hidden Markov Model(HMM)-based text-to-speech synthesis approach that preserves the individuality of a person with an articulation disorder and aids them in their communication. In our method, a duration model of a physically unimpaired person is used for the HMM synthesis system and an F0 model in the system is trained using the F0 patterns of the physically unimpaired person, with the average F0 being converted to the target F0 in advance. In order to preserve the target speaker's individuality, a spectral model is built from target spectra. Through experimental evaluations, we have confirmed that the proposed method successfully synthesizes intelligible speech while maintaining the target speaker's individuality.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80138812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 4: Communication Dynamics","authors":"Louis-Philippe Morency","doi":"10.1145/3252449","DOIUrl":"https://doi.org/10.1145/3252449","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78604567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}