
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction: Latest Publications

Transductive Transfer LDA with Riesz-based Volume LBP for Emotion Recognition in The Wild
Yuan Zong, Wenming Zheng, Xiaohua Huang, Jingwei Yan, T. Zhang
In this paper, we propose a method using Transductive Transfer Linear Discriminant Analysis (TTLDA) and Riesz-based Volume Local Binary Patterns (RVLBP) for the image-based static facial expression recognition challenge of the Emotion Recognition in the Wild Challenge (EmotiW 2015). The task of this challenge is to assign facial expression labels to movie frames containing a face captured under real-world conditions. In our method, we first employ a multi-scale image partition scheme to divide each face image into a set of image blocks and use RVLBP features extracted from each block to describe the facial image. We then adopt the TTLDA approach on these RVLBP features to handle the expression recognition task. Experiments on the test data of the SFEW 2.0 database, which is used for the image-based static facial expression challenge, demonstrate that our method achieves an accuracy of 50%, a 10.87% improvement over the baseline provided by the challenge organizers.
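A minimal Python sketch of the block-wise descriptor plus discriminant-analysis pipeline outlined above. It only mirrors the shape of the method: plain uniform LBP stands in for the paper's Riesz-based Volume LBP, ordinary LDA stands in for Transductive Transfer LDA, and the data variables (train_faces, train_labels, test_faces) are hypothetical.

    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def block_lbp_histogram(face, grid=(4, 4), P=8, R=1):
        """Split a grayscale face image into grid blocks and concatenate
        per-block uniform-LBP histograms into one descriptor (stand-in for RVLBP)."""
        codes = local_binary_pattern(face, P, R, method="uniform")
        n_bins = P + 2                      # uniform LBP yields P + 2 distinct codes
        bh, bw = face.shape[0] // grid[0], face.shape[1] // grid[1]
        feats = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                block = codes[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
                hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
                feats.append(hist)
        return np.concatenate(feats)

    # Hypothetical usage with cropped grayscale face images and expression labels:
    # X_train = np.stack([block_lbp_histogram(f) for f in train_faces])
    # X_test = np.stack([block_lbp_histogram(f) for f in test_faces])
    # clf = LinearDiscriminantAnalysis().fit(X_train, train_labels)
    # predictions = clf.predict(X_test)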
{"title":"Transductive Transfer LDA with Riesz-based Volume LBP for Emotion Recognition in The Wild","authors":"Yuan Zong, Wenming Zheng, Xiaohua Huang, Jingwei Yan, T. Zhang","doi":"10.1145/2818346.2830584","DOIUrl":"https://doi.org/10.1145/2818346.2830584","url":null,"abstract":"In this paper, we propose the method using Transductive Transfer Linear Discriminant Analysis (TTLDA) and Riesz-based Volume Local Binary Patterns (RVLBP) for image based static facial expression recognition challenge of the Emotion Recognition in the Wild Challenge (EmotiW 2015). The task of this challenge is to assign facial expression labels to frames of some movies containing a face under the real word environment. In our method, we firstly employ a multi-scale image partition scheme to divide each face image into some image blocks and use RVLBP features extracted from each block to describe each facial image. Then, we adopt the TTLDA approach based on RVLBP to cope with the expression recognition task. The experiments on the testing data of SFEW 2.0 database, which is used for image based static facial expression challenge, demonstrate that our method achieves the accuracy of 50%. This result has a 10.87% improvement over the baseline provided by this challenge organizer.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82596697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
An Experiment on the Feasibility of Spatial Acquisition using a Moving Auditory Cue for Pedestrian Navigation
Yeseul Park, Kyle Koh, Heonjin Park, Jinwook Seo
We conducted a feasibility study on the use of a moving auditory cue for spatial acquisition for pedestrian navigation by comparing its performance with a static auditory cue, the use of which has been investigated in previous studies. To investigate the performance of human sound azimuthal localization, we designed and conducted a controlled experiment with 15 participants and found that performance was statistically significantly more accurate with an auditory source moving from the opposite direction over users' heads to the target direction than with a static sound. Based on this finding, we designed a bimodal pedestrian navigation system using both visual and auditory feedback. We evaluated the system by conducting a field study with four users and received overall positive feedback.
{"title":"An Experiment on the Feasibility of Spatial Acquisition using a Moving Auditory Cue for Pedestrian Navigation","authors":"Yeseul Park, Kyle Koh, Heonjin Park, Jinwook Seo","doi":"10.1145/2818346.2820779","DOIUrl":"https://doi.org/10.1145/2818346.2820779","url":null,"abstract":"We conducted a feasibility study on the use of a moving auditory cue for spatial acquisition for pedestrian navigation by comparing its performance with a static auditory cue, the use of which has been investigated in previous studies. To investigate the performance of human sound azimuthal localization, we designed and conducted a controlled experiment with 15 participants and found that performance was statistically significantly more accurate with an auditory source moving from the opposite direction over users' heads to the target direction than with a static sound. Based on this finding, we designed a bimodal pedestrian navigation system using both visual and auditory feedback. We evaluated the system by conducting a field study with four users and received overall positive feedback.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84069178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Record, Transform & Reproduce Social Encounters in Immersive VR: An Iterative Approach
Jan Kolkmeier
Immersive Virtual Reality environments that can be accessed through multimodal natural interfaces will bring new affordances to mediated interaction with virtual embodied agents and avatars. Such interfaces will measure, amongst others, users' poses and motion, which can be copied to an embodied avatar representation of the user situated in a virtual or augmented reality space shared with autonomous virtual agents and human-controlled or semi-autonomous avatars. Designers of such environments will be challenged to facilitate believable social interactions by creating agents or semi-autonomous avatars that can respond meaningfully to users' natural behaviors, as captured by these interfaces. In our future research, we aim to realize such interactions to create rich social encounters in immersive Virtual Reality. In the current work, we present the approach we envisage for analyzing and learning agent behavior from human-agent interaction in an iterative fashion. We specifically look at small-scale, `regulative' nonverbal behaviors. Agents base their behavior on previous observations and observe the responses that these behaviors elicit in new users, thus iteratively generating corpora of short, situated human-agent interaction sequences that are then analyzed, annotated and processed to generate socially intelligent agent behavior. Some choices and challenges of this approach are discussed.
{"title":"Record, Transform & Reproduce Social Encounters in Immersive VR: An Iterative Approach","authors":"Jan Kolkmeier","doi":"10.1145/2818346.2823314","DOIUrl":"https://doi.org/10.1145/2818346.2823314","url":null,"abstract":"Immersive Virtual Reality Environments that can be accessed through multimodal natural interfaces will bring new affordances to mediated interaction with virtual embodied agents and avatars. Such interfaces will measure, amongst others, users' poses and motion which can be copied to an embodied avatar representation of the user that is situated in a virtual or augmented reality space shared with autonomous virtual agents and human controlled or semi-autonomous avatars. Designers of such environments will be challenged to facilitate believable social interactions by creating agents or semi-autonomous avatars that can respond meaningfully to users' natural behaviors, as captured by these interfaces. In our future research, we aim to realize such interactions to create rich social encounters in immersive Virtual Reality. In this current work, we present the approach we envisage to analyze and learn agent behavior from human-agent interaction in an iterative fashion. We specifically look at small-scale, `regulative' nonverbal behaviors. Agents inform their behavior on previous observations, observing responses that these behaviors elicit in new users, thus iteratively generating corpora of short, situated human-agent interaction sequences that are to be analyzed, annotated and processed to generate socially intelligent agent behavior. Some choices and challenges of this approach are discussed.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80740630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns
Gil Levi, Tal Hassner
We present a novel method for classifying emotions from static facial images. Our approach leverages the recent success of Convolutional Neural Networks (CNN) on face recognition problems. Unlike the settings often assumed there, far less labeled data is typically available for training emotion classification systems. Our method is therefore designed with the goal of simplifying the problem domain by removing confounding factors from the input images, with an emphasis on image illumination variations, in an effort to reduce the amount of data required to effectively train deep CNN models. To this end, we propose novel transformations of image intensities to 3D spaces, designed to be invariant to monotonic photometric transformations. These are applied to CASIA WebFace images, which are then used to train an ensemble of multiple-architecture CNNs on multiple representations. Each model is then fine-tuned with limited emotion-labeled training data to obtain the final classification models. Our method was tested on the Static Facial Expression Recognition sub-challenge (SFEW) of the Emotion Recognition in the Wild Challenge (EmotiW 2015) and shown to provide a substantial 15.36% improvement over the baseline results (a 40% gain in performance).
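One way to read the "transformations of image intensities to 3D spaces" described above is as a code-to-colour mapping for LBP codes. The sketch below embeds the 256 possible 8-bit LBP codes into 3D with multidimensional scaling of their Hamming distances and uses that embedding to turn a grayscale face into a 3-channel image for a CNN; the metric and the embedding are illustrative assumptions, not necessarily the paper's exact mapping.

    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.manifold import MDS

    P, R = 8, 1
    codes = np.arange(2 ** P)
    # Dissimilarity between two LBP codes: number of differing bits (assumed metric).
    xor = codes[:, None] ^ codes[None, :]
    hamming = np.array([[bin(int(v)).count("1") for v in row] for row in xor], dtype=float)
    # Embed the 256 codes in 3D so Euclidean distances approximate code dissimilarities.
    code_to_rgb = MDS(n_components=3, dissimilarity="precomputed",
                      random_state=0).fit_transform(hamming)    # shape (256, 3)

    def map_binary_patterns(gray_face):
        """Return a 3-channel mapped-LBP image; LBP codes themselves are
        invariant to monotonic photometric transformations of the input."""
        lbp = local_binary_pattern(gray_face, P, R, method="default").astype(int)
        return code_to_rgb[lbp]                                  # shape (H, W, 3)

    # The mapped images would then be used to train and fine-tune the CNN ensemble
    # described in the abstract (training data and CNN code not shown here).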
{"title":"Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns","authors":"Gil Levi, Tal Hassner","doi":"10.1145/2818346.2830587","DOIUrl":"https://doi.org/10.1145/2818346.2830587","url":null,"abstract":"We present a novel method for classifying emotions from static facial images. Our approach leverages on the recent success of Convolutional Neural Networks (CNN) on face recognition problems. Unlike the settings often assumed there, far less labeled data is typically available for training emotion classification systems. Our method is therefore designed with the goal of simplifying the problem domain by removing confounding factors from the input images, with an emphasis on image illumination variations. This, in an effort to reduce the amount of data required to effectively train deep CNN models. To this end, we propose novel transformations of image intensities to 3D spaces, designed to be invariant to monotonic photometric transformations. These are applied to CASIA Webface images which are then used to train an ensemble of multiple architecture CNNs on multiple representations. Each model is then fine-tuned with limited emotion labeled training data to obtain final classification models. Our method was tested on the Emotion Recognition in the Wild Challenge (EmotiW 2015), Static Facial Expression Recognition sub-challenge (SFEW) and shown to provide a substantial, 15.36% improvement over baseline results (40% gain in performance).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78837247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 303
The Grenoble System for the Social Touch Challenge at ICMI 2015
Viet-Cuong Ta, W. Johal, Maxime Portaz, Eric Castelli, D. Vaufreydaz
New technologies, and especially robotics, are moving towards more natural user interfaces. Work has been done on different modalities of interaction such as sight (visual computing) and audio (speech and audio recognition), but other modalities are still less researched. The touch modality is one of the least studied in HRI but could be valuable for naturalistic interaction. However, touch signals can vary in semantics. It is therefore necessary to be able to recognize touch gestures in order to make human-robot interaction even more natural. We propose a method to recognize touch gestures. This method was developed on the CoST corpus and then directly applied to the HAART dataset as part of our participation in the Social Touch Challenge at ICMI 2015. Our touch gesture recognition process is detailed in this article to make it reproducible by other research teams. Besides describing the feature set, we manually filtered the training corpus to produce two datasets. For the challenge, we submitted six different systems: a Support Vector Machine and a Random Forest classifier for the HAART dataset and, for the CoST dataset, the same classifiers tested in two conditions, using either the full or the filtered training datasets. As reported by the organizers, our systems achieve the best correct rate in this year's challenge (70.91% on HAART, 61.34% on CoST). Our performance is slightly better than that of the other participants but stays below previously reported state-of-the-art results.
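A minimal sketch of the two classifier families named above (Support Vector Machine and Random Forest), trained on pre-extracted touch-gesture feature vectors. The feature extraction from the pressure-sensor frames of the HAART/CoST recordings is not shown, and X_train, y_train, X_test are hypothetical arrays.

    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    # The SVM benefits from feature scaling; the RBF kernel and C value are assumptions.
    svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    rf_clf = RandomForestClassifier(n_estimators=300, random_state=0)

    # Hypothetical usage on gesture feature matrices and labels:
    # svm_clf.fit(X_train, y_train)
    # rf_clf.fit(X_train, y_train)
    # svm_pred = svm_clf.predict(X_test)
    # rf_pred = rf_clf.predict(X_test)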
{"title":"The Grenoble System for the Social Touch Challenge at ICMI 2015","authors":"Viet-Cuong Ta, W. Johal, Maxime Portaz, Eric Castelli, D. Vaufreydaz","doi":"10.1145/2818346.2830598","DOIUrl":"https://doi.org/10.1145/2818346.2830598","url":null,"abstract":"New technologies and especially robotics is going towards more natural user interfaces. Works have been done in different modality of interaction such as sight (visual computing), and audio (speech and audio recognition) but some other modalities are still less researched. The touch modality is one of the less studied in HRI but could be valuable for naturalistic interaction. However touch signals can vary in semantics. It is therefore necessary to be able to recognize touch gestures in order to make human-robot interaction even more natural. We propose a method to recognize touch gestures. This method was developed on the CoST corpus and then directly applied on the HAART dataset as a participation of the Social Touch Challenge at ICMI 2015. Our touch gesture recognition process is detailed in this article to make it reproducible by other research teams. Besides features set description, we manually filtered the training corpus to produce 2 datasets. For the challenge, we submitted 6 different systems. A Support Vector Machine and a Random Forest classifiers for the HAART dataset. For the CoST dataset, the same classifiers are tested in two conditions: using all or filtered training datasets. As reported by organizers, our systems have the best correct rate in this year's challenge (70.91% on HAART, 61.34% on CoST). Our performances are slightly better that other participants but stay under previous reported state-of-the-art results.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86553547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Software Techniques for Multimodal Input Processing in Realtime Interactive Systems
Martin Fischbach
Multimodal interaction frameworks are an efficient means of utilizing many existing processing and fusion techniques in a wide variety of application areas, even by non-experts. However, applying these frameworks to highly interactive application areas like VR, AR, MR, and computer games in a reusable, modifiable, and modular manner is not straightforward. The field currently lacks software technical solutions that (1) preserve the general decoupling principle of platforms and, at the same time, (2) provide the required close temporal as well as semantic coupling of the involved software modules and multimodal processing steps. This thesis addresses current challenges and aims to provide the research community with a framework that fosters repeatability of scientific achievements and the ability to build on previous results.
{"title":"Software Techniques for Multimodal Input Processing in Realtime Interactive Systems","authors":"Martin Fischbach","doi":"10.1145/2818346.2823308","DOIUrl":"https://doi.org/10.1145/2818346.2823308","url":null,"abstract":"Multimodal interaction frameworks are an efficient means of utilizing many existing processing and fusion techniques in a wide variety of application areas, even by non-experts. However, the application of these frameworks to highly interactive application areas like VR, AR, MR, and computer games in a reusable, modifiable, and modular manner is not straightforward. It currently lacks some software technical solutions that (1) preserve the general decoupling principle of platforms and at the same time (2) provide the required close temporal as well as semantic coupling of involved software modules and multimodal processing steps. This thesis approches current challenges and aims at providing the research community with a framework that fosters repeatability of scientific achievements and the ability to built on previous results.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88977095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Effects of Good Speaking Techniques on Audience Engagement
Keith Curtis, G. Jones, N. Campbell
Understanding audience engagement levels for presentations has the potential to enable richer and more focused interaction with audio-visual recordings. We describe an investigation into automated analysis of multimodal recordings of scientific talks where the use of modalities most typically associated with engagement, such as eye gaze, is not feasible. We first study visual and acoustic features to identify those most commonly associated with good speaking techniques. To understand audience interpretation of good speaking techniques, we engaged human annotators to rate the qualities of the speaker for a series of 30-second video segments taken from a corpus of 9 hours of presentations from an academic conference. Our annotators also watched the corresponding video recordings of the audience to estimate the level of audience engagement for each talk. We then explored the effectiveness of multimodal features extracted from the presentation video against the Likert-scale ratings of each speaker assigned by the annotators and against the manually labelled audience engagement levels. These features were used to build a classifier to rate the qualities of a new speaker, which was able to classify a presenter's rating over an 8-class range with an accuracy of 52%; combining these classes into a 4-class range increases accuracy to 73%. We analyse linear correlations between individual speaker-based modalities and actual audience engagement levels to understand the corresponding effect on audience engagement. A further classifier was then built to predict the level of audience engagement with a presentation by analysing the speaker's use of acoustic and visual cues. Using these speaker-based modalities fused with the speaker ratings only, we are able to predict actual audience engagement levels with an accuracy of 68%. By combining them with basic visual features from the audience as a whole, we improve this to an accuracy of 70%.
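A rough sketch of the kind of fusion classifier described above: speaker-based multimodal features are concatenated (early fusion) with the annotators' speaker ratings, and a classifier predicts the labelled engagement level. The paper's exact features, fusion scheme, and classifier are not specified here; the Random Forest and all variable names are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def engagement_classifier(speaker_feats, speaker_ratings, engagement_labels):
        """speaker_feats: (n_segments, n_features) acoustic + visual features,
        speaker_ratings: (n_segments, n_rating_dims) Likert-scale annotations,
        engagement_labels: (n_segments,) manually labelled engagement levels."""
        X = np.hstack([speaker_feats, speaker_ratings])   # early fusion of modalities
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        cv_accuracy = cross_val_score(clf, X, engagement_labels, cv=5).mean()
        return clf.fit(X, engagement_labels), cv_accuracy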
{"title":"Effects of Good Speaking Techniques on Audience Engagement","authors":"Keith Curtis, G. Jones, N. Campbell","doi":"10.1145/2818346.2820766","DOIUrl":"https://doi.org/10.1145/2818346.2820766","url":null,"abstract":"Understanding audience engagement levels for presentations has the potential to enable richer and more focused interaction with audio-visual recordings. We describe an investigation into automated analysis of multimodal recordings of scientific talks where the use of modalities most typically associated with engagement such as eye-gaze is not feasible. We first study visual and acoustic features to identify those most commonly associated with good speaking techniques. To understand audience interpretation of good speaking techniques, we angaged human annotators to rate the qualities of the speaker for a series of 30-second video segments taken from a corpus of 9 hours of presentations from an academic conference. Our annotators also watched corresponding video recordings of the audience to presentations to estimate the level of audience engagement for each talk. We then explored the effectiveness of multimodal features extracted from the presentation video against Likert-scale ratings of each speaker as assigned by the annotators. and on manually labelled audience engagement levels. These features were used to build a classifier to rate the qualities of a new speaker. This was able classify a rating for a presenter over an 8-class range with an accuracy of 52%. By combining these classes to a 4-class range accuracy increases to 73%. We analyse linear correlations with individual speaker-based modalities and actual audience engagement levels to understand the corresponding effect on audience engagement. A further classifier was then built to predict the level of audience engagement to a presentation by analysing the speaker's use of acoustic and visual cues. Using these speaker based modalities pre-fused with speaker ratings only, we are able to predict actual audience engagement levels with an accuracy of 68%. By combining with basic visual features from the audience as whole, we are able to improve this to an accuracy of 70%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79555617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Multiple Models Fusion for Emotion Recognition in the Wild
Jianlong Wu, Zhouchen Lin, H. Zha
Emotion recognition in the wild is a very challenging task. In this paper, we propose a multiple models fusion method to automatically recognize the expression in the video clip as part of the third Emotion Recognition in the Wild Challenge (EmotiW 2015). In our method, we first extract dense SIFT, LBP-TOP and audio features from each video clip. For dense SIFT features, we use the bag of features (BoF) model with two different encoding methods (locality-constrained linear coding and group saliency based coding) to further represent it. During the classification process, we use partial least square regression to calculate the regression value of each model. By learning the optimal weight of each model based on the regression value, we fuse these models together. We conduct experiments on the given validation and test datasets, and achieve superior performance. The best recognition accuracy of our fusion method is 52.50% on the test dataset, which is 13.17% higher than the challenge baseline accuracy of 39.33%.
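A minimal sketch of the fusion step described above: each modality (dense-SIFT BoF, LBP-TOP, audio) gets its own partial least squares regressor over one-hot class targets, and the per-model regression scores are combined with fusion weights chosen on the validation set. The feature extraction and the paper's exact weight-learning rule are not shown; the grid search below is an illustrative assumption, and all data variables are hypothetical.

    import itertools
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def fit_pls_models(feature_sets, y_onehot, n_components=10):
        """feature_sets: list of (n_samples, d_i) matrices, one per modality."""
        return [PLSRegression(n_components=n_components).fit(X, y_onehot)
                for X in feature_sets]

    def fused_prediction(models, feature_sets, weights):
        """Weighted sum of per-model regression scores; argmax gives the class."""
        scores = sum(w * m.predict(X) for m, X, w in zip(models, feature_sets, weights))
        return scores.argmax(axis=1)

    # Hypothetical usage: learn the fusion weights on a validation split.
    # models = fit_pls_models([sift_tr, lbptop_tr, audio_tr], y_train_onehot)
    # candidate_weights = [w for w in itertools.product(np.linspace(0, 1, 11), repeat=3)
    #                      if sum(w) > 0]
    # best_w = max(candidate_weights, key=lambda w: np.mean(
    #     fused_prediction(models, [sift_val, lbptop_val, audio_val], w) == y_val))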
{"title":"Multiple Models Fusion for Emotion Recognition in the Wild","authors":"Jianlong Wu, Zhouchen Lin, H. Zha","doi":"10.1145/2818346.2830582","DOIUrl":"https://doi.org/10.1145/2818346.2830582","url":null,"abstract":"Emotion recognition in the wild is a very challenging task. In this paper, we propose a multiple models fusion method to automatically recognize the expression in the video clip as part of the third Emotion Recognition in the Wild Challenge (EmotiW 2015). In our method, we first extract dense SIFT, LBP-TOP and audio features from each video clip. For dense SIFT features, we use the bag of features (BoF) model with two different encoding methods (locality-constrained linear coding and group saliency based coding) to further represent it. During the classification process, we use partial least square regression to calculate the regression value of each model. By learning the optimal weight of each model based on the regression value, we fuse these models together. We conduct experiments on the given validation and test datasets, and achieve superior performance. The best recognition accuracy of our fusion method is 52.50% on the test dataset, which is 13.17% higher than the challenge baseline accuracy of 39.33%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81101790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Gait and Postural Sway Analysis, A Multi-Modal System
Hafsa Ismail
Detecting a fall before it actually happens will positively affect the lives of the elderly. Since the main causes of falling are related to postural sway and walking, detecting abnormalities in either or both of these activities would be informative for predicting fall probability. A need exists for a portable gait and postural sway analysis system that can provide individuals with real-time information about changes in and the quality of gait in the real world, not just in a laboratory. In this research project I aim to build a multi-modal system that finds the correlation between vision-extracted features and accelerometer and force-plate data to determine a general gait and body sway pattern. This information is then used to assess differences from normative age- and gender-relevant patterns as well as any changes over time. This could provide a core indicator of broader health and function in ageing and disease.
{"title":"Gait and Postural Sway Analysis, A Multi-Modal System","authors":"Hafsa Ismail","doi":"10.1145/2818346.2823310","DOIUrl":"https://doi.org/10.1145/2818346.2823310","url":null,"abstract":"Detecting a fall before it actually happens will positively affect lives of the elderly. While the main causes of falling are related to postural sway and walking, determining abnormalities in one of these activities or both of them would be informative to predicting the fall probability. A need exists for a portable gait and postural sway analysis system that can provide individuals with real-time information about changes and quality of gait in the real world, not just in a laboratory. In this research project I aim to build a multi-modal system that finds the correlation between vision extracted features and accelerometer and force plate data to determine a general gait and body sway pattern. Then this information is used to assess a difference to normative age and gender relevant patterns as well as any changes over time. This could provide a core indicator of broader health and function in ageing and disease.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88636158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Session details: Poster Session
R. Horaud, D. Bohus
{"title":"Session details: Poster Session","authors":"R. Horaud, D. Bohus","doi":"10.1145/3252452","DOIUrl":"https://doi.org/10.1145/3252452","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85623048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0