
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction: Latest Publications

Hierarchical Committee of Deep CNNs with Exponentially-Weighted Decision Fusion for Static Facial Expression Recognition
Bo-Kyeong Kim, Hwaran Lee, Jihyeon Roh, Soo-Young Lee
We present a pattern recognition framework to improve committee machines of deep convolutional neural networks (deep CNNs) and its application to static facial expression recognition in the wild (SFEW). In order to generate enough diversity of decisions, we trained multiple deep CNNs by varying network architectures, input normalization, and weight initialization as well as by adopting several learning strategies to use large external databases. Moreover, with these deep models, we formed hierarchical committees using the validation-accuracy-based exponentially-weighted average (VA-Expo-WA) rule. Through extensive experiments, the great strengths of our committee machines were demonstrated in both structural and decisional ways. On the SFEW2.0 dataset released for the 3rd Emotion Recognition in the Wild (EmotiW) sub-challenge, a test accuracy of 57.3% was obtained from the best single deep CNN, while the single-level committees yielded 58.3% and 60.5% with the simple average rule and with the VA-Expo-WA rule, respectively. Our final submission based on the 3-level hierarchy using the VA-Expo-WA achieved 61.6%, significantly higher than the SFEW baseline of 39.1%.
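The abstract describes the VA-Expo-WA rule only at a high level; as a rough illustration of the general idea, the numpy sketch below weights each committee member's class-probability output exponentially by its validation accuracy and averages the results. The exponent base and the normalization are assumptions made for this sketch, not the authors' published formula.

```python
import numpy as np

def va_expo_wa_fusion(probs, val_accs, base=10.0):
    """Fuse committee members' class probabilities with exponential weights.

    probs    : (n_members, n_classes) per-member class-probability vectors
    val_accs : (n_members,) validation accuracies in [0, 1]
    base     : exponent base; an illustrative choice, not a value from the paper
    """
    probs = np.asarray(probs, dtype=float)
    weights = base ** np.asarray(val_accs, dtype=float)  # reward more accurate members exponentially
    weights /= weights.sum()                             # normalize to a convex combination
    fused = weights @ probs                              # weighted average of the members' outputs
    return int(fused.argmax()), fused

# Toy usage: three members voting over the 7 SFEW expression classes
rng = np.random.default_rng(0)
member_probs = rng.dirichlet(np.ones(7), size=3)
label, fused = va_expo_wa_fusion(member_probs, val_accs=[0.52, 0.55, 0.57])
print(label, fused.round(3))
```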
{"title":"Hierarchical Committee of Deep CNNs with Exponentially-Weighted Decision Fusion for Static Facial Expression Recognition","authors":"Bo-Kyeong Kim, Hwaran Lee, Jihyeon Roh, Soo-Young Lee","doi":"10.1145/2818346.2830590","DOIUrl":"https://doi.org/10.1145/2818346.2830590","url":null,"abstract":"We present a pattern recognition framework to improve committee machines of deep convolutional neural networks (deep CNNs) and its application to static facial expression recognition in the wild (SFEW). In order to generate enough diversity of decisions, we trained multiple deep CNNs by varying network architectures, input normalization, and weight initialization as well as by adopting several learning strategies to use large external databases. Moreover, with these deep models, we formed hierarchical committees using the validation-accuracy-based exponentially-weighted average (VA-Expo-WA) rule. Through extensive experiments, the great strengths of our committee machines were demonstrated in both structural and decisional ways. On the SFEW2.0 dataset released for the 3rd Emotion Recognition in the Wild (EmotiW) sub-challenge, a test accuracy of 57.3% was obtained from the best single deep CNN, while the single-level committees yielded 58.3% and 60.5% with the simple average rule and with the VA-Expo-WA rule, respectively. Our final submission based on the 3-level hierarchy using the VA-Expo-WA achieved 61.6%, significantly higher than the SFEW baseline of 39.1%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79678040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 124
Session details: Oral Session 5: Interaction Techniques
S. Oviatt
{"title":"Session details: Oral Session 5: Interaction Techniques","authors":"S. Oviatt","doi":"10.1145/3252450","DOIUrl":"https://doi.org/10.1145/3252450","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84746040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Session details: Keynote Address 1
Zhengyou Zhang
{"title":"Session details: Keynote Address 1","authors":"Zhengyou Zhang","doi":"10.1145/3252443","DOIUrl":"https://doi.org/10.1145/3252443","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"46 5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83182668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Nakama: A Companion for Non-verbal Affective Communication
Christian J. A. M. Willemse, G. M. Munters, J. V. Erp, D. Heylen
We present "Nakama": A communication device that supports affective communication between a child and its - geographically separated - parent. Nakama consists of a control unit at the parent's end and an actuated teddy bear for the child. The bear contains several communication channels, including social touch, temperature, and vibrotactile heartbeats; all aimed at increasing the sense of presence. The current version of Nakama is suitable for user evaluations in lab settings, with which we aim to gain a more thorough understanding of the opportunities and limitations of these less traditional communication channels.
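The abstract gives no implementation details for the listed channels. As a purely hypothetical sketch of one of them, the snippet below generates a two-pulse ("lub-dub") amplitude envelope that could modulate a vibration motor to render a heartbeat at a given rate; the pulse timings, amplitudes, and sampling rate are invented for illustration.

```python
import numpy as np

def heartbeat_envelope(bpm=70.0, fs=1000, seconds=3.0):
    """Amplitude envelope (0..1) for a lub-dub vibrotactile heartbeat pattern.

    bpm     : heart rate to render
    fs      : samples per second sent to the vibration actuator
    seconds : length of the generated pattern
    All timing constants below are illustrative, not taken from the Nakama paper.
    """
    period = 60.0 / bpm
    t = np.arange(int(seconds * fs)) / fs
    phase = t % period
    env = np.zeros_like(t)
    env[(phase >= 0.00) & (phase < 0.10)] = 1.0   # "lub": strong 100 ms pulse
    env[(phase >= 0.18) & (phase < 0.26)] = 0.6   # "dub": weaker 80 ms pulse
    return env

# The envelope could modulate a PWM duty cycle driving the bear's vibration motor.
env = heartbeat_envelope(bpm=80)
print(env[:300:50])
```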
{"title":"Nakama: A Companion for Non-verbal Affective Communication","authors":"Christian J. A. M. Willemse, G. M. Munters, J. V. Erp, D. Heylen","doi":"10.1145/2818346.2823299","DOIUrl":"https://doi.org/10.1145/2818346.2823299","url":null,"abstract":"We present \"Nakama\": A communication device that supports affective communication between a child and its - geographically separated - parent. Nakama consists of a control unit at the parent's end and an actuated teddy bear for the child. The bear contains several communication channels, including social touch, temperature, and vibrotactile heartbeats; all aimed at increasing the sense of presence. The current version of Nakama is suitable for user evaluations in lab settings, with which we aim to gain a more thorough understanding of the opportunities and limitations of these less traditional communication channels.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83638041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Session details: Oral Session 6: Mobile and Wearable
M. Johnston
{"title":"Session details: Oral Session 6: Mobile and Wearable","authors":"M. Johnston","doi":"10.1145/3252451","DOIUrl":"https://doi.org/10.1145/3252451","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88942069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detecting and Identifying Tactile Gestures using Deep Autoencoders, Geometric Moments and Gesture Level Features
Dana Hughes, N. Farrow, Halley P. Profita, N. Correll
While several sensing modalities and transduction approaches have been developed for tactile sensing in robotic skins, there has been much less work towards extracting features for or identifying high-level gestures performed on the skin. In this paper, we investigate using deep neural networks with hidden Markov models (DNN-HMMs), geometric moments and gesture level features to identify a set of gestures performed on robotic skins. We demonstrate that these features are useful for identifying gestures, and predict a set of gestures from a 14-class dataset with 56% accuracy, and a 7-class dataset with 71% accuracy.
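Geometric (image) moments are a standard descriptor for 2-D pressure maps, and the sketch below computes raw and central moments of a single tactile frame. How the paper combines such moments with the deep autoencoder and gesture-level features is not detailed in the abstract, so treat this as generic moment computation rather than the authors' exact pipeline.

```python
import numpy as np

def geometric_moments(frame, max_order=2):
    """Raw and central moments of a 2-D tactile pressure frame.

    frame : array of shape (H, W) of non-negative pressure values
            (assumed to contain at least some non-zero pressure)
    Returns a dict mapping (p, q) -> (raw moment m_pq, central moment mu_pq).
    """
    frame = np.asarray(frame, dtype=float)
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m00 = frame.sum()
    cx = (xs * frame).sum() / m00      # centroid x
    cy = (ys * frame).sum() / m00      # centroid y
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            m = ((xs ** p) * (ys ** q) * frame).sum()
            mu = (((xs - cx) ** p) * ((ys - cy) ** q) * frame).sum()
            moments[(p, q)] = (m, mu)
    return moments

# Toy 8x8 frame with a pressure blob in one corner
frame = np.zeros((8, 8))
frame[1:4, 5:8] = 1.0
print(geometric_moments(frame)[(1, 0)])
```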
{"title":"Detecting and Identifying Tactile Gestures using Deep Autoencoders, Geometric Moments and Gesture Level Features","authors":"Dana Hughes, N. Farrow, Halley P. Profita, N. Correll","doi":"10.1145/2818346.2830601","DOIUrl":"https://doi.org/10.1145/2818346.2830601","url":null,"abstract":"While several sensing modalities and transduction approaches have been developed for tactile sensing in robotic skins, there has been much less work towards extracting features for or identifying high-level gestures performed on the skin. In this paper, we investigate using deep neural networks with hidden Markov models (DNN-HMMs), geometric moments and gesture level features to identify a set of gestures performed on robotic skins. We demonstrate that these features are useful for identifying gestures, and predict a set of gestures from a 14-class dataset with 56% accuracy, and a 7-class dataset with 71% accuracy.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"72 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76547778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Multimodal Interaction with a Bifocal View on Mobile Devices
S. Pelurson, L. Nigay
On a mobile device, the intuitive Focus+Context layout of a detailed view (focus) and perspective/distorted panels on either side (context) is particularly suitable for maximizing the utilization of the limited available display area. Interacting with such a bifocal view requires both fast access to data in the context view and high precision interaction with data in the detailed focus view. We introduce combined modalities that solve this problem by combining the well-known flick-drag gesture-based precise modality with modalities for fast access to data in the context view. The modalities for fast access to data in the context view include direct touch in the context view as well as navigation based on drag gestures, on tilting the device, on side-pressure inputs or by spatially moving the device (dynamic peephole). Results of a comparison experiment of the combined modalities show that the performance can be analyzed according to a 3-phase model of the task: a focus-targeting phase, a transition phase (modality switch) and a cursor-pointing phase. Moreover modalities of the focus-targeting phase based on a discrete mode of navigation control (direct access, pressure sensors as discrete navigation controller) require a long transition phase: this is mainly due to disorientation induced by the loss of control in movements. This effect is significantly more pronounced than the articulatory time for changing the position of the fingers between the two modalities ("homing" time).
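The paper compares interaction modalities rather than publishing code; as one hypothetical illustration of the tilt-based navigation it mentions, the function below maps device roll angle to a scrolling speed in the context view, with a dead zone so that small unintentional tilts leave the view still. All constants are assumptions made for this sketch.

```python
def tilt_to_scroll_speed(roll_deg, dead_zone_deg=5.0, max_tilt_deg=45.0, max_speed_px_s=1200.0):
    """Map device roll angle to context-view scroll speed (px/s).

    Angles inside the dead zone produce no scrolling; beyond it, speed grows
    linearly up to max_speed_px_s at max_tilt_deg. All constants are
    illustrative assumptions, not values from the paper.
    """
    sign = 1.0 if roll_deg >= 0 else -1.0
    magnitude = abs(roll_deg)
    if magnitude <= dead_zone_deg:
        return 0.0
    magnitude = min(magnitude, max_tilt_deg)
    fraction = (magnitude - dead_zone_deg) / (max_tilt_deg - dead_zone_deg)
    return sign * fraction * max_speed_px_s

for angle in (2, 10, 30, -50):
    print(angle, tilt_to_scroll_speed(angle))
```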
{"title":"Multimodal Interaction with a Bifocal View on Mobile Devices","authors":"S. Pelurson, L. Nigay","doi":"10.1145/2818346.2820731","DOIUrl":"https://doi.org/10.1145/2818346.2820731","url":null,"abstract":"On a mobile device, the intuitive Focus+Context layout of a detailed view (focus) and perspective/distorted panels on either side (context) is particularly suitable for maximizing the utilization of the limited available display area. Interacting with such a bifocal view requires both fast access to data in the context view and high precision interaction with data in the detailed focus view. We introduce combined modalities that solve this problem by combining the well-known flick-drag gesture-based precise modality with modalities for fast access to data in the context view. The modalities for fast access to data in the context view include direct touch in the context view as well as navigation based on drag gestures, on tilting the device, on side-pressure inputs or by spatially moving the device (dynamic peephole). Results of a comparison experiment of the combined modalities show that the performance can be analyzed according to a 3-phase model of the task: a focus-targeting phase, a transition phase (modality switch) and a cursor-pointing phase. Moreover modalities of the focus-targeting phase based on a discrete mode of navigation control (direct access, pressure sensors as discrete navigation controller) require a long transition phase: this is mainly due to disorientation induced by the loss of control in movements. This effect is significantly more pronounced than the articulatory time for changing the position of the fingers between the two modalities (\"homing\" time).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"55 4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83334731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Capturing AU-Aware Facial Features and Their Latent Relations for Emotion Recognition in the Wild
Anbang Yao, Junchao Shao, Ningning Ma, Yurong Chen
The Emotion Recognition in the Wild (EmotiW) Challenge has been held for three years. Previous winner teams primarily focused on designing specific deep neural networks or fusing diverse hand-crafted and deep convolutional features, but all of them neglected to explore the significance of the latent relations among the changing features that result from facial muscle motions. In this paper, we study this recognition challenge from the perspective of analyzing the relations among expression-specific facial features in an explicit manner. Our method has three key components. First, we propose a pair-wise learning strategy to automatically seek a set of facial image patches which are important for discriminating two particular emotion categories. We found that these learnt local patches are in part consistent with the locations of expression-specific Action Units (AUs), so the features extracted from such facial patches are named AU-aware facial features. Second, in each pair-wise task, we use an undirected graph structure, which takes learnt facial patches as individual vertices, to encode feature relations between any two learnt facial patches. Finally, a robust emotion representation is constructed by concatenating all task-specific graph-structured facial feature relations sequentially. Extensive experiments on the EmotiW 2015 Challenge testify to the efficacy of the proposed approach. Without using additional data, our final submissions achieved competitive results on both sub-challenges: on image based static facial expression recognition we obtained 55.38% recognition accuracy, outperforming the 39.13% baseline by a margin of 16.25%, and on audio-video based emotion recognition we obtained 53.80% recognition accuracy, outperforming the 39.33% baseline and the 2014 winner team's final result of 50.37% by margins of 14.47% and 3.43%, respectively.
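As a rough sketch of encoding relations between learnt facial patches over an undirected graph, the code below treats each patch's feature vector as a vertex and concatenates one scalar relation per edge. Pearson correlation is used here as an illustrative stand-in, since the abstract does not specify the exact relation measure.

```python
import numpy as np
from itertools import combinations

def pairwise_relation_vector(patch_features):
    """Concatenate a scalar relation for every unordered pair of facial patches.

    patch_features : array of shape (n_patches, d), one feature vector per learnt patch
    The relation used here (Pearson correlation between patch feature vectors) is an
    illustrative stand-in for the paper's graph-structured feature relations.
    """
    patch_features = np.asarray(patch_features, dtype=float)
    relations = []
    for i, j in combinations(range(len(patch_features)), 2):   # edges of the undirected graph
        a, b = patch_features[i], patch_features[j]
        corr = np.corrcoef(a, b)[0, 1]
        relations.append(corr)
    return np.array(relations)

# 5 patches with 16-D features -> C(5, 2) = 10 relation values
rng = np.random.default_rng(1)
rep = pairwise_relation_vector(rng.normal(size=(5, 16)))
print(rep.shape, rep.round(2)[:3])
```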
{"title":"Capturing AU-Aware Facial Features and Their Latent Relations for Emotion Recognition in the Wild","authors":"Anbang Yao, Junchao Shao, Ningning Ma, Yurong Chen","doi":"10.1145/2818346.2830585","DOIUrl":"https://doi.org/10.1145/2818346.2830585","url":null,"abstract":"The Emotion Recognition in the Wild (EmotiW) Challenge has been held for three years. Previous winner teams primarily focus on designing specific deep neural networks or fusing diverse hand-crafted and deep convolutional features. They all neglect to explore the significance of the latent relations among changing features resulted from facial muscle motions. In this paper, we study this recognition challenge from the perspective of analyzing the relations among expression-specific facial features in an explicit manner. Our method has three key components. First, we propose a pair-wise learning strategy to automatically seek a set of facial image patches which are important for discriminating two particular emotion categories. We found these learnt local patches are in part consistent with the locations of expression-specific Action Units (AUs), thus the features extracted from such kind of facial patches are named AU-aware facial features. Second, in each pair-wise task, we use an undirected graph structure, which takes learnt facial patches as individual vertices, to encode feature relations between any two learnt facial patches. Finally, a robust emotion representation is constructed by concatenating all task-specific graph-structured facial feature relations sequentially. Extensive experiments on the EmotiW 2015 Challenge testify the efficacy of the proposed approach. Without using additional data, our final submissions achieved competitive results on both sub-challenges including the image based static facial expression recognition (we got 55.38% recognition accuracy outperforming the baseline 39.13% with a margin of 16.25%) and the audio-video based emotion recognition (we got 53.80% recognition accuracy outperforming the baseline 39.33% and the 2014 winner team's final result 50.37% with the margins of 14.47% and 3.43%, respectively).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"219 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77767764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 101
Video and Image based Emotion Recognition Challenges in the Wild: EmotiW 2015
Abhinav Dhall, O. V. R. Murthy, Roland Göcke, Jyoti Joshi, Tom Gedeon
The third Emotion Recognition in the Wild (EmotiW) challenge, held in 2015, consists of an audio-video based emotion classification sub-challenge and a static image based facial expression classification sub-challenge, both of which mimic real-world conditions. The two sub-challenges are based on the Acted Facial Expression in the Wild (AFEW) 5.0 and the Static Facial Expression in the Wild (SFEW) 2.0 databases, respectively. The paper describes the data, baseline method, challenge protocol and the challenge results. A total of 12 and 17 teams participated in the video based emotion and image based expression sub-challenges, respectively.
{"title":"Video and Image based Emotion Recognition Challenges in the Wild: EmotiW 2015","authors":"Abhinav Dhall, O. V. R. Murthy, Roland Göcke, Jyoti Joshi, Tom Gedeon","doi":"10.1145/2818346.2829994","DOIUrl":"https://doi.org/10.1145/2818346.2829994","url":null,"abstract":"The third Emotion Recognition in the Wild (EmotiW) challenge 2015 consists of an audio-video based emotion and static image based facial expression classification sub-challenges, which mimics real-world conditions. The two sub-challenges are based on the Acted Facial Expression in the Wild (AFEW) 5.0 and the Static Facial Expression in the Wild (SFEW) 2.0 databases, respectively. The paper describes the data, baseline method, challenge protocol and the challenge results. A total of 12 and 17 teams participated in the video based emotion and image based expression sub-challenges, respectively.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"278 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80072845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 281
Image based Static Facial Expression Recognition with Multiple Deep Network Learning
Zhiding Yu, Cha Zhang
We report our image based static facial expression recognition method for the Emotion Recognition in the Wild Challenge (EmotiW) 2015. We focus on the sub-challenge of the SFEW 2.0 dataset, where one seeks to automatically classify a set of static images into 7 basic emotions. The proposed method contains a face detection module based on the ensemble of three state-of-the-art face detectors, followed by a classification module with the ensemble of multiple deep convolutional neural networks (CNN). Each CNN model is initialized randomly and pre-trained on a larger dataset provided by the Facial Expression Recognition (FER) Challenge 2013. The pre-trained models are then fine-tuned on the training set of SFEW 2.0. To combine multiple CNN models, we present two schemes for learning the ensemble weights of the network responses: by minimizing the log likelihood loss, and by minimizing the hinge loss. Our proposed method generates state-of-the-art result on the FER dataset. It also achieves 55.96% and 61.29% respectively on the validation and test set of SFEW 2.0, surpassing the challenge baseline of 35.96% and 39.13% with significant gains.
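The abstract mentions learning ensemble weights over the network responses by minimizing a log-likelihood (or hinge) loss. The numpy sketch below implements the log-likelihood variant with softmax-parameterized weights trained by plain gradient descent on validation data; the parameterization and optimizer are assumptions made for illustration, not the authors' exact procedure.

```python
import numpy as np

def learn_ensemble_weights(probs, labels, steps=500, lr=0.5):
    """Learn convex ensemble weights over CNN outputs by minimizing NLL.

    probs  : array (n_models, n_samples, n_classes) of per-model class probabilities
    labels : array (n_samples,) of ground-truth class indices (validation set)
    Weights are kept on the simplex via a softmax over free parameters; this
    parameterization and the optimizer are illustrative choices, not the paper's.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n_models, n_samples, _ = probs.shape
    theta = np.zeros(n_models)
    for _ in range(steps):
        w = np.exp(theta) / np.exp(theta).sum()                 # softmax -> weights on the simplex
        mixed = np.einsum("m,msc->sc", w, probs)                # weighted ensemble output
        p_true = mixed[np.arange(n_samples), labels]            # probability of the true class
        # d(NLL)/dw_m, then chain rule through the softmax
        grad_w = -(probs[:, np.arange(n_samples), labels] / p_true).mean(axis=1)
        grad_theta = w * (grad_w - (w * grad_w).sum())
        theta -= lr * grad_theta
    return np.exp(theta) / np.exp(theta).sum()

# Toy check: model 0 is more accurate, so it should receive more weight
rng = np.random.default_rng(0)
labels = rng.integers(0, 7, size=200)
good = np.full((200, 7), 0.05)
good[np.arange(200), labels] = 0.7
noisy = rng.dirichlet(np.ones(7), size=200)
print(learn_ensemble_weights(np.stack([good, noisy]), labels).round(3))
```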
{"title":"Image based Static Facial Expression Recognition with Multiple Deep Network Learning","authors":"Zhiding Yu, Cha Zhang","doi":"10.1145/2818346.2830595","DOIUrl":"https://doi.org/10.1145/2818346.2830595","url":null,"abstract":"We report our image based static facial expression recognition method for the Emotion Recognition in the Wild Challenge (EmotiW) 2015. We focus on the sub-challenge of the SFEW 2.0 dataset, where one seeks to automatically classify a set of static images into 7 basic emotions. The proposed method contains a face detection module based on the ensemble of three state-of-the-art face detectors, followed by a classification module with the ensemble of multiple deep convolutional neural networks (CNN). Each CNN model is initialized randomly and pre-trained on a larger dataset provided by the Facial Expression Recognition (FER) Challenge 2013. The pre-trained models are then fine-tuned on the training set of SFEW 2.0. To combine multiple CNN models, we present two schemes for learning the ensemble weights of the network responses: by minimizing the log likelihood loss, and by minimizing the hinge loss. Our proposed method generates state-of-the-art result on the FER dataset. It also achieves 55.96% and 61.29% respectively on the validation and test set of SFEW 2.0, surpassing the challenge baseline of 35.96% and 39.13% with significant gains.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82912237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 537