Bo Sun, Liandong Li, Guoyan Zhou, Xuewen Wu, Jun He, Lejun Yu, Dongxue Li, Qinglan Wei
{"title":"Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild","authors":"Bo Sun, Liandong Li, Guoyan Zhou, Xuewen Wu, Jun He, Lejun Yu, Dongxue Li, Qinglan Wei","doi":"10.1145/2818346.2830586","DOIUrl":null,"url":null,"abstract":"In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For the static facial expression recognition based on video frame, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for these kinds of features on the AFEW and SFEW dataset, and we propose a novel fusion network to combine all the extracted features at decision level. The final achievement we gained is 51.02% on the AFEW testing set and 51.08% on the SFEW testing set, which are much better than the baseline recognition rate of 39.33% and 39.13%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"287 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2818346.2830586","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48
Abstract
In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For the static facial expression recognition based on video frame, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for these kinds of features on the AFEW and SFEW dataset, and we propose a novel fusion network to combine all the extracted features at decision level. The final achievement we gained is 51.02% on the AFEW testing set and 51.08% on the SFEW testing set, which are much better than the baseline recognition rate of 39.33% and 39.13%.