Touching the Limits of a Dataset in Video-Based Facial Expression Recognition
E. Churaev, A. Savchenko
2021 International Russian Automation Conference (RusAutoCon), 2021-09-05
DOI: 10.1109/RusAutoCon52004.2021.9537388 (https://doi.org/10.1109/RusAutoCon52004.2021.9537388)
Citation count: 2
Abstract
In this paper, we examine why video-based facial emotion recognition algorithms show excellent performance on some benchmarks but much worse accuracy in practical applications. For example, the typical error rate of contemporary deep neural networks on the RAVDESS dataset is less than 5%. We argue that such results are obtained only if the dataset is split incorrectly, so that the same persons are present in both the training and test sets. We claim that it is fairer to use an actor-based split, in which the persons in the training and test sets are disjoint. We demonstrate experimentally that a near state-of-the-art neural network model pre-trained on the AffectNet dataset achieves 99% accuracy on the conventional split of the RAVDESS dataset. However, when the dataset is split by actor, so that each person appears in only one of the training and test sets, the accuracy is 20-30% lower.
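The difference between the two splits can be illustrated with a minimal sketch. It is not the authors' code. It assumes RAVDESS-style file names in which the last hyphen-separated field is the actor ID; the synthetic file list, the helper names (`actor_id`, `random_split`, `actor_split`, `shared_actors`), and the 20% test fraction are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def actor_id(filename: str) -> int:
    """Return the actor ID, assumed to be the last '-' separated field of the name."""
    stem = filename.rsplit("/", 1)[-1].split(".", 1)[0]
    return int(stem.split("-")[-1])


def random_split(files, test_fraction=0.2, seed=0):
    """Conventional split: shuffle individual clips, ignoring who appears in them."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def actor_split(files, test_fraction=0.2, seed=0):
    """Actor-based split: shuffle actors, so every clip of an actor lands in one set."""
    by_actor = defaultdict(list)
    for f in files:
        by_actor[actor_id(f)].append(f)
    actors = sorted(by_actor)
    random.Random(seed).shuffle(actors)
    n_test = max(1, int(len(actors) * test_fraction))
    train = [f for a in actors[n_test:] for f in by_actor[a]]
    test = [f for a in actors[:n_test] for f in by_actor[a]]
    return train, test


def shared_actors(train, test):
    """Actors that appear in both sets (identity leakage)."""
    return {actor_id(f) for f in train} & {actor_id(f) for f in test}


# Synthetic stand-in for a file listing: 24 actors x 8 emotion labels (hypothetical names).
files = [
    f"01-01-{emotion:02d}-01-01-01-{actor:02d}.mp4"
    for actor in range(1, 25)
    for emotion in range(1, 9)
]

rand_train, rand_test = random_split(files)
act_train, act_test = actor_split(files)

print("actors shared by random split:", len(shared_actors(rand_train, rand_test)))
print("actors shared by actor-based split:", len(shared_actors(act_train, act_test)))
```

In the random split, the same actors appear on both sides, so a model can benefit from having seen a test subject's face during training. The actor-based split removes that overlap by construction, which is the evaluation setting the abstract argues for.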