Touching the Limits of a Dataset in Video-Based Facial Expression Recognition

2021 International Russian Automation Conference (RusAutoCon), Pub Date: 2021-09-05, DOI: 10.1109/RusAutoCon52004.2021.9537388

E. Churaev, A. Savchenko

Citations: 2

Abstract

In this paper, we examine why video-based facial emotion recognition algorithms show excellent performance on some benchmarks but much worse accuracy in practical applications. For example, the typical error rate of contemporary deep neural networks on the RAVDESS dataset is less than 5%. We argue that such results are obtained only when the dataset is split incorrectly, so that the same persons appear in both the training and test sets. We claim that it is fairer to use an actor-based split, in which the persons in the training and test sets are disjoint. We demonstrate experimentally that a near state-of-the-art neural network model pre-trained on the AffectNet dataset achieves 99% accuracy on the conventional split of the RAVDESS dataset. However, when the dataset is split by actor, so that the training and test sets share no persons, the accuracy is 20-30% lower.
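
The difference between the two splits can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 20% test fraction, and the random seed are hypothetical choices made for illustration. It assumes the published RAVDESS file-naming convention, in which the last hyphen-separated field of a file name is the actor number.

```python
import random
from pathlib import Path


def actor_id(filename: str) -> int:
    """Return the actor number, assumed to be the last '-' field of the file stem."""
    return int(Path(filename).stem.split("-")[-1])


def actor_based_split(filenames, test_fraction=0.2, seed=0):
    """Hold out whole actors, so no person appears in both train and test."""
    actors = sorted({actor_id(f) for f in filenames})
    random.Random(seed).shuffle(actors)
    n_test = max(1, round(len(actors) * test_fraction))
    test_actors = set(actors[:n_test])
    train = [f for f in filenames if actor_id(f) not in test_actors]
    test = [f for f in filenames if actor_id(f) in test_actors]
    return train, test


def random_split(filenames, test_fraction=0.2, seed=0):
    """Clip-level split: clips of the same actor may land on both sides."""
    files = list(filenames)
    random.Random(seed).shuffle(files)
    n_test = max(1, round(len(files) * test_fraction))
    return files[n_test:], files[:n_test]


def shared_actors(train, test):
    """Actors that occur in both sets (empty for a correct actor-based split)."""
    return {actor_id(f) for f in train} & {actor_id(f) for f in test}
```

Calling `shared_actors` on the output of `actor_based_split` returns an empty set by construction, whereas a clip-level `random_split` will generally leave the same actors on both sides, which is the leakage the abstract describes.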


