VT-KFER: A Kinect-based RGBD+time dataset for spontaneous and non-spontaneous facial expression recognition

S. Aly, Andrea Trubanova, A. L. Abbott, S. White, A. Youssef
DOI: 10.1109/ICB.2015.7139081
Published in: 2015 International Conference on Biometrics (ICB), 2015-05-19
Citations: 28

Abstract

Human facial expressions have been extensively studied using 2D static images or 2D video sequences. The main limitations of 2D-based analysis are problems associated with large variations in pose and illumination. Therefore, an alternative is to utilize depth information, captured from 3D sensors, which is both pose and illumination invariant. The Kinect sensor is an inexpensive, portable, and fast way to capture depth information. However, only a few researchers have utilized the Kinect sensor for the automatic recognition of facial expressions. This is partly due to the lack of a publicly available Kinect-based RGBD facial expression recognition (FER) dataset that contains the relevant facial expressions and their associated semantic labels. This paper addresses this problem by presenting the first publicly available RGBD+time facial expression recognition dataset captured with the Kinect 1.0 sensor in both scripted (acted) and unscripted (spontaneous) scenarios. Our fully annotated dataset includes seven expressions (happiness, sadness, surprise, disgust, fear, anger, and neutral) for 32 subjects (males and females) aged 10 to 30 and with different skin tones. Both human and machine evaluations were conducted. Each scripted expression was ranked quantitatively by two research assistants in the Psychology department. Baseline machine evaluation resulted in average recognition accuracy of 60% and 58.3% for six-expression and seven-expression recognition, respectively, when features from the 2D and 3D data were combined.
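The abstract's key technical point is that the baseline combines features from the 2D (RGB) and 3D (depth) streams before classification. The paper's actual baseline code and feature descriptors are not given here, so the following is only an illustrative sketch of that fusion idea: the feature dimensions, the random "features", and the nearest-centroid classifier are all assumptions, not the authors' method.

```python
import numpy as np

# The seven expression labels annotated in the VT-KFER dataset.
EXPRESSIONS = ["happiness", "sadness", "surprise", "disgust",
               "fear", "anger", "neutral"]

def fuse_features(feat_2d, feat_3d):
    """Concatenate a 2D (RGB-derived) and a 3D (depth-derived)
    feature vector into one fused descriptor."""
    return np.concatenate([feat_2d, feat_3d])

def nearest_centroid_predict(fused, centroids):
    """Assign the expression whose class centroid is closest
    to the fused descriptor (Euclidean distance)."""
    dists = {label: np.linalg.norm(fused - c)
             for label, c in centroids.items()}
    return min(dists, key=dists.get)

# Toy demonstration with random vectors standing in for real
# RGB and depth features (dimensions are arbitrary assumptions).
rng = np.random.default_rng(0)
centroids = {label: rng.normal(size=96) for label in EXPRESSIONS}
sample = fuse_features(rng.normal(size=64), rng.normal(size=32))
predicted = nearest_centroid_predict(sample, centroids)
print(predicted)
```

A real baseline would replace the random vectors with descriptors extracted per frame from the RGB and depth channels and use a stronger classifier, but the fusion step itself is just this concatenation of the two modalities' feature vectors.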