Audio Visual Emotion Recognition Using Cross Correlation and Wavelet Packet Domain Features

Shamman Noor, Ehsan Ahmed Dhrubo, A. T. Minhaz, C. Shahnaz, S. Fattah

2017 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), December 2017. DOI: 10.1109/WIECON-ECE.2017.8468871
Abstract
The better a machine understands non-verbal modes of communication, such as emotion, the higher the level of human-machine interaction that can be achieved. This paper describes a method for recognizing emotions from human speech and visual data so that a machine can understand them. For feature extraction, videos covering six classes of emotion (Happy, Sad, Fear, Disgust, Angry, and Surprise) from 44 different subjects in the eNTERFACE05 database are used. As video features, Horizontal and Vertical Cross-Correlation (HCCR and VCCR) signals extracted from the eye and mouth regions are used. As speech features, Perceptual Linear Predictive Coefficients (PLPC) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from wavelet packet coefficients are used in conjunction with PLPC and MFCC extracted from the original signal. For each feature type, K-Nearest Neighbour (KNN) multiclass classification is applied separately to identify emotions expressed in speech and through facial movement. The emotion expressed in a video file is then identified by concatenating the speech and video features and applying KNN classification.
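To make the speech-feature pipeline concrete, the sketch below shows one way to combine MFCCs computed on the raw waveform with MFCCs computed on wavelet packet subband coefficients, roughly as the abstract describes. This is a minimal illustration, not the authors' implementation: the wavelet family (db4), decomposition level, coefficient count, and the frame-averaging step are all assumptions, and the paper's PLPC features are omitted since they would require a separate PLP implementation.

```python
import numpy as np
import pywt
import librosa


def wavelet_packet_mfcc(signal, sr, wavelet="db4", level=3, n_mfcc=13):
    """Sketch: MFCCs from the raw signal plus wavelet packet subbands.

    Assumptions (not specified in the abstract): db4 wavelet, 3-level
    decomposition, 13 coefficients, and mean-pooling over frames.
    """
    # MFCCs from the original signal, averaged over frames.
    base = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    # Full wavelet packet decomposition down to the chosen level.
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    subband_feats = []
    for node in wp.get_level(level, order="natural"):
        coeffs = node.data.astype(np.float64)
        # Treat each subband's coefficient sequence as a signal and
        # summarize it with MFCCs as well (effective rate is halved
        # at each decomposition level).
        feats = librosa.feature.mfcc(y=coeffs, sr=sr // (2 ** level), n_mfcc=n_mfcc)
        subband_feats.append(feats.mean(axis=1))

    return np.concatenate([base] + subband_feats)
```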
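The abstract does not define the HCCR/VCCR signals precisely, so the following sketch is only one plausible reading: for a grayscale eye or mouth crop, take the peak normalized cross-correlation of each adjacent pair of rows (horizontal) and of columns (vertical), yielding two 1-D signals per region.

```python
import numpy as np


def cross_correlation_signals(region):
    """Hypothetical HCCR/VCCR-style features from a 2-D grayscale crop."""
    def peak_ncc(a, b):
        # Peak of the normalized cross-correlation between two 1-D signals.
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0:
            return 0.0
        return np.max(np.correlate(a, b, mode="full")) / denom

    rows = region.astype(np.float64)
    cols = rows.T
    hccr = np.array([peak_ncc(rows[i], rows[i + 1]) for i in range(len(rows) - 1)])
    vccr = np.array([peak_ncc(cols[j], cols[j + 1]) for j in range(len(cols) - 1)])
    return hccr, vccr
```

The fusion step follows the abstract directly: concatenate the speech and video feature vectors per sample and classify with KNN. The stand-in random feature matrices and the choice of k below are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in feature matrices; in practice these come from the speech and
# video sketches above. Six emotion classes, 44 subjects in eNTERFACE05.
X_speech = rng.normal(size=(264, 64))
X_video = rng.normal(size=(264, 32))
y = rng.integers(0, 6, size=264)

# Audio-visual fusion: concatenate the two feature vectors per sample.
X_av = np.hstack([X_speech, X_video])

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 is an illustrative choice
knn.fit(X_av, y)
print(knn.predict(X_av[:3]))
```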