Speech Emotion Recognition Based on Image Enhancement
Dongyan Wang, Jing Dong, D. Zhou, Xiaopeng Wei, Qiang Zhang
2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 2019-11-01
DOI: 10.1109/ISKE47853.2019.9170442
Citations: 1
Abstract
The performance of an emotion recognition system depends heavily on the quality of its emotional features. In this paper, we propose a feature optimization algorithm based on image enhancement and present a convolutional recurrent model for recognizing emotion in natural speech. To enhance the three-dimensional (3-D) log-Mel spectrum and 3-D spectrogram features, we apply a fast gamma transformation with an adaptive threshold, which makes full use of the dynamic characteristics of non-stationary speech signals. Recognition is then performed by a model that combines a Convolutional Neural Network (CNN) using rectangular kernels with a Long Short-Term Memory (LSTM) network. Experiments on two public emotional speech datasets show that the proposed model achieves good recognition performance and generalization ability.
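The abstract does not say how its 3-D log-Mel features are assembled. A common construction in the speech emotion recognition literature, assumed here rather than taken from the paper, stacks the static log-Mel spectrum with its first- and second-order temporal deltas as three channels. The sketch below computes each delta with the usual regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), replicating boundary frames. The helper names `delta` and `stack_3d` and the default window `width=2` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = Mel bins


def delta(features: Matrix, width: int = 2) -> Matrix:
    """Regression-based delta along the time axis.

    Boundary frames are replicated, so the output keeps the input's
    number of frames. (Hypothetical helper, not from the paper.)
    """
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for k in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = features[min(t + n, num_frames - 1)][k]
                prv = features[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_3d(log_mel: Matrix, width: int = 2) -> List[Matrix]:
    """Assumed 3-D feature: [static, delta, delta-delta] as three channels."""
    d1 = delta(log_mel, width)
    d2 = delta(d1, width)
    return [log_mel, d1, d2]
```

For a log-Mel band that rises by exactly 1 per frame, interior frames get a delta of 1.0, while the replicated edges pull the first and last deltas toward 0.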
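The abstract names a fast gamma transformation with an adaptive threshold but gives neither its formula nor how the threshold is chosen. The following is a minimal sketch under stated assumptions, not the authors' algorithm. Each feature map is min-max normalized to [0, 1]. The adaptive threshold is taken to be the map's mean normalized value. Gamma is set to ln(0.5) / ln(threshold), so that the threshold maps to 0.5, which is an adaptive gamma-correction heuristic from image processing swapped in here. "Fast" is read as evaluating the power law once per quantization level through a 256-entry lookup table. The names `adaptive_gamma` and `LEVELS` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

LEVELS = 256  # quantization levels for the lookup table (hypothetical choice)


def adaptive_gamma(feature: Matrix, eps: float = 1e-8) -> Matrix:
    """Gamma-enhance a 2-D feature map through a precomputed lookup table.

    Assumptions (not from the paper): values are min-max normalized to
    [0, 1], the adaptive threshold is the mean normalized value, and gamma
    is chosen so that this threshold maps to 0.5.
    """
    flat = [v for row in feature for v in row]
    lo, hi = min(flat), max(flat)
    span = max(hi - lo, eps)
    norm = [[(v - lo) / span for v in row] for row in feature]

    # Adaptive threshold: mean normalized value, kept strictly inside (0, 1)
    threshold = sum(v for row in norm for v in row) / len(flat)
    threshold = min(max(threshold, eps), 1 - eps)
    gamma = math.log(0.5) / math.log(threshold)

    # "Fast" step: compute x ** gamma once per quantization level
    lut = [(i / (LEVELS - 1)) ** gamma for i in range(LEVELS)]
    return [[lut[round(v * (LEVELS - 1))] for v in row] for row in norm]
```

When the mean lies below 0.5, gamma falls below 1 and low-energy time-frequency regions are brightened. When it lies above 0.5, gamma exceeds 1 and the map is compressed. The lookup table replaces one power evaluation per bin with 256 evaluations in total plus an index per bin, at the cost of quantizing the output.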
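The abstract pairs a CNN using rectangular kernels with an LSTM, but it gives neither the kernel sizes nor how the two stages are connected. One common pattern, assumed here, convolves over the time-frequency map, then reads the result frame by frame, concatenating every channel at each time step, so that the LSTM receives one vector per frame. The pure-Python sketch below shows only that data flow. It applies a stride-1 "valid" cross-correlation with non-square kernels, treating rows as time frames and columns as frequency bins (an assumed orientation), and then performs the reshape. The names `conv2d_valid` and `cnn_to_sequence` are hypothetical, and no trained weights or LSTM are included.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 'valid' 2-D cross-correlation (no padding, stride 1)."""
    kt, kf = len(kernel), len(kernel[0])  # rectangular: kt != kf is allowed
    out_t = len(x) - kt + 1
    out_f = len(x[0]) - kf + 1
    return [
        [
            sum(x[t + i][f + j] * kernel[i][j] for i in range(kt) for j in range(kf))
            for f in range(out_f)
        ]
        for t in range(out_t)
    ]


def cnn_to_sequence(x: Matrix, kernels: List[Matrix]) -> List[List[float]]:
    """Convolve with each kernel, then build one LSTM input vector per frame.

    All kernels must share the same shape so the output maps align in time.
    """
    maps = [conv2d_valid(x, k) for k in kernels]  # one map per output channel
    num_steps = len(maps[0])
    # At each time step, concatenate that frame's values from every channel
    return [[v for m in maps for v in m[t]] for t in range(num_steps)]
```

With a 10-frame, 40-bin input and two hypothetical 3 × 10 kernels, the recurrent layer would see 8 time steps of 2 × 31 = 62 features each. A kernel that spans more frequency bins than frames summarizes a broad spectral band while keeping fine temporal resolution for the LSTM.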