Dongyan Wang, Jing Dong, D. Zhou, Xiaopeng Wei, Qiang Zhang
2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), published 2019-11-01
DOI: https://doi.org/10.1109/ISKE47853.2019.9170442
Speech Emotion Recognition Based on Image Enhancement
The performance of an emotion recognition system is determined by the quality of its emotional features. In this paper, we propose a feature optimization algorithm based on image enhancement and present a convolutional recurrent model for recognizing emotion in natural speech. For three-dimensional (3-D) log-Mel spectrum and 3-D spectrogram features, a fast gamma transformation with an adaptive threshold is applied for feature enhancement, making full use of the dynamic characteristics of non-stationary speech signals. A model combining a Convolutional Neural Network (CNN) with rectangular kernels and Long Short-Term Memory (LSTM) then performs the speech emotion recognition task. Experiments on two public emotional datasets demonstrate the good generalization ability and recognition performance of the proposed model.
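
The gamma-based feature enhancement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the adaptive threshold is computed or what makes the transformation "fast", so everything specific here is an assumption: the function name `adaptive_gamma` is hypothetical, the mean of the normalized feature map stands in for the adaptive threshold (gamma is chosen so that this mean maps to 0.5), and the speed-up is assumed to come from a precomputed lookup table. It is not the authors' algorithm.

```python
import numpy as np


def adaptive_gamma(feature_map: np.ndarray, levels: int = 256) -> np.ndarray:
    """Gamma-correct one 2-D feature map (e.g. a log-Mel spectrogram channel).

    Returns an array of the same shape with values in [0, 1].
    """
    lo, hi = float(feature_map.min()), float(feature_map.max())
    if hi == lo:
        # A constant map carries no contrast to enhance.
        return np.zeros(feature_map.shape, dtype=np.float64)

    # Min-max normalize to [0, 1], as for a grayscale image.
    norm = (feature_map - lo) / (hi - lo)

    # ASSUMPTION: the mean intensity acts as the adaptive threshold.
    # Solving mean ** gamma == 0.5 gives gamma < 1 for dark maps
    # (brightening) and gamma > 1 for bright maps (darkening).
    mean = float(np.clip(norm.mean(), 1e-6, 1.0 - 1e-6))
    gamma = np.log(0.5) / np.log(mean)

    # ASSUMPTION: "fast" means a lookup table, so the power function is
    # evaluated once per quantization level instead of once per pixel.
    lut = np.linspace(0.0, 1.0, levels) ** gamma
    idx = np.rint(norm * (levels - 1)).astype(np.intp)
    return lut[idx]
```

For a 3-D feature (for instance several stacked spectrogram channels), one option is to call `adaptive_gamma` on each channel separately so that every channel gets its own gamma; whether the paper enhances channels jointly or independently is not stated in the abstract.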