Comparative Study of Speech Emotion Recognition Based on CNN and CRNN
Authors: Nan Jiang, Junwei Jia, Dongmei Shao
Published in: 2020 International Conference on Machine Learning and Cybernetics (ICMLC), 2020-12-02
DOI: 10.1109/ICMLC51923.2020.9469540 (https://doi.org/10.1109/ICMLC51923.2020.9469540)
This paper compares the performance of a Convolutional Recurrent Neural Network (CRNN) and a Convolutional Neural Network (CNN) on speech emotion recognition. A CNN on its own extracts little temporal information, while a general temporal model is insufficient to represent spatial information; CRNN is applied to speech emotion recognition to address both limitations. Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) are used as input features, and the recognition performance of the two models is compared for each feature. The results show that CRNN achieves higher accuracy than CNN with both features, which strengthens the speech emotion model and provides a theoretical basis and an optimization direction for improving the accuracy of speech emotion recognition.
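
The contrast between the two model families can be illustrated with a toy sketch. The abstract does not describe the actual layers, so everything below is an assumption made for illustration: the function names, the hand-picked weights, the tiny input, and the choice of a time-averaging head for the CNN and a plain tanh recurrence for the CRNN. It only shows why a recurrent head can retain the order of convolutional feature frames when a pooled head cannot.

```python
import math

def conv1d_relu(frames, kernels, bias):
    """Valid 1-D convolution along time, followed by ReLU.

    frames:  T frames, each a list of F coefficients (e.g. MFCC or GFCC).
    kernels: C kernels, each shaped [K][F].
    Returns T-K+1 frames with C channels each.
    """
    k_len = len(kernels[0])
    n_feat = len(frames[0])
    out = []
    for t in range(len(frames) - k_len + 1):
        row = []
        for c, kern in enumerate(kernels):
            s = bias[c]
            for k in range(k_len):
                for f in range(n_feat):
                    s += kern[k][f] * frames[t + k][f]
            row.append(max(0.0, s))
        out.append(row)
    return out

def mean_pool_time(feature_map):
    """Assumed CNN-style head: average over time, which ignores frame order."""
    n = len(feature_map)
    return [sum(col) / n for col in zip(*feature_map)]

def rnn_last_state(feature_map, w_in, w_rec):
    """Assumed CRNN-style head: a plain tanh recurrence over the time axis."""
    hidden = len(w_rec)
    h = [0.0] * hidden
    for x in feature_map:
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][i] * h[i] for i in range(hidden))
            )
            for j in range(hidden)
        ]
    return h

# Hypothetical input: 4 frames with 2 cepstral coefficients each.
FRAMES = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
# Hand-picked weights for illustration only, not trained values.
KERNELS = [
    [[0.5, 0.0], [0.5, 0.0]],   # averages coefficient 0 over two frames
    [[0.0, -1.0], [0.0, 1.0]],  # frame-to-frame change of coefficient 1
]
BIAS = [0.0, 0.0]
W_IN = [[0.5, 0.0], [0.0, 1.0]]
W_REC = [[0.5, 0.0], [0.0, 0.5]]

fmap = conv1d_relu(FRAMES, KERNELS, BIAS)
reversed_fmap = fmap[::-1]  # same feature frames, opposite temporal order

print("pooled, original order :", mean_pool_time(fmap))
print("pooled, reversed order :", mean_pool_time(reversed_fmap))
print("RNN, original order    :", rnn_last_state(fmap, W_IN, W_REC))
print("RNN, reversed order    :", rnn_last_state(reversed_fmap, W_IN, W_REC))
```

In this sketch the pooled summary is the same for both orderings, while the recurrent state differs, which is one way to read the claim that adding recurrence supplies the temporal information a CNN lacks. A CNN that flattens its feature map instead of averaging it would keep some positional information, so the gap in a real system may be smaller than this toy case suggests.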