Speech emotion recognition based on Gaussian Mixture Models and Deep Neural Networks
Authors: I. Tashev, Zhong-Qiu Wang, Keith W. Godin
Venue: 2017 Information Theory and Applications Workshop (ITA)
Published: 2017-02-01
DOI: 10.1109/ITA.2017.8023477 (https://doi.org/10.1109/ITA.2017.8023477)
Recognizing speaker emotion during interaction with spoken dialog systems can enhance the user experience and give system operators information valuable for the ongoing assessment of system performance and utility. Interaction utterances are very short, and we assume the speaker's emotion is constant throughout a given utterance. This paper investigates combinations of a GMM-based low-level feature extractor with a neural network serving as a high-level feature extractor. The advantage of this architecture is that it combines fast-developing neural network-based solutions with the classic statistical approaches applied to emotion recognition. Experiments on a Mandarin data set compare the different solutions under the same or closely matched conditions.
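The two-stage idea described above can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the abstract does not specify the features, the GMM configuration, or the network, so everything here is an assumption made for illustration. The emotion labels, the 2-D frame vectors, the per-emotion diagonal-covariance GMMs, and the hand-set network weights are all hypothetical, and no training is shown. The sketch assumes one GMM per emotion, uses the mean frame log-likelihood under each GMM as the utterance-level feature (consistent with the assumption that emotion is constant within an utterance), and passes that feature vector through a small network.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM of (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def utterance_features(frames, gmms):
    """Low-level features: mean frame log-likelihood under each emotion's GMM."""
    return [sum(gmm_frame_loglik(x, g) for x in frames) / len(frames) for g in gmms]

def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def mlp_forward(feats, w1, b1, w2, b2):
    """High-level stage: one hidden ReLU layer, then softmax over emotions."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, feats)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)

# Hypothetical labels and hand-set GMM parameters (assumptions, not from the paper).
EMOTIONS = ["neutral", "angry"]
GMMS = [
    [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 0.0], [1.0, 1.0])],  # neutral
    [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 4.0], [1.0, 1.0])],  # angry
]

# Hand-set network weights (assumption): hidden units compare the two scores.
W1, B1 = [[-1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]
W2, B2 = [[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]

# A toy utterance: three 2-D frames that lie near the "angry" GMM.
frames = [[4.2, 3.9], [4.8, 4.1], [5.1, 4.3]]

feats = utterance_features(frames, GMMS)
probs = mlp_forward(feats, W1, B1, W2, B2)
label = EMOTIONS[probs.index(max(probs))]
print(label, [round(p, 4) for p in probs])
```

In a real system the GMM parameters would be estimated from training data and the network weights learned rather than set by hand; the sketch only shows how the statistical stage can feed the network stage.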