{"title":"基于MLP-SVM-PCA分类器的语音情感识别研究","authors":"Kabir Jain, Anjali Chaturvedi, Jahnvi Dua, Ramesh Kumar Bhukya","doi":"10.1109/UPCON56432.2022.9986457","DOIUrl":null,"url":null,"abstract":"Sound localization by human listeners are capable of identifying a particular speaker, by listening to the voice of the speaker over the telephone or an entrance-way out of sight. Machines are incapable of understanding and expressing emotions. Emotions play a important role in today's digital world of remote communication. Emotion recognition can be defined as an act of predicting human's emotion through their voice samples and get the accuracy of prediction thus creating a better Human-Computer Interaction (HCI). There are various states to predict human's emotion based on behaviour, expression, pitch, tone, etc. Few of the emotions are considered to recognize the emotions of a speaker behind the speech. This research was conducted to test an speech emotion recognition (SER) system based on voice samples in two-stage approach, namely feature extraction and classification engine. The first one, the key features used for classification of emotions such as extraction of Mel Frequency Cepstral Coefficients (MFCCs), Mel Spectrogram along with Chroma features. Secondly, we use the Multilayer Perceptron (MLP) classifier, elementary classifying Support Vector Machines (SVM) and dimensionality reductionPrincipal Component Analysis (PCA) as classification methods. The research work is considered on the Toronto Emotional Speech Set (TESS) dataset. 
The proposed approaches gives us 94.17%, 93.43% and 97.86% classification accuracy respectively.","PeriodicalId":185782,"journal":{"name":"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigation Using MLP-SVM-PCA Classifiers on Speech Emotion Recognition\",\"authors\":\"Kabir Jain, Anjali Chaturvedi, Jahnvi Dua, Ramesh Kumar Bhukya\",\"doi\":\"10.1109/UPCON56432.2022.9986457\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sound localization by human listeners are capable of identifying a particular speaker, by listening to the voice of the speaker over the telephone or an entrance-way out of sight. Machines are incapable of understanding and expressing emotions. Emotions play a important role in today's digital world of remote communication. Emotion recognition can be defined as an act of predicting human's emotion through their voice samples and get the accuracy of prediction thus creating a better Human-Computer Interaction (HCI). There are various states to predict human's emotion based on behaviour, expression, pitch, tone, etc. Few of the emotions are considered to recognize the emotions of a speaker behind the speech. This research was conducted to test an speech emotion recognition (SER) system based on voice samples in two-stage approach, namely feature extraction and classification engine. The first one, the key features used for classification of emotions such as extraction of Mel Frequency Cepstral Coefficients (MFCCs), Mel Spectrogram along with Chroma features. 
Secondly, we use the Multilayer Perceptron (MLP) classifier, elementary classifying Support Vector Machines (SVM) and dimensionality reductionPrincipal Component Analysis (PCA) as classification methods. The research work is considered on the Toronto Emotional Speech Set (TESS) dataset. The proposed approaches gives us 94.17%, 93.43% and 97.86% classification accuracy respectively.\",\"PeriodicalId\":185782,\"journal\":{\"name\":\"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)\",\"volume\":\"96 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/UPCON56432.2022.9986457\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UPCON56432.2022.9986457","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Investigation Using MLP-SVM-PCA Classifiers on Speech Emotion Recognition
Human listeners are capable of identifying a particular speaker through sound alone, for instance by hearing the speaker's voice over the telephone or from an entrance-way out of sight. Machines, by contrast, are incapable of understanding and expressing emotions, yet emotions play an important role in today's digital world of remote communication. Emotion recognition can be defined as the act of predicting a person's emotion from voice samples and measuring the accuracy of that prediction, thereby enabling better Human-Computer Interaction (HCI). Human emotion can be inferred from various cues, such as behaviour, facial expression, pitch, and tone; here, a small set of emotional states is considered in order to recognize the emotion of the speaker behind the speech. This research tests a speech emotion recognition (SER) system on voice samples using a two-stage approach: feature extraction and a classification engine. In the first stage, the key features used for emotion classification are extracted: Mel Frequency Cepstral Coefficients (MFCCs), the Mel spectrogram, and Chroma features. In the second stage, we use a Multilayer Perceptron (MLP) classifier, an elementary Support Vector Machine (SVM) classifier, and Principal Component Analysis (PCA) for dimensionality reduction as the classification methods. The experiments are conducted on the Toronto Emotional Speech Set (TESS) dataset. The proposed approaches give 94.17%, 93.43% and 97.86% classification accuracy, respectively.
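The two-stage pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: stage 1 (feature extraction) is stubbed with synthetic per-utterance feature vectors whose dimensionality mimics typical MFCC + Mel-spectrogram + chroma sizes (with real audio one would usually compute these with a library such as librosa and average them over time); the classifier hyperparameters, the subset of emotion labels, and the pairing of PCA with an SVM are all assumptions, since the abstract does not specify them.

```python
# Illustrative sketch of a two-stage SER pipeline (feature extraction +
# classification engine). Features are synthetic stand-ins here, one
# Gaussian cluster per emotion, so the script is self-contained.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_FEATURES = 40 + 128 + 12  # assumed: 40 MFCCs + 128 mel bands + 12 chroma bins
EMOTIONS = ["angry", "happy", "sad", "neutral"]  # assumed subset of TESS labels

# Stage 1 (stubbed): one feature vector per utterance, 50 utterances per emotion.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(50, N_FEATURES))
               for i in range(len(EMOTIONS))])
y = np.repeat(EMOTIONS, 50)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Stage 2: the three classification set-ups named in the abstract
# (hyperparameters are assumptions, not taken from the paper).
models = {
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "PCA+SVM": make_pipeline(
        StandardScaler(), PCA(n_components=20), SVC(kernel="rbf")),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

On real TESS audio, the stubbed stage 1 would be replaced by loading each WAV file, computing the three feature types frame by frame, and averaging each over time to obtain one fixed-length vector per utterance before training.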