Emotion Detection from Speech Signals using Voting Mechanism on Classified Frames

Adib Ashfaq A. Zamil, Sajib Hasan, Showmik MD. Jannatul Baki, Jawad MD. Adam, Isra Zaman

2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST)
DOI: 10.1109/ICREST.2019.8644168
Understanding human emotion is a complicated task even for humans themselves; this has not stopped researchers from trying to make machines capable of understanding human emotions. Among the many approaches that have been explored, using speech signals to detect emotion has been especially popular. In this study, Mel-Frequency Cepstral Coefficient (MFCC) features were extracted from speech signals to detect the underlying emotion of the speech. For each frame of a speech signal, a 13-dimensional MFCC feature vector was extracted, and Logistic Model Tree (LMT) models were trained on these features to classify different emotions. To classify an unknown speech signal, the 13-dimensional features are first extracted from each frame, each frame is classified using the trained model, and a voting mechanism over the classified frames determines the emotion of the whole signal. Experimental results on two datasets, the Berlin Database of Emotional Speech (Emo-DB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), show that our approach classifies certain emotions very well while struggling to discern the differences between some pairs of emotions. Among the trained models, the maximum accuracy achieved was 70% in detecting 7 different emotions. Given the small dimensionality of the feature vectors used, this approach provides an efficient solution for classifying emotions from speech signals.
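A minimal sketch of the frame-level pipeline the abstract describes: extract 13 MFCCs per frame, train a per-frame classifier, then majority-vote the frame labels to decide the clip's emotion. The libraries are assumptions, not taken from the paper: librosa is used for MFCC extraction, and scikit-learn's LogisticRegression stands in for Weka's Logistic Model Tree (LMT), which has no standard Python implementation.

from collections import Counter

import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression


def frame_features(path, n_mfcc=13):
    """Extract one 13-dimensional MFCC vector per frame of a speech file."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (13, n_frames)
    return mfcc.T  # shape: (n_frames, 13)


def train(paths, labels):
    """Train a frame-level classifier; every frame inherits its clip's emotion label."""
    X, y = [], []
    for path, label in zip(paths, labels):
        frames = frame_features(path)
        X.append(frames)
        y.extend([label] * len(frames))
    # Stand-in for the paper's LMT classifier (assumption, see lead-in above).
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.vstack(X), y)
    return clf


def predict_emotion(clf, path):
    """Classify each frame, then take a majority vote over the frame labels."""
    frame_preds = clf.predict(frame_features(path))
    return Counter(frame_preds).most_common(1)[0][0]

Usage would be clf = train(train_paths, train_labels) followed by predict_emotion(clf, "test.wav"). The design rationale is that individual frames are noisy and often misclassified, but aggregating many per-frame decisions with a vote makes the clip-level prediction far more robust while keeping the feature dimensionality small.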