{"title":"Statistical and Deep Convolutional Feature Fusion for Emotion Detection from Audio Signal","authors":"Durgesh Ameta, Vinay Gupta, Rohit Pilakkottil Sathian, Laxmidhar Behera, Tushar Sandhan","doi":"10.1109/ICBSII58188.2023.10181060","DOIUrl":null,"url":null,"abstract":"Speech serves as a crucial mode of expression for individuals to articulate their thoughts and can offer valuable insight into their emotional state. Various research has been conducted to identify metrics that can be used to determine the emotional sentiment hidden in an audio signal. This paper presents an exploratory analysis of various audio features, including Chroma features, MFCCs, Spectral features, and flattened spectrogram features (obtained using VGG-19 convolutional neural network) for sentiment analysis in the audio signals. This study evaluates the effectiveness of combining various audio features in determining emotional states expressed in a speech using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Baseline techniques such as Random Forest, Multi-Layer Perceptron (MLP), Logistic Regression, XgBoost, and Support Vector Machine (SVM) are used to compare the performance of the features. The results obtained from the study provide insight into the potential of utilizing these audio features to determine emotional states expressed in speech.","PeriodicalId":388866,"journal":{"name":"2023 International Conference on Bio Signals, Images, and Instrumentation (ICBSII)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Bio Signals, Images, and Instrumentation (ICBSII)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBSII58188.2023.10181060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Speech serves as a crucial mode of expression through which individuals articulate their thoughts, and it can offer valuable insight into their emotional state. Considerable research has been conducted to identify features that can reveal the emotional sentiment hidden in an audio signal. This paper presents an exploratory analysis of various audio features, including Chroma features, MFCCs, spectral features, and flattened spectrogram features (obtained using a VGG-19 convolutional neural network), for sentiment analysis of audio signals. The study evaluates the effectiveness of combining these features to determine the emotional states expressed in speech, using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Baseline classifiers, namely Random Forest, Multi-Layer Perceptron (MLP), Logistic Regression, XGBoost, and Support Vector Machine (SVM), are used to compare the performance of the feature sets. The results provide insight into the potential of utilizing these audio features to determine the emotional states expressed in speech.
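To make the described pipeline concrete, below is a minimal sketch of statistical and deep feature fusion in Python using librosa, torchvision, and scikit-learn. This is not the authors' released code: the hyperparameters (number of MFCCs, spectrogram size), the `RAVDESS` directory layout, and the classifier settings are illustrative assumptions, and only two of the five baseline classifiers are shown for brevity.

```python
# Hypothetical sketch of the feature-fusion pipeline described in the abstract.
# Assumes librosa, torch/torchvision, and scikit-learn are installed; the
# exact settings here are illustrative, not the paper's reported configuration.
from pathlib import Path

import numpy as np
import librosa
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SR = 22050  # sampling rate for loading the audio

def statistical_features(path: str) -> np.ndarray:
    """Time-averaged Chroma, MFCC, and spectral descriptors."""
    y, _ = librosa.load(path, sr=SR)
    feats = np.vstack([
        librosa.feature.mfcc(y=y, sr=SR, n_mfcc=40),
        librosa.feature.chroma_stft(y=y, sr=SR),
        librosa.feature.spectral_contrast(y=y, sr=SR),
        librosa.feature.spectral_centroid(y=y, sr=SR),
    ])
    return feats.mean(axis=1)  # collapse time axis to a fixed-length vector

# VGG-19 pretrained on ImageNet, truncated before its final layer so the
# forward pass emits a 4096-d embedding of the spectrogram "image".
_vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
_vgg.classifier = _vgg.classifier[:-1]
_vgg.eval()

def deep_features(path: str) -> np.ndarray:
    """Flattened VGG-19 embedding of a log-mel spectrogram."""
    y, _ = librosa.load(path, sr=SR)
    S = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=SR),
                            ref=np.max)
    img = (S - S.min()) / (np.ptp(S) + 1e-8)  # rescale to [0, 1]
    img = torch.tensor(img, dtype=torch.float32)[None, None]
    img = torch.nn.functional.interpolate(img, size=(224, 224))
    with torch.no_grad():
        return _vgg(img.repeat(1, 3, 1, 1)).squeeze(0).numpy()

# RAVDESS filenames encode the emotion label as the third hyphen-separated
# field, e.g. 03-01-06-01-02-01-12.wav -> emotion class 6.
paths = sorted(Path("RAVDESS").rglob("*.wav"))  # assumed directory layout
labels = [int(p.stem.split("-")[2]) for p in paths]
X = np.array([np.concatenate([statistical_features(str(p)),
                              deep_features(str(p))]) for p in paths])

# Compare two of the baseline classifiers on the fused feature vectors.
for name, clf in [("SVM", SVC()),
                  ("Random Forest", RandomForestClassifier())]:
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(model, X, labels, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Concatenating the time-averaged statistical descriptors with the VGG-19 spectrogram embedding yields one fused vector per utterance, which any of the listed baseline classifiers can consume directly; standardizing before the SVM matters because the two feature families live on very different scales.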