{"title":"Audio spectrogram analysis in IoT paradigm for the classification of psychological-emotional characteristics","authors":"Ankit Kumar, Sushil Kumar Singh, Indu Bhardwaj, Prakash Kumar Singh, Ashish Khanna, Biswajit Brahma","doi":"10.1007/s41870-024-02166-5","DOIUrl":null,"url":null,"abstract":"<p>Psychological activities have various dimensions in which they correlate with their respective behavior generated by the human body. Understanding the relationship of psychological events from external action units is one of the research subjects to explore various human behavior and their dependencies. The study of psychological analysis in the medical field is very time-consuming and costly. It requires constant monitoring of the patient for some time and various interrogation sessions to finalize the emotional severity of an individual. The challenges in exploring human emotions propel the requirement of computer vision techniques in this field. The proposed study explicitly evaluates the recognition of psychological-emotional activities with the help of an audio spectrogram of people fetched through an IoT (Internet of Things) device comprising a microphone to investigate its correlation with psychological events. The audio samples are collected in an asymmetric environment where the chances of the noise are random. Noise cancellation, low power consumption, and sensitivity controls are some of the prominent features of the microphone IoT that have been used to extract raw audio samples. The proposed system follows the extraction of features such as mel-frequency cepstral coefficients (MFCC), harmonic to noise rate (HNR), zero crossing rate (ZCR), and Generative Adversarial Networks (GAN) from the audio spectrogram. The study uses a deep learning-based model containing a convolutional neural network model to recognize and classify different psychological-emotional stages including happiness, anger, disgust, surprise, fear, and sadness from audio spectrogram features. The average accuracy of the classification model for the recognition of all emotions is found to be 99.42% in a maximum of 312 iterations. The model is found to be robust for various applications such as preventing suicidal cases, improving decision-making in the diagnosis of depression patients, improves the overall mental healthcare system.</p>","PeriodicalId":14138,"journal":{"name":"International Journal of Information Technology","volume":"32 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41870-024-02166-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Psychological activity has many dimensions, each correlated with behavior expressed by the human body. Understanding how psychological events relate to externally observable action units is an active research subject for exploring human behaviors and their dependencies. Clinical psychological assessment is time-consuming and costly: it requires prolonged monitoring of the patient and repeated interview sessions to establish the severity of an individual's emotional state. These challenges motivate the use of computer vision techniques in this field. The proposed study evaluates the recognition of psychological-emotional activity from audio spectrograms of speech captured by an IoT (Internet of Things) device with a built-in microphone, and investigates their correlation with psychological events. The audio samples are collected in an asymmetric, uncontrolled environment in which noise occurs at random. Noise cancellation, low power consumption, and sensitivity control are prominent features of the IoT microphone used to acquire the raw audio samples. The proposed system extracts features such as mel-frequency cepstral coefficients (MFCC), harmonics-to-noise ratio (HNR), zero-crossing rate (ZCR), and Generative Adversarial Network (GAN)-derived features from the audio spectrogram. A deep learning model built around a convolutional neural network then recognizes and classifies six psychological-emotional states from these spectrogram features: happiness, anger, disgust, surprise, fear, and sadness. The average classification accuracy over all emotions is 99.42%, reached within at most 312 training iterations. The model proves robust across applications such as suicide prevention, supporting decision-making in the diagnosis of depression, and improving the overall mental healthcare system.
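To make the described pipeline concrete, the sketch below shows one way the feature-extraction and CNN classification stages could be implemented in Python with librosa and TensorFlow/Keras. It is an illustrative reconstruction under stated assumptions, not the authors' code: the sampling rate, spectrogram size, network depth, and the helper names extract_features and build_cnn are hypothetical choices, HNR extraction (which librosa does not provide) is only noted in a comment, and the GAN-based component is omitted.

```python
# Minimal sketch of spectrogram feature extraction and CNN emotion classification.
# Assumptions: librosa and TensorFlow are installed; hyper-parameters are illustrative
# and are NOT the values reported in the paper.
import numpy as np
import librosa
import tensorflow as tf

EMOTIONS = ["happiness", "anger", "disgust", "surprise", "fear", "sadness"]
SR, N_MELS, N_FRAMES = 16000, 64, 128  # assumed sampling rate and spectrogram size


def extract_features(wav_path: str) -> np.ndarray:
    """Return a fixed-size log-mel spectrogram for one audio clip."""
    y, sr = librosa.load(wav_path, sr=SR)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel)                    # spectrogram "image" for the CNN
    # MFCC and ZCR descriptors named in the abstract; computed here only to show the
    # API, not fused into the CNN input in this sketch. HNR would typically come from
    # a separate tool such as Praat/parselmouth.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    zcr = librosa.feature.zero_crossing_rate(y)
    # Pad or truncate along time so every clip yields the same input shape.
    if log_mel.shape[1] < N_FRAMES:
        log_mel = np.pad(log_mel, ((0, 0), (0, N_FRAMES - log_mel.shape[1])))
    spec = log_mel[:, :N_FRAMES]
    return spec[..., np.newaxis]                          # add channel axis


def build_cnn(num_classes: int = len(EMOTIONS)) -> tf.keras.Model:
    """Small 2-D CNN over the spectrogram; depth and filter counts are placeholders."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MELS, N_FRAMES, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage (assuming labelled clips exist):
#   X = np.stack([extract_features(p) for p in wav_paths])
#   y = np.array(labels)              # integers 0..5 indexing EMOTIONS
#   build_cnn().fit(X, y, epochs=20, batch_size=16)
```

In a fuller pipeline, the MFCC, HNR, and ZCR descriptors would be combined with the spectrogram input (or a GAN used for data augmentation) before training, and the audio would arrive from the IoT microphone after on-device noise cancellation, as described in the abstract.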