An Enhanced Human Speech Based Emotion Recognition

Dr. M. Narendra, Lankala Suvarchala
International Journal of Scientific Research in Science and Technology, published 2024-06-05
DOI: https://doi.org/10.32628/ijsrst24113128

Abstract

Speech Emotion Recognition (SER) is a Machine Learning (ML) topic that has attracted substantial attention from researchers, particularly in the field of affective computing, owing to its growing potential, improvements in algorithms, and real-world applications. The paralinguistic information in human speech can be represented by quantitative features such as pitch, intensity, and Mel-Frequency Cepstral Coefficients (MFCC). SER is typically achieved through three main processes: data processing, feature selection/extraction, and classification based on the underlying emotional traits. The nature of these processes, together with the unique characteristics of human speech, supports the use of ML techniques for SER implementation. Recent affective computing research has applied several ML techniques to SER tasks; only a few of these works, however, adequately convey the fundamental strategies that can be applied to support the three essential phases of SER implementation. Moreover, these works either overlook or only briefly explain the difficulties involved in completing these tasks and the state-of-the-art methods employed to overcome them. In this study, we give a comprehensive review of research conducted over the past ten years that tackled SER challenges from an ML perspective, with a focus on the three SER implementation processes. Several difficulties are covered in detail, including the low classification accuracy of speaker-independent experiments and related solutions. The review also offers guidelines for SER evaluation, emphasizing the metrics that can be experimented with and common baselines. The purpose of this paper is to serve as a thorough guide that SER researchers can use to build SER solutions with ML techniques, inspire improvements to current SER models, or spark the development of new methods to improve SER performance.
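
The speaker-independent setting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name an evaluation protocol; leave-one-speaker-out splitting is assumed here as one common way to keep every speaker out of the training set when that speaker is being tested. The function name and the toy speaker labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) per speaker.

    All utterances of the held-out speaker form the test set, so no
    speaker appears on both sides of a split.
    """
    # Group utterance indices by the speaker who produced them.
    by_speaker = defaultdict(list)
    for index, speaker in enumerate(speaker_ids):
        by_speaker[speaker].append(index)

    for held_out in sorted(by_speaker):
        test = by_speaker[held_out]
        train = sorted(
            i
            for speaker, indices in by_speaker.items()
            if speaker != held_out
            for i in indices
        )
        yield held_out, train, test


# Hypothetical toy corpus: one speaker label per utterance.
speakers = ["s1", "s1", "s2", "s3", "s2", "s3"]
splits = list(leave_one_speaker_out(speakers))
for held_out, train, test in splits:
    print(held_out, "train:", train, "test:", test)
```

In a speaker-dependent setup, by contrast, utterances would be shuffled without regard to speaker, so the classifier could see the same voice during training and testing. That overlap is one plausible reason speaker-independent accuracy tends to be reported as lower.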