Speech emotion classification using semi-supervised LSTM

Nattipon Itponjaroen, Kumpee Apsornpasakorn, Eakarat Pimthai, Khwanchai Kaewkaisorn, Shularp Panitchart, Thitirat Siriborvornratanakul
Advances in computational intelligence, volume 3, issue 4. Published 2023-06-22. DOI: 10.1007/s43674-023-00059-x
https://link.springer.com/article/10.1007/s43674-023-00059-x

Abstract

Speech mood analysis is a challenging task, in part because the optimal choice of features remains unclear. The nature of the dataset, such as whether the speakers are infants or adults, must also be taken into account. In this study, Mel-frequency cepstral coefficients (MFCC) were extracted from audio files to characterize the speech signal. The CREMA-D dataset, which covers six mood states (neutral, angry, happy, sad, fearful, and disgusted), was used to identify mood states from speech recordings. A mood classification system is proposed that integrates Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) models to increase the amount of labeled data available in small datasets and thereby improve classification accuracy.
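The abstract does not state the MFCC extraction parameters, so the following is only a minimal sketch of how MFCCs can be computed for a single audio frame, using the Python standard library. The function names, the 16 kHz sample rate, the 256-sample frame, the 20 mel filters, and the 13 coefficients are all illustrative assumptions, not values taken from the paper. A real pipeline would more likely rely on an audio library and an FFT instead of the naive DFT used here.

```python
import math

def hz_to_mel(f_hz):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Hamming-window the frame, then take a naive DFT for bins 0..N/2.
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    max_mel = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    # Power spectrum -> mel filterbank energies -> log -> DCT-II.
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

# Usage: one 256-sample frame of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

Applying `mfcc_frame` to successive overlapping frames of a recording yields a sequence of coefficient vectors, which is the kind of time-ordered input a sequence model such as an LSTM can consume.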

The proposed semi-supervised model was evaluated against SVM and LSTM classifiers used on their own. It outperformed both, reaching a validation accuracy of 89.72%, and was also observed to accelerate model training. These results indicate that the proposed model is effective and has the potential to improve speech mood analysis techniques.
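The abstract does not spell out how the SVM and LSTM are combined, so the sketch below shows only one common form of semi-supervised learning, self-training with pseudo-labels, which may differ from the authors' procedure. To stay self-contained it swaps in a simple nearest-centroid classifier where a real system would use an SVM or LSTM. The function names, the confidence measure, the 0.8 threshold, and the toy data are all hypothetical.

```python
import math

def fit_centroids(X, y):
    # Stand-in classifier (NOT an SVM or LSTM): one mean vector per class.
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_with_confidence(model, x):
    # Nearest centroid wins; confidence is a softmax over negative distances.
    dists = {c: math.dist(x, mu) for c, mu in model.items()}
    weights = {c: math.exp(-d) for c, d in dists.items()}
    label = min(dists, key=dists.get)
    return label, weights[label] / sum(weights.values())

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=5):
    # Repeatedly fit on the labeled set, then move confidently predicted
    # unlabeled samples into it together with their pseudo-labels.
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        remaining = []
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                X_lab.append(x)
                y_lab.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # no new pseudo-labels this round, so stop early
        pool = remaining
    return fit_centroids(X_lab, y_lab), X_lab, y_lab

# Usage on toy 2-D features: two labeled points, three unlabeled ones.
X_labeled = [[0.0, 0.0], [5.0, 5.0]]
y_labeled = ["sad", "happy"]
X_unlabeled = [[0.2, 0.1], [4.8, 5.1], [2.5, 2.5]]
model, X_out, y_out = self_train(X_labeled, y_labeled, X_unlabeled)
```

In this toy run the two unlabeled points near a labeled example are absorbed with pseudo-labels, while the ambiguous midpoint stays unlabeled because its confidence never reaches the threshold. The threshold is the main design choice: set too low, it lets wrong pseudo-labels reinforce themselves in later rounds.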
