Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network

IF 2.7 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of Information and Telecommunication Pub Date : 2023-03-17 DOI:10.1080/24751839.2023.2187278
Nhat Truong Pham, Sy Dzung Nguyen, Vu Song Thuy Nguyen, Bich Ngoc Hong Pham, Duc Ngoc Minh Dang
{"title":"Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network","authors":"Nhat Truong Pham, Sy Dzung Nguyen, Vu Song Thuy Nguyen, Bich Ngoc Hong Pham, Duc Ngoc Minh Dang","doi":"10.1080/24751839.2023.2187278","DOIUrl":null,"url":null,"abstract":"ABSTRACT Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated lots of techniques to improve the accuracy of SER, it has been challenging with feature extraction, classifier schemes, and computational costs. To address the aforementioned problems, we propose a new set of 1D features extracted by using an overlapping sliding window (OSW) technique for SER in this study. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and the AESSD datasets that contain several different emotional states. The experimental results show that the proposed method achieves an accuracy of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. It is also more comparable with accuracy to and better than the state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, the SHAP (SHapley Additive exPlanations) analysis is employed for interpreting the prediction model to assist system developers in selecting the optimal features to integrate into the desired system.","PeriodicalId":32180,"journal":{"name":"Journal of Information and Telecommunication","volume":"7 1","pages":"317 - 335"},"PeriodicalIF":2.7000,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information and Telecommunication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/24751839.2023.2187278","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 2

Abstract

ABSTRACT Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated lots of techniques to improve the accuracy of SER, it has been challenging with feature extraction, classifier schemes, and computational costs. To address the aforementioned problems, we propose a new set of 1D features extracted by using an overlapping sliding window (OSW) technique for SER in this study. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and the AESSD datasets that contain several different emotional states. The experimental results show that the proposed method achieves an accuracy of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. It is also more comparable with accuracy to and better than the state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, the SHAP (SHapley Additive exPlanations) analysis is employed for interpreting the prediction model to assist system developers in selecting the optimal features to integrate into the desired system.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于重叠滑动窗和Shapley加性可解释深度神经网络的语音情感识别
摘要语音情感识别(SER)具有多种应用,如电子学习、人机交互、客户服务和医疗保健系统。尽管研究人员已经研究了许多提高SER准确性的技术,但在特征提取、分类器方案和计算成本方面一直存在挑战。为了解决上述问题,我们在本研究中提出了一组新的1D特征,通过使用重叠滑动窗口(OSW)技术提取SER。此外,还设计了一种基于深度神经网络的分类器方案,称为深度模式识别网络(PRN),用于从新的1D特征集中对情绪状态进行分类。我们在包含几种不同情绪状态的Emo DB和AESSD数据集上评估了所提出的方法。实验结果表明,该方法在Emo-DB和AESSD数据集上的准确率分别为98.5%和87.1%。它在精度上也比在SER的相同数据集上使用1D特征的最先进和当前方法更具可比性,并且更好。此外,SHAP(SHapley Additive exPlanations)分析被用于解释预测模型,以帮助系统开发人员选择最佳特征以集成到期望的系统中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
7.50
自引率
0.00%
发文量
18
审稿时长
27 weeks
期刊最新文献
Utilizing deep learning in chipless RFID tag detection: an investigation on high-precision mm-wave spatial tag estimation from 2D virtual imaging On the performance of outage probability in cognitive NOMA random networks with hardware impairments Relay-assisted communication over a fluctuating two-ray fading channel Modified Caesar Cipher and Card Deck Shuffle Rearrangement Algorithm for Image Encryption Application of data envelopment analysis to IT project evaluation, with special emphasis on the choice of inputs and outputs in the context of the organization in question
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1