Speech Emotion Recognition Using Deep Neural Networks, Transfer Learning, and Ensemble Classification Techniques

Impact Factor: 3.7 | Zone 4 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Romanian Journal of Information Science and Technology | Pub Date: 2023-09-28 | DOI: 10.59277/romjist.2023.3-4.10
Serban MIHALACHE, Dragos BURILEANU
Citations: 0

Abstract

Speech emotion recognition (SER) is the task of determining the affective content present in speech. It is a promising research area that has attracted great interest in recent years, with important applications especially in the field of forensic speech and law enforcement operations, among others. In this paper, systems based on deep neural networks (DNNs) spanning five levels of complexity are proposed, developed, and tested, including systems that apply transfer learning (TL) to the top modern image recognition deep learning models, as well as several ensemble classification techniques that lead to significant performance increases. The systems were tested on the most relevant SER datasets (EMODB, CREMAD, and IEMOCAP) in two settings: (i) classification, using the standard full sets of emotion classes as well as additional negative emotion subsets relevant for forensic speech applications; and (ii) regression, using the continuously valued 2D arousal-valence affect space. The proposed systems achieved state-of-the-art results for the full class subset for EMODB (up to 83% accuracy) and performance comparable to other published research for the full class subsets for CREMAD and IEMOCAP (up to 55% and 62% accuracy, respectively). For the class subsets focusing only on negative affective content, the proposed solutions offered top performance compared with previously published state-of-the-art results.
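The abstract mentions ensemble classification techniques without naming them. As a purely illustrative sketch, and not the authors' method, the following shows one common ensemble scheme, weighted soft voting, in which the per-class probabilities of several classifiers are averaged and the highest-scoring class is chosen. The label list `EMOTIONS`, the function names `soft_vote` and `predict`, and the example probabilities are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

# Hypothetical label set for illustration only; the paper's exact
# class lists are not given in the abstract.
EMOTIONS = ["anger", "fear", "sadness", "neutral"]


def soft_vote(
    prob_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of per-class probabilities.

    prob_sets holds one probability vector per classifier, all of the
    same length. weights defaults to equal weighting.
    """
    n_models = len(prob_sets)
    if n_models == 0:
        raise ValueError("need at least one classifier output")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per classifier is required")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all probability vectors must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean for each class index across all classifiers.
    return [
        sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict(
    prob_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Pick the emotion label with the highest averaged probability."""
    avg = soft_vote(prob_sets, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    # Made-up outputs from three hypothetical classifiers for one utterance.
    outputs = [
        [0.6, 0.2, 0.1, 0.1],
        [0.3, 0.4, 0.2, 0.1],
        [0.5, 0.1, 0.3, 0.1],
    ]
    print(predict(outputs))
```

Soft voting is only one option; majority (hard) voting or a trained meta-classifier over the base models' outputs are alternatives, and the abstract does not say which the paper uses.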
Source journal
Romanian Journal of Information Science and Technology (Engineering and Technology: Computer Science, Theory & Methods)
CiteScore: 5.50
Self-citation rate: 8.60%
Articles published: 0
Review time: >12 weeks
Journal description: The primary objective of this journal is the publication of original results of research in information science and technology. There is no restriction on the addressed topics, the only acceptance criterion being the originality and quality of the articles, proved by independent reviewers. Contributions to recently emerging areas are encouraged. Romanian Journal of Information Science and Technology (a publication of the Romanian Academy) is indexed and abstracted in the following Thomson Reuters products and information services: • Science Citation Index Expanded (also known as SciSearch®), • Journal Citation Reports/Science Edition.
Latest articles from the journal
XOR-Based Detector of Different Decisions on Anomalies in the Computer Network Traffic
Twitter's Mirroring of the 2022 Energy Crisis: What It Teaches Decision-Makers - A Preliminary Study
Binary Anarchic Society Optimization for Feature Selection
Speech Emotion Recognition Using Deep Neural Networks, Transfer Learning, and Ensemble Classification Techniques
Using Swear Words Increases the Irritability – a Study Using AI Algorithms