Improving speech emotion recognition by fusing self-supervised learning and spectral features via mixture of experts

IF 2.7 · Zone 3, Computer Science · Q3 Computer Science, Artificial Intelligence · Data & Knowledge Engineering · Pub Date: 2023-12-13 · DOI: 10.1016/j.datak.2023.102262
Jonghwan Hyeon, Yung-Hwan Oh, Young-Jun Lee, Ho-Jin Choi
Data & Knowledge Engineering, Volume 150, Article 102262. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0169023X23001222
Citations: 0

Abstract

Speech Emotion Recognition (SER) is an important research area in speech processing that aims to identify and classify the emotional states conveyed through speech signals. Recent studies have achieved strong SER performance by exploiting deep contextualized speech representations from self-supervised learning (SSL) models. However, SSL models pre-trained on clean speech data may not perform well on emotional speech data because of the domain shift between the two. To address this problem, this paper proposes a novel approach that simultaneously exploits an SSL model and a domain-agnostic spectral feature (SF) through the Mixture of Experts (MoE) technique. The proposed approach achieves state-of-the-art weighted accuracy on the IEMOCAP dataset compared with other methods. Moreover, this paper demonstrates that the domain shift problem exists for SSL models in the SER task.
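The abstract does not describe the architecture, so the following is only a minimal sketch of the general Mixture of Experts idea it names: one expert scores the emotion classes from an SSL embedding, another from a spectral feature vector, and a gate decides how much to trust each. The linear experts, the softmax gate over the concatenated features, the toy dimensions, and every name here (`moe_fuse`, `w_ssl`, `w_sf`, `w_gate`) are assumptions made for illustration, not the authors' method.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, weights):
    """Multiply a feature vector x (length d) by a d x k weight matrix."""
    return [sum(xi * row[j] for xi, row in zip(x, weights))
            for j in range(len(weights[0]))]

def moe_fuse(ssl_feat, sf_feat, w_ssl, w_sf, w_gate):
    """Return (fused_logits, gate_weights) for one utterance.

    Hypothetical design: each expert is a single linear layer, and the
    gate is a softmax over a linear map of the concatenated features.
    """
    expert_logits = [linear(ssl_feat, w_ssl), linear(sf_feat, w_sf)]
    gate = softmax(linear(ssl_feat + sf_feat, w_gate))  # one weight per expert
    fused = [sum(g * logits[c] for g, logits in zip(gate, expert_logits))
             for c in range(len(expert_logits[0]))]
    return fused, gate

def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D_SSL, D_SF, N_CLASSES = 8, 5, 4  # toy sizes, chosen only for illustration
w_ssl = random_matrix(D_SSL, N_CLASSES, rng)
w_sf = random_matrix(D_SF, N_CLASSES, rng)
w_gate = random_matrix(D_SSL + D_SF, 2, rng)

# Random vectors stand in for a pooled SSL embedding and a spectral feature.
ssl_feat = [rng.gauss(0.0, 1.0) for _ in range(D_SSL)]
sf_feat = [rng.gauss(0.0, 1.0) for _ in range(D_SF)]

fused, gate = moe_fuse(ssl_feat, sf_feat, w_ssl, w_sf, w_gate)
print("gate weights (SSL expert, SF expert):", gate)
print("predicted class index:", max(range(N_CLASSES), key=lambda c: fused[c]))
```

Under these assumptions, a gate of this kind could lean on the spectral expert when the SSL representation is unreliable for emotional speech, which is the intuition behind pairing a domain-agnostic feature with an SSL model.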

Source journal
Data & Knowledge Engineering (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 66
Review time: 6 months
Journal introduction: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.
Latest articles from this journal:
Goal modelling in aeronautics: Practical applications for aircraft and manufacturing designs
Ethical reasoning methods for ICT: What they are and when to use them
SSQTKG: A Subgraph-based Semantic Query Approach for Temporal Knowledge Graph
NoSQL document data migration strategy in the context of schema evolution
VarClaMM: A reference meta-model to understand DNA variant classification