Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

IEEE Transactions on Computational Social Systems | Pub Date: 2024-03-27 | DOI: 10.1109/TCSS.2024.3362690 | IF 4.5 | Zone 2, Computer Science | JCR Q1, Computer Science, Cybernetics

Yan Zhao, Yuan Zong, Jincen Wang, Hailun Lian, Cheng Lu, Li Zhao, Wenming Zheng

Published in IEEE Transactions on Computational Social Systems, vol. 11, no. 4, pp. 5419-5430, 2024 (journal article, not open access). Full text: https://ieeexplore.ieee.org/document/10480414/


Abstract

In this article, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDANs) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDANs), whose key contribution is a novel regularization term called implicit distribution alignment (IDA). This term allows a DIDAN trained on source (training) speech samples to remain applicable to predicting emotion labels for target (testing) speech samples despite the variance between corpora in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. LIDA consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are placed in different fully connected layers of LIDAN, matching the way emotion-discriminative ability increases with layer depth. This arrangement enables LIDAN to learn emotion-discriminative and corpus-invariant features for SER across corpora more effectively than DIDAN. It is also worth mentioning that, unlike most existing methods, which rely on estimating statistical moments to describe preassumed explicit distributions, IDA and LIDA use target sample reconstruction to bridge the feature distribution gap directly, without assuming a distribution type. As a result, DIDAN and LIDAN can be viewed as implicit cross-corpus SER methods. To evaluate LIDAN, we conducted extensive cross-corpus SER experiments on the EmoDB, eNTERFACE, and CASIA corpora. The experimental results demonstrate that LIDAN surpasses recent state-of-the-art explicit unsupervised DA methods on cross-corpus SER tasks.
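The abstract contrasts moment-based (explicit) alignment with alignment by target sample reconstruction, but it does not give the IDA formula. The following is only a rough sketch of that contrast, not the authors' formulation: it assumes a ridge-regularized linear reconstruction of target features from source features, and the function names (`mean_gap`, `reconstruction_gap`), the `ridge` parameter, and the synthetic features are all hypothetical.

```python
import numpy as np


def mean_gap(source, target):
    """Explicit, moment-based gap: squared distance between feature means."""
    return float(np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2))


def reconstruction_gap(source, target, ridge=1e-6):
    """Reconstruction-based gap (illustrative assumption, not the paper's IDA term).

    Each target row is approximated as a linear combination of source rows,
    target ~= W @ source, where W minimizes
    ||target - W @ source||_F^2 + ridge * ||W||_F^2.
    Returns the mean squared reconstruction error.
    """
    n_source = source.shape[0]
    # Closed-form ridge solution: W = T S^T (S S^T + ridge * I)^{-1}
    gram = source @ source.T + ridge * np.eye(n_source)
    weights = np.linalg.solve(gram, source @ target.T).T
    residual = target - weights @ source
    return float(np.mean(residual ** 2))


# Synthetic features: source and "aligned" target share a 3-dimensional
# subspace of an 8-dimensional feature space; the "shifted" target does not.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 8))
source = rng.normal(size=(20, 3)) @ basis
target_aligned = rng.normal(size=(10, 3)) @ basis
target_shifted = target_aligned + 1.0

print("aligned:", reconstruction_gap(source, target_aligned))
print("shifted:", reconstruction_gap(source, target_shifted))
print("mean gap (shifted):", mean_gap(source, target_shifted))
```

In this toy setup the reconstruction error stays near zero when the target features can be expressed through the source features and grows when they cannot, without fitting any particular distribution family. In a network, a term of this kind would presumably be computed on layer activations and added to the training loss; how LIDA conditions it on emotion labels at different granularities is not specified in the abstract and is not modeled here.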




Source journal

IEEE Transactions on Computational Social Systems (subject category: Social Sciences, Social Sciences (miscellaneous))

CiteScore: 10.00

Self-citation rate: 20.00%

Publication count: 316

About the journal: IEEE Transactions on Computational Social Systems focuses on the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the journal publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. It also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, computational behavior modeling, and their applications.