Phonetically-Anchored Domain Adaptation for Cross-Lingual Speech Emotion Recognition

IF 9.8 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Affective Computing Pub Date : 2025-01-15 DOI:10.1109/TAFFC.2025.3530105
Shreya G. Upadhyay;Luz Martinez-Lucas;William Katz;Carlos Busso;Chi-Chun Lee
{"title":"Phonetically-Anchored Domain Adaptation for Cross-Lingual Speech Emotion Recognition","authors":"Shreya G. Upadhyay;Luz Martinez-Lucas;William Katz;Carlos Busso;Chi-Chun Lee","doi":"10.1109/TAFFC.2025.3530105","DOIUrl":null,"url":null,"abstract":"The prevalence of cross-lingual <italic>speech emotion recognition</i> (SER) modeling has significantly increased due to its wide range of applications. Previous studies have primarily focused on technical strategies to adapt features, domains, and labels across languages, often overlooking the underlying commonalities between the languages. In this study, we address the language adaptation challenge in cross-lingual scenarios by incorporating vowel-phonetic constraints. Our approach is structured in two main parts. First, we investigate the vowel-phonetic commonalities associated with specific emotions across languages, particularly focusing on common vowels that prove to be valuable for SER modeling. Second, we utilize these identified common vowels as anchors to facilitate cross-lingual SER. To demonstrate the effectiveness of our approach, we conduct case studies using <italic>American English</i> and <italic>Taiwanese Mandarin</i> with two naturalistic emotional speech corpora: the MSP-Podcast and BIIC-Podcast corpora. The approach leverages evidence that certain vowels, including monophthongs and diphthongs, exhibit emotion-specific commonality across languages, serving as phonetic anchors to enhance unsupervised cross-lingual SER learning. The proposed model surpasses baseline performance, highlighting the importance of phonetic similarities for effective language adaptation in cross-lingual SER scenarios.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"1631-1645"},"PeriodicalIF":9.8000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10842508/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The prevalence of cross-lingual speech emotion recognition (SER) modeling has significantly increased due to its wide range of applications. Previous studies have primarily focused on technical strategies to adapt features, domains, and labels across languages, often overlooking the underlying commonalities between the languages. In this study, we address the language adaptation challenge in cross-lingual scenarios by incorporating vowel-phonetic constraints. Our approach is structured in two main parts. First, we investigate the vowel-phonetic commonalities associated with specific emotions across languages, particularly focusing on common vowels that prove to be valuable for SER modeling. Second, we utilize these identified common vowels as anchors to facilitate cross-lingual SER. To demonstrate the effectiveness of our approach, we conduct case studies using American English and Taiwanese Mandarin with two naturalistic emotional speech corpora: the MSP-Podcast and BIIC-Podcast corpora. The approach leverages evidence that certain vowels, including monophthongs and diphthongs, exhibit emotion-specific commonality across languages, serving as phonetic anchors to enhance unsupervised cross-lingual SER learning. The proposed model surpasses baseline performance, highlighting the importance of phonetic similarities for effective language adaptation in cross-lingual SER scenarios.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于跨语言语音情感识别的音素辅助领域适应技术
由于其广泛的应用,跨语言语音情感识别(SER)建模的普及程度显著提高。以前的研究主要集中在跨语言适应特征、领域和标签的技术策略上,往往忽略了语言之间潜在的共性。在本研究中,我们通过结合元音-语音约束来解决跨语言情境下的语言适应挑战。我们的方法主要分为两个部分。首先,我们研究了跨语言中与特定情绪相关的元音-语音共性,特别关注对SER建模有价值的常见元音。其次,我们利用这些确定的常见元音作为锚点来促进跨语言SER。为了证明我们的方法的有效性,我们使用美国英语和台湾普通话进行了两个自然主义情感语音语料库的案例研究:MSP-Podcast和BIIC-Podcast语料库。该方法利用证据表明,某些元音,包括单元音和双元音,在不同语言中表现出特定情感的共性,作为语音锚点来增强无监督的跨语言SER学习。所提出的模型超越了基线性能,突出了语音相似性对跨语言SER场景中有效语言适应的重要性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE Transactions on Affective Computing
IEEE Transactions on Affective Computing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, CYBERNETICS
CiteScore
15.00
自引率
6.20%
发文量
174
期刊介绍: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.
期刊最新文献
AnxietyFaceTrack: A Smartphone-Based Non-Intrusive Approach for Detecting State Anxiety Using Facial Features Personalized Federated Learning for Session-based Affective Interaction Modeling Hierarchical Vision-Language Interaction for Facial Action Unit Detection SENSE-7: Taxonomy and Dataset for Measuring User Perceptions of Empathy in Sustained Human-AI Conversations Assessing the Representation of Suicidal Ideation in Social Media Datasets Relative to Suicide Notes
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1