{"title":"Individual-Aware Attention Modulation for Unseen Speaker Emotion Recognition","authors":"Yuanbo Fang;Xiaofen Xing;Zhaojie Chu;Yifeng Du;Xiangmin Xu","doi":"10.1109/TAFFC.2024.3498937","DOIUrl":null,"url":null,"abstract":"In practical human-computer interaction (HCI) applications, robust speech emotion recognition (SER) for unseen speakers is crucial. Prior research has primarily focused on extracting common representations to enhance the generalization of cross-individual SER. However, most methods ignore the positive effects of individual characteristics. Actually, each speaker can be regarded as an independent individual domain. Personalized SER can be improved if the emotional expressions of individual speech characteristics are effectively utilized. To address the challenges in recognizing emotions for unseen speakers, this paper proposes a novel individual-aware attention modulation (IAM) model. Specifically, the IAM uses meta-learning techniques to extract modulation parameters for obtaining individual-related emotion expressions from individual characteristics. The base model is then modulated to facilitate the transfer of the common emotion representation space to an individual-specific emotion representation space. This transformation is achieved by applying attention modulation within the transformer-based model developed in this paper. In addition, we employ a meta-learning-based method to optimize model parameters, enhancing the adaptability of the model to unseen speakers, and a control factor is introduced to regulate the degree of individual modulation, thus enhancing the robustness of the modulation process. 
Experimental results demonstrate that the proposed model achieves significantly improved cross-individual SER performance.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 2","pages":"1205-1218"},"PeriodicalIF":9.8000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10753506/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In practical human-computer interaction (HCI) applications, robust speech emotion recognition (SER) for unseen speakers is crucial. Prior research has primarily focused on extracting common representations to improve the generalization of cross-individual SER. However, most methods ignore the positive effects of individual characteristics. In fact, each speaker can be regarded as an independent individual domain, and personalized SER can be improved if the emotional expression embedded in individual speech characteristics is effectively utilized. To address the challenge of recognizing emotions for unseen speakers, this paper proposes a novel individual-aware attention modulation (IAM) model. Specifically, IAM uses meta-learning techniques to extract modulation parameters from individual characteristics, capturing individual-related emotion expressions. The base model is then modulated to transfer the common emotion representation space to an individual-specific emotion representation space. This transformation is achieved by applying attention modulation within the transformer-based model developed in this paper. In addition, we employ a meta-learning-based method to optimize model parameters, enhancing the model's adaptability to unseen speakers, and we introduce a control factor to regulate the degree of individual modulation, improving the robustness of the modulation process. Experimental results demonstrate that the proposed model achieves significantly improved cross-individual SER performance.
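The abstract describes modulating a transformer's attention with individual-specific parameters, blended with the unmodulated attention by a control factor. The paper's exact formulation is not given here, so the following is only an illustrative sketch: a scaled dot-product attention whose score matrix receives an affine modulation (`gamma`, `beta`, standing in for the meta-learned parameters derived from speaker characteristics), mixed with the common scores by a hypothetical control factor `alpha`. All names and the affine form are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modulated_attention(Q, K, V, gamma, beta, alpha=0.5):
    """Scaled dot-product attention with individual-aware modulation.

    gamma, beta : stand-ins for individual-specific modulation parameters
                  (in the paper these would come from a meta-learner).
    alpha       : control factor in [0, 1]; 0 recovers the common
                  (speaker-agnostic) attention, 1 is fully modulated.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # common attention scores
    modulated = gamma * scores + beta                   # individual-aware affine shift
    blended = (1 - alpha) * scores + alpha * modulated  # control factor blends the two
    return softmax(blended) @ V
```

With `alpha = 0` the function reduces to plain scaled dot-product attention, which mirrors the robustness role the abstract assigns to the control factor: it bounds how far the individual-specific space can drift from the common representation space.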
Journal Description:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.