Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models

IF 9.8 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Region 2 (Computer Science)
IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1929-1941. Published: 2025-03-03. DOI: 10.1109/TAFFC.2025.3547218
Anna Derington; Hagen Wierstorf; Ali Özkil; Florian Eyben; Felix Burkhardt; Björn W. Schuller
Citations: 0

Abstract

Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on a few available datasets per task. Tasks can include arousal, valence, dominance, emotional categories, or tone of voice. These models are mainly evaluated in terms of correlation or recall, and always show some errors in their predictions. The errors manifest themselves in model behaviour, which can differ considerably along different dimensions even when models achieve the same recall or correlation. This paper introduces a testing framework for investigating the behaviour of speech emotion recognition models by requiring different metrics to reach a certain threshold in order to pass a test. The test metrics can be grouped in terms of correctness, fairness, and robustness. The paper also provides a method for automatically specifying test thresholds for fairness tests based on the datasets used, along with recommendations on how to select the remaining test thresholds. We evaluated an xLSTM-based model and nine transformer-based acoustic foundation models against a convolutional baseline model, testing their performance on arousal, valence, dominance, and emotional category classification. The test results highlight that models with high correlation or recall might rely on shortcuts, such as text sentiment, and differ in terms of fairness.
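The core idea of the framework, as the abstract describes it, is that a model passes a test only if a given metric reaches a threshold. The following is a minimal sketch of that pass/fail pattern; the `Test` class, `recall` helper, and `run_tests` function are hypothetical illustrations, not the paper's actual API, and the paper's metrics (e.g., correlation for dimensional tasks) would differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Test:
    """One behavioural test: a named metric with a pass threshold."""
    name: str                                                  # e.g. "correctness: recall"
    metric: Callable[[Sequence[int], Sequence[int]], float]
    threshold: float                                           # passes iff metric >= threshold


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Simplistic binary recall, for illustration only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def run_tests(tests: Sequence[Test],
              y_true: Sequence[int],
              y_pred: Sequence[int]) -> Dict[str, Tuple[float, bool]]:
    """Evaluate each test and report (score, passed) per test name."""
    results = {}
    for t in tests:
        score = t.metric(y_true, y_pred)
        results[t.name] = (score, score >= t.threshold)
    return results


# Usage: the same predictions can pass one threshold and fail a stricter one,
# which is how the framework distinguishes models with similar headline metrics.
y_true = [1, 1, 0, 1]
y_pred = [1, 0, 0, 1]
report = run_tests([Test("correctness: recall (lenient)", recall, 0.5),
                    Test("correctness: recall (strict)", recall, 0.9)],
                   y_true, y_pred)
```

A fairness test would follow the same shape, with the metric computed per speaker group and the threshold bounding the allowed gap between groups.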
Source journal

IEEE Transactions on Affective Computing (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)
CiteScore: 15.00 · Self-citation rate: 6.20% · Articles per year: 174
About the journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. The journal also welcomes surveys of existing work that provide new perspectives on the historical and future directions of this field.