Embracing and Exploiting Annotator Emotional Subjectivity: An Affective Rater Ensemble Model

Lukas Stappen, Lea Schumann, A. Batliner, Björn Schuller
{"title":"Embracing and Exploiting Annotator Emotional Subjectivity: An Affective Rater Ensemble Model","authors":"Lukas Stappen, Lea Schumann, A. Batliner, Björn Schuller","doi":"10.1109/aciiw52867.2021.9666407","DOIUrl":null,"url":null,"abstract":"Automated recognition of continuous emotions in audio-visual data is a growing area of study that aids in understanding human-machine interaction. Training such systems presupposes human annotation of the data. The annotation process, however, is laborious and expensive given that several human ratings are required for every data sample to compensate for the subjectivity of emotion perception. As a consequence, labelled data for emotion recognition are rare and the existing corpora are limited when compared to other state-of-the-art deep learning datasets. In this study, we explore different ways in which existing emotion annotations can be utilised more effectively to exploit available labelled information to the fullest. To reach this objective, we exploit individual raters’ opinions by employing an ensemble of rater-specific models, one for each annotator, by that reducing the loss of information which is a byproduct of annotation aggregation; we find that individual models can indeed infer subjective opinions. Furthermore, we explore the fusion of such ensemble predictions using different fusion techniques. Our ensemble model with only two annotators outperforms the regular Arousal baseline on the test set of the MuSe-CaR corpus. While no considerable improvements on valence could be obtained, using all annotators increases the prediction performance of arousal by up to. 
07 Concordance Correlation Coefficient absolute improvement on test - solely trained on rate-specific models and fused by an attention-enhanced Long-short Term Memory-Recurrent Neural Network.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/aciiw52867.2021.9666407","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Automated recognition of continuous emotions in audio-visual data is a growing area of study that aids in understanding human-machine interaction. Training such systems presupposes human annotation of the data. The annotation process, however, is laborious and expensive, given that several human ratings are required for every data sample to compensate for the subjectivity of emotion perception. As a consequence, labelled data for emotion recognition are rare, and the existing corpora are limited when compared to other state-of-the-art deep learning datasets. In this study, we explore different ways in which existing emotion annotations can be utilised more effectively, exploiting the available labelled information to the fullest. To reach this objective, we exploit individual raters’ opinions by employing an ensemble of rater-specific models, one per annotator, thereby reducing the loss of information that is a byproduct of annotation aggregation; we find that individual models can indeed infer subjective opinions. Furthermore, we explore the fusion of such ensemble predictions using different fusion techniques. Our ensemble model with only two annotators outperforms the regular arousal baseline on the test set of the MuSe-CaR corpus. While no considerable improvements on valence could be obtained, using all annotators increases the prediction performance for arousal by up to .07 absolute Concordance Correlation Coefficient on test, trained solely on rater-specific models and fused by an attention-enhanced Long Short-Term Memory Recurrent Neural Network.
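The evaluation metric named above, the Concordance Correlation Coefficient (CCC), is a standard measure for continuous emotion regression; it penalises both correlation and mean/scale disagreement between prediction and gold trace. The sketch below computes the CCC and illustrates the simplest late-fusion baseline, averaging rater-specific model outputs; the toy `rater_preds` values are invented for illustration, and the paper's actual fusion uses a learned, attention-enhanced LSTM rather than a plain mean.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient (Lin, 1989):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Hypothetical continuous-arousal traces predicted by two
# rater-specific models for the same sequence.
rater_preds = [np.array([0.1, 0.4, 0.6]),
               np.array([0.2, 0.3, 0.8])]

# Mean late fusion of the rater-specific predictions.
fused = np.mean(rater_preds, axis=0)
```

A perfectly matching prediction yields a CCC of 1, a perfectly anti-correlated one -1, which is why an absolute gain of .07 on this scale is a meaningful improvement.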