Emotional Paraphrasing Using Pre-trained Language Models
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666309
Jacky Casas, Samuel Torche, Karl Daher, E. Mugellini, Omar Abou Khaled
Emotion style transfer is a recent and challenging problem in Natural Language Processing (NLP). Transformer-based language models have become extremely powerful, which raises the question of whether they can be leveraged for emotion style transfer. To date, previous work has not applied transformer-based models to this task. We address it by fine-tuning a GPT-2 model on corrupted emotional data, which trains the model to increase the emotional intensity of an input sentence. Coupled with a paraphrasing model, this yields a system capable of transferring an emotion into a paraphrase. We conducted a qualitative study with human judges as well as a quantitative evaluation. Although the paraphrase metrics show poor performance compared to the state of the art, the emotion transfer proved effective, especially for fear, sadness, and disgust: the perception of these emotions improved in both the automatic and the human evaluations. Such technology can significantly facilitate the automatic creation of training sentences for natural language understanding (NLU) systems, and it can also be integrated into an emotional or empathic dialogue architecture.
{"title":"Emotional Paraphrasing Using Pre-trained Language Models","authors":"Jacky Casas, Samuel Torche, Karl Daher, E. Mugellini, Omar Abou Khaled","doi":"10.1109/aciiw52867.2021.9666309","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666309","url":null,"abstract":"Emotion style transfer is a recent and challenging problem in Natural Language Processing (NLP). Transformer-based language models are becoming extremely powerful, so one wonders if it would be possible to leverage them to perform emotion style transfer. So far, previous work has not used transformer-based models for this task. To address this task, we fine-tune a GPT-2 model with corrupted emotional data. This will train the model to increase the emotional intensity of the input sentence. Coupled with a paraphrasing model, we develop a system capable of transferring an emotion into a paraphrase. We conducted a qualitative study with human judges, as well as a quantitative evaluation. Although the paraphrase metrics show poor performance compared to the state of the art, the transfer of emotion proved to be effective, especially for the emotions fear, sadness, and disgust. The perception of these emotions were improved both in the automatic and human evaluations. Such technology can significantly facilitate the automatic creation of training sentences for natural language understanding (NLU) systems, but it can also be integrated into an emotional or empathic dialogue architecture.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"218 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133979716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time Ubiquitous Pain Recognition
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666289
Iyonna Tynes, Shaun J. Canavan
Emotion recognition is a quickly growing field, driven by increased interest in building systems that can classify and respond to emotions. Recent medical crises, such as the opioid overdose epidemic in the United States and the global COVID-19 pandemic, have emphasized the importance of emotion recognition applications in areas like telehealth services. Considering this, we propose an approach to real-time, ubiquitous pain recognition from facial images. We conducted offline experiments using the BP4D dataset, in which we investigate the impact of gender and data imbalance. This paper proposes an affordable and easily accessible system that can perform pain recognition inference. The study found that a dataset balanced in terms of both class and gender yields the highest accuracies for pain recognition. We also detail the difficulties of pain recognition from facial images and propose future work for this challenging problem.
{"title":"Real-time Ubiquitous Pain Recognition","authors":"Iyonna Tynes, Shaun J. Canavan","doi":"10.1109/aciiw52867.2021.9666289","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666289","url":null,"abstract":"Emotion recognition is a quickly growing field due to the increased interest in building systems which can classify and respond to emotions. Recent medical crises, such as the opioid overdose epidemic in the United States and the global COVID-19 pandemic has emphasized the importance of emotion recognition applications is areas like Telehealth services. Considering this, we propose an approach to real-time ubiquitous pain recognition from facial images. We have conducted offline experiments using the BP4D dataset, where we investigate the impact of gender and data imbalance. This paper proposes an affordable and easily accessible system which can perform pain recognition inferences. The results from this study found a balanced dataset, in terms of class and gender, results in the highest accuracies for pain recognition. We also detail the difficulties of pain recognition using facial images and propose some future work that can be investigated for this challenging problem.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130209673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of Nudges and Measuring of Alignment in Spoken Interactions
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666344
N. Kalashnikova
Nudges, techniques that indirectly influence human decision making, are little studied in spoken interactions. However, the boundaries of human-computer spoken interaction are not controlled, which allows machines to deliver harmful nudges. In this context, a framework for detecting nudges is needed to strengthen the ethics of HCI. The work proposed in this PhD thesis is based on the hypothesis that nudges can be detected by measuring linguistic, paralinguistic, and emotional alignment between interlocutors. The thesis therefore aims to answer two research questions. First, does a high level of linguistic and paralinguistic alignment influence a person's susceptibility to being nudged? Second, is a person who is resistant to others' emotions less susceptible to being nudged? To better understand the correlation between alignment and nudges, as well as a person's susceptibility to being nudged given their level of alignment, we will conduct a series of experiments.
{"title":"Detection of Nudges and Measuring of Alignment in Spoken Interactions","authors":"N. Kalashnikova","doi":"10.1109/aciiw52867.2021.9666344","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666344","url":null,"abstract":"Nudges, techniques that indirectly influence human decision making, are little studied in spoken interactions. However, the limits of human-computer spoken interactions are not controlled, allowing machines realize bad nudges. In this context a framework for detecting nudges is needed to enhance the ethics of HCI. The work proposed in this PhD thesis is based on the hypothesis that the detection of nudges lies in measuring linguistic, paralinguistic and emotional alignments between interlocutors. Therefore, this PhD thesis aims to answer two research questions. First, does a high level of linguistic and paralinguistic alignement influence human's potential to be nudged? Second, if a person is resistant to other's emotions is she or he less sensible to be nudged? To get a better understanding of the correlation between alignment and nudges, but also a human's potential to be nudged knowing their level of alignment, we will conduct a series of experiments.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127386920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A system for collecting emotionally annotated physiological signals in daily life using wearables
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666272
Stanisław Saganowski, Maciej Behnke, Joanna Komoszyńska, Dominika Kunc, Bartosz Perz, Przemyslaw Kazienko
Several obstacles have to be overcome in order to recognize emotions and affect in daily life. One of them is collecting the large amount of emotionally annotated data needed to train data-hungry machine learning predictive models. Hence, we propose the Emognition system, which supports the collection of rich emotional samples in everyday-life scenarios. The system uses smart wearables to record physiological signals unobtrusively and smartphones to gather self-assessments. To validate the system, we performed a two-week pilot study with 15 participants using devices available on the market. We report the outcomes of the study, alongside a discussion and lessons learned.
{"title":"A system for collecting emotionally annotated physiological signals in daily life using wearables","authors":"Stanisław Saganowski, Maciej Behnke, Joanna Komoszyńska, Dominika Kunc, Bartosz Perz, Przemyslaw Kazienko","doi":"10.1109/aciiw52867.2021.9666272","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666272","url":null,"abstract":"Several obstacles have to be overcome in order to recognize emotions and affect in daily life. One of them is collecting a large amount of emotionally annotated data necessary to create data-greedy machine learning-based predictive models. Hence, we propose the Emognition system supporting the collection of rich emotional samples in everyday-life scenarios. The system utilizes smart-wearables to record physiological signals unobtrusively and smartphones to gather self-assessments. We have performed a two-week pilot study with 15 participants and devices available on the market to validate the system. The outcomes of the study, alongside the discussion and lessons learned, are provided.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122444296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A simple baseline for evaluating Expression Transfer and Anonymisation in Video Transfer
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666292
Gabriel Haddon-Hill, Keerthy Kusumam, M. Valstar
Video-to-video synthesis methods provide increasingly accessible solutions for training models on privacy-sensitive and limited-size datasets frequently encountered in domains such as affect analysis. However, there are no existing baselines that explicitly measure the extent of reliable expression transfer or privacy preservation in the generated data. In this paper, we evaluate a general-purpose video transfer method, vid2vid, on these two key tasks: expression transfer and anonymisation of identities, as well as its suitability for training affect prediction models. We provide results that form a strong baseline for future comparisons, and further motivate the need for purpose-built methods for conducting expression-preserving video transfer. Our results indicate that a significant limitation of vid2vid's expression transfer arises from conditioning on facial landmarks and optical flow, which do not carry sufficient information to preserve facial expressions. Finally, we demonstrate that vid2vid can adequately anonymise videos in some cases, though not consistently, and that the anonymisation can be improved by applying random perturbations to input landmarks, at the cost of reduced expression transfer.
{"title":"A simple baseline for evaluating Expression Transfer and Anonymisation in Video Transfer","authors":"Gabriel Haddon-Hill, Keerthy Kusumam, M. Valstar","doi":"10.1109/aciiw52867.2021.9666292","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666292","url":null,"abstract":"Video-to-video synthesis methods provide increasingly accessible solutions for training models on privacy-sensitive and limited-size datasets frequently encountered in domains such as affect analysis. However, there are no existing baselines that explicitly measure the extent of reliable expression transfer or privacy preservation in the generated data. In this paper, we evaluate a general-purpose video transfer method, vid2vid, on these two key tasks: expression transfer and anonymisation of identities, as well as its suitability for training affect prediction models. We provide results that form a strong baseline for future comparisons, and further motivate the need for purpose-built methods for conducting expression-preserving video transfer. Our results indicate that a significant limitation of vid2vid's expression transfer arises from conditioning on facial landmarks and optical flow, which do not carry sufficient information to preserve facial expressions. Finally, we demonstrate that vid2vid can adequately anonymise videos in some cases, though not consistently, and that the anonymisation can be improved by applying random perturbations to input landmarks, at the cost of reduced expression transfer.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115489477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SUGO-MIMI: A Waggle Ear-Type Device Linked to Eyebrows
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666368
Shoko Kimura, Ayaka Fujii, Seiichi Harata, Takuto Sakuma, Shohei Kato
Facial expressions convey emotions. However, not all people are good at reading such expressions in daily communication. To address this issue, we use "SUGO-MIMI," a lightweight device that requires no power source and expands facial expressions. SUGO-MIMI expands eyebrow movements by connecting the eyebrows, via wires, to thin plates imitating cat ears attached to a headband. In our experiment on conveying facial expressions, happiness was conveyed better when the wearer used SUGO-MIMI.
Relationship between Mood Improvement and Questioning to Evaluate Automatic Thoughts in Cognitive Restructuring with a Virtual Agent
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666312
Kazuhiro Shidara, Hiroki Tanaka, Hiroyoshi Adachi, D. Kanayama, Yukako Sakagami, Takashi Kudo, Satoshi Nakamura
Cognitive restructuring is a therapeutic technique from cognitive behavior therapy that helps convert negative automatic thoughts into balanced thoughts. Automatic thoughts can be converted through the patient's or participant's own objective evaluation from a different perspective. Human therapists ask questions intended to guide the evaluation of automatic thoughts more effectively. Virtual agents, acting as therapists, have great potential to support cognitive restructuring. We investigated how a virtual agent affects a participant's mood when it asks questions to evaluate automatic thoughts during cognitive restructuring. We implemented a virtual agent that performs scenario-based dialogue with two types of scenarios: with and without questions to evaluate automatic thoughts. We conducted a dialogue experiment with 20 healthy graduate students, divided into two groups of ten, and found that participants' negative mood improved significantly when the virtual agent asked questions to evaluate automatic thoughts. Furthermore, the number of helpful questions was significantly correlated with the degree of mood change (ρ = 0.81). These results suggest that providing appropriate questions is important for cognitive restructuring and that the number of helpful questions reflects the dialogue's effectiveness.
A Multimodal Engagement-Aware Recommender System for People with Dementia
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666306
Lars Steinert
Dementia places an immeasurable burden on affected individuals and caregivers. In non-pharmacological therapy, physical, social, and cognitive activation of People with Dementia (PwD) is known to be crucial. However, effective activation requires sustained engagement, so technical activation systems need a means of automatically recognizing whether the user is engaged. While research has shown that engagement can be recognized automatically in healthy individuals, the task is especially challenging for PwD, who may suffer from aphasia or blunted affect. In this project, I aim to investigate whether PwD provide sufficient verbal and non-verbal signals for automatic engagement recognition. Next, I aim to build a multimodal engagement recognition system for PwD using a technical activation system. Lastly, I aim to leverage this knowledge to build and evaluate an engagement-aware recommender system that promotes the use of engaging activation content.
{"title":"A Multimodal Engagement-Aware Recommender System for People with Dementia","authors":"Lars Steinert","doi":"10.1109/aciiw52867.2021.9666306","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666306","url":null,"abstract":"Dementia places an immeasurable burden on affected individuals and caregivers. In non-pharmacological therapy, physical, social, and cognitive activation of People with Dementia (PwD) is known to be crucial. However, effective activation requires sustained engagement. Technical activation systems thus require a means to automatically recognize if the user is engaged. While research has shown that engagement can be automatically recognized in healthy individuals, this task is especially challenging for PwD who might suffer from aphasia or blunted affect. In this project, I aim to investigate whether PwD provide sufficient verbal and non-verbal signals for the automatic recognition of engagement. Next, I aim to build a multimodal engagement recognition system for PwD using a technical activation system. Lastly, I aim to leverage this knowledge to build and evaluate an engagement-aware recommender system to promote the usage of engaging activation contents.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"22 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130213632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying the Intensity of Toxicity for Discussions and Speakers
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666258
Samiha Samrose, E. Hoque
In this work, we analyze toxicity through audio-visual signals in a multimodal dataset of YouTube news-show clips featuring dyadic speakers in heated discussions. First, since different speakers may contribute differently to the toxicity, we propose a speaker-wise toxicity score that reveals each speaker's proportionate contribution. Because discussions with disagreements may carry signals of toxicity, we categorize discussions into binary high-low toxicity levels in order to identify those needing more attention. Analyzing visual features, we show that these levels correlate with facial expressions: Upper Lid Raiser (associated with ‘surprise’), Dimpler (associated with ‘contempt’), and Lip Corner Depressor (associated with ‘disgust’) remain statistically significant in separating high from low intensities of disrespect. Second, we investigate the impact of audio features such as pitch and intensity that can significantly signal disrespect, and we use these signals to classify disrespect and non-disrespect samples with a logistic regression model, achieving 79.86% accuracy. Our findings shed light on the potential of audio-visual signals for adding important context to the understanding of toxic discussions.
{"title":"Quantifying the Intensity of Toxicity for Discussions and Speakers","authors":"Samiha Samrose, E. Hoque","doi":"10.1109/aciiw52867.2021.9666258","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666258","url":null,"abstract":"In this work, from YouTube News-show multimodal dataset with dyadic speakers having heated discussions, we analyze the toxicity through audio-visual signals. Firstly, as different speakers may contribute differently towards the toxicity, we propose a speaker-wise toxicity score revealing individual proportionate contribution. As discussions with disagreements may reflect some signals of toxicity, in order to identify discussions needing more attention we categorize discussions into binary high-low toxicity levels. By analyzing visual features, we show that the levels correlate with facial expressions as Upper Lid Raiser (associated with ‘surprise’), Dimpler (associated with ‘contempť), and Lip Corner Depressor (associated with ‘disgust’) remain statistically significant in separating high-low intensities of disrespect. Secondly, we investigate the impact of audio-based features such as pitch and intensity that can significantly elicit disrespect, and utilize the signals in classifying disrespect and non-disrespect samples by applying logistic regression model achieving 79.86% accuracy. Our findings shed light on the potential of utilizing audio-visual signals in adding important context towards understanding toxic discussions.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123559581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Embracing and Exploiting Annotator Emotional Subjectivity: An Affective Rater Ensemble Model
Pub Date: 2021-09-28 | DOI: 10.1109/aciiw52867.2021.9666407
Lukas Stappen, Lea Schumann, A. Batliner, Björn Schuller
Automated recognition of continuous emotions in audio-visual data is a growing area of study that aids the understanding of human-machine interaction. Training such systems presupposes human annotation of the data. The annotation process, however, is laborious and expensive, given that several human ratings are required for every data sample to compensate for the subjectivity of emotion perception. As a consequence, labelled data for emotion recognition are rare, and the existing corpora are limited compared to other state-of-the-art deep learning datasets. In this study, we explore different ways in which existing emotion annotations can be utilised more effectively, so as to exploit the available labelled information to the fullest. To this end, we exploit individual raters' opinions by employing an ensemble of rater-specific models, one per annotator, thereby reducing the loss of information that is a by-product of annotation aggregation; we find that individual models can indeed infer subjective opinions. Furthermore, we explore the fusion of these ensemble predictions using different fusion techniques. Our ensemble model with only two annotators outperforms the regular arousal baseline on the test set of the MuSe-CaR corpus. While no considerable improvements on valence could be obtained, using all annotators increases the prediction performance for arousal by up to .07 absolute Concordance Correlation Coefficient on the test set, with models trained solely on rater-specific labels and fused by an attention-enhanced Long Short-Term Memory recurrent neural network.
{"title":"Embracing and Exploiting Annotator Emotional Subjectivity: An Affective Rater Ensemble Model","authors":"Lukas Stappen, Lea Schumann, A. Batliner, Björn Schuller","doi":"10.1109/aciiw52867.2021.9666407","DOIUrl":"https://doi.org/10.1109/aciiw52867.2021.9666407","url":null,"abstract":"Automated recognition of continuous emotions in audio-visual data is a growing area of study that aids in understanding human-machine interaction. Training such systems presupposes human annotation of the data. The annotation process, however, is laborious and expensive given that several human ratings are required for every data sample to compensate for the subjectivity of emotion perception. As a consequence, labelled data for emotion recognition are rare and the existing corpora are limited when compared to other state-of-the-art deep learning datasets. In this study, we explore different ways in which existing emotion annotations can be utilised more effectively to exploit available labelled information to the fullest. To reach this objective, we exploit individual raters’ opinions by employing an ensemble of rater-specific models, one for each annotator, by that reducing the loss of information which is a byproduct of annotation aggregation; we find that individual models can indeed infer subjective opinions. Furthermore, we explore the fusion of such ensemble predictions using different fusion techniques. Our ensemble model with only two annotators outperforms the regular Arousal baseline on the test set of the MuSe-CaR corpus. While no considerable improvements on valence could be obtained, using all annotators increases the prediction performance of arousal by up to. 07 Concordance Correlation Coefficient absolute improvement on test - solely trained on rate-specific models and fused by an attention-enhanced Long-short Term Memory-Recurrent Neural Network.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123786951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}