Ioannis Kansizoglou;Loukas Bampis;Antonios Gasteratos
{"title":"An Active Learning Paradigm for Online Audio-Visual Emotion Recognition","authors":"Ioannis Kansizoglou;Loukas Bampis;Antonios Gasteratos","doi":"10.1109/TAFFC.2019.2961089","DOIUrl":null,"url":null,"abstract":"The advancement of Human-Robot Interaction (HRI) drives research into the development of advanced emotion identification architectures that fathom audio-visual (A-V) modalities of human emotion. State-of-the-art methods in multi-modal emotion recognition mainly focus on the classification of complete video sequences, leading to systems with no online potentialities. Such techniques are capable of predicting emotions only when the videos are concluded, thus restricting their applicability in practical scenarios. This article provides a novel paradigm for online emotion classification, which exploits both audio and visual modalities and produces a responsive prediction when the system is confident enough. We propose two deep Convolutional Neural Network (CNN) models for extracting emotion features, one for each modality, and a Deep Neural Network (DNN) for their fusion. In order to conceive the temporal quality of human emotion in interactive scenarios, we train in cascade a Long Short-Term Memory (LSTM) layer and a Reinforcement Learning (RL) agent –which monitors the speaker– thus stopping feature extraction and making the final prediction. The comparison of our results on two publicly available A-V emotional datasets viz., RML and BAUM-1s, against other state-of-the-art models, demonstrates the beneficial capabilities of our work.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"13 2","pages":"756-768"},"PeriodicalIF":9.8000,"publicationDate":"2019-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TAFFC.2019.2961089","citationCount":"54","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/8937495/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 54
Abstract
The advancement of Human-Robot Interaction (HRI) drives research into the development of advanced emotion identification architectures that fathom audio-visual (A-V) modalities of human emotion. State-of-the-art methods in multi-modal emotion recognition mainly focus on the classification of complete video sequences, leading to systems with no online potentialities. Such techniques are capable of predicting emotions only when the videos are concluded, thus restricting their applicability in practical scenarios. This article provides a novel paradigm for online emotion classification, which exploits both audio and visual modalities and produces a responsive prediction when the system is confident enough. We propose two deep Convolutional Neural Network (CNN) models for extracting emotion features, one for each modality, and a Deep Neural Network (DNN) for their fusion. In order to conceive the temporal quality of human emotion in interactive scenarios, we train in cascade a Long Short-Term Memory (LSTM) layer and a Reinforcement Learning (RL) agent –which monitors the speaker– thus stopping feature extraction and making the final prediction. The comparison of our results on two publicly available A-V emotional datasets viz., RML and BAUM-1s, against other state-of-the-art models, demonstrates the beneficial capabilities of our work.
期刊介绍:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.