Online Affect Tracking with Multimodal Kalman Filters

Krishna Somandepalli, Rahul Gupta, Md. Nasir, Brandon M. Booth, Sungbok Lee, Shrikanth S. Narayanan
{"title":"Online Affect Tracking with Multimodal Kalman Filters","authors":"Krishna Somandepalli, Rahul Gupta, Md. Nasir, Brandon M. Booth, Sungbok Lee, Shrikanth S. Narayanan","doi":"10.1145/2988257.2988259","DOIUrl":null,"url":null,"abstract":"Arousal and valence have been widely used to represent emotions dimensionally and measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end, single-modality predictions are modeled as observations in a Kalman filter formulation in order to continuously track each affective dimension. Leveraging the inter-correlations between arousal and valence, we use the predicted arousal as an additional feature to improve valence predictions. Furthermore, we propose a conditional framework to select Kalman filters of different modalities while tracking. This framework employs voicing probability and facial posture cues to detect the absence or presence of each input modality. Our multimodal fusion results on the development and the test set provide a statistically significant improvement over the baseline system from AVEC2016. The proposed approach can be potentially extended to other multimodal tasks with inter-correlated behavioral dimensions.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2988257.2988259","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Arousal and valence have been widely used to represent emotions dimensionally and measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end, single-modality predictions are modeled as observations in a Kalman filter formulation in order to continuously track each affective dimension. Leveraging the inter-correlations between arousal and valence, we use the predicted arousal as an additional feature to improve valence predictions. Furthermore, we propose a conditional framework to select Kalman filters of different modalities while tracking. This framework employs voicing probability and facial posture cues to detect the absence or presence of each input modality. Our multimodal fusion results on the development and test sets provide a statistically significant improvement over the AVEC 2016 baseline system. The proposed approach can potentially be extended to other multimodal tasks with inter-correlated behavioral dimensions.
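The sketch below is an illustrative reconstruction of the late-fusion idea described in the abstract, not the authors' implementation. It assumes a scalar random-walk state for each affective dimension and a fixed observation-noise variance per modality; a boolean availability mask stands in for the voicing-probability and facial-posture gating. The class and parameter names (MultimodalKalmanFusion, process_var, obs_var) are our own.

```python
import numpy as np

class MultimodalKalmanFusion:
    """Scalar Kalman filter treating per-modality affect
    predictions as noisy observations of one latent state."""

    def __init__(self, n_modalities, process_var=1e-3, obs_var=None):
        self.n = n_modalities
        self.x = 0.0            # latent affective state (arousal or valence)
        self.P = 1.0            # estimate variance
        self.Q = process_var    # random-walk process noise (assumption)
        # Per-modality observation noise; would be fit on development data.
        self.R = (np.full(n_modalities, 0.1) if obs_var is None
                  else np.asarray(obs_var, dtype=float))

    def step(self, z, available):
        """One frame: z holds per-modality predictions; available is a
        boolean mask gating modalities detected as present."""
        # Predict: identity state transition, x_t = x_{t-1} + w_t.
        self.P += self.Q
        # Sequentially fold in each available modality; for a scalar state
        # with independent noises this equals a joint update.
        for m in range(self.n):
            if not available[m]:
                continue
            K = self.P / (self.P + self.R[m])   # scalar Kalman gain
            self.x += K * (z[m] - self.x)       # innovation correction
            self.P *= (1.0 - K)
        return self.x

# Example frame: three single-modality predictions, one stream missing.
fusion = MultimodalKalmanFusion(n_modalities=3)
preds = np.array([0.2, 0.35, 0.1])
mask = np.array([True, True, False])
estimate = fusion.step(preds, mask)
```

In the paper, the predicted arousal trajectory is additionally fed back as an input feature to the valence predictors; that cross-dimension coupling, and the tuning of the noise parameters on the development set, are omitted here for brevity.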