
Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge: Latest Publications

Multimodal Emotion Recognition for AVEC 2016 Challenge
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988268
Filip Povolný, P. Matejka, Michal Hradiš, A. Popková, Lubomír Otrusina, P. Smrz, Ian D. Wood, Cécile Robin, L. Lamel
This paper describes a system for emotion recognition and its application to the dataset from the AV+EC 2016 Emotion Recognition Challenge. The realized system was produced and submitted to the AV+EC 2016 evaluation, making use of all three modalities (audio, video, and physiological data). Our work primarily focused on features derived from audio. The original audio features were complemented with bottleneck features and with text-based emotion recognition, which transcribes the audio using an automatic speech recognition system and applies resources such as word embedding models and sentiment lexicons. Our multimodal fusion reached CCC = 0.855 on the development set for arousal and 0.713 for valence. On the test set, CCC is 0.719 and 0.596 for arousal and valence, respectively.
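The arousal and valence results above are reported with the Concordance Correlation Coefficient (CCC), the metric used throughout this challenge. The snippet below is a minimal sketch of how CCC can be computed for a predicted trace against a gold-standard trace; the arrays are hypothetical and this is not the authors' code.

```python
import numpy as np

def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two 1-D signals."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    var_p, var_g = pred.var(), gold.var()
    cov = np.mean((pred - mean_p) * (gold - mean_g))
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)

# a perfect prediction gives CCC = 1.0; offsets and scale errors lower it
print(concordance_cc([0.1, 0.4, 0.3, 0.6], [0.1, 0.4, 0.3, 0.6]))
```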
Citations: 49
Decision Tree Based Depression Classification from Audio Video and Language Information
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988269
Le Yang, D. Jiang, Lang He, Ercheng Pei, Meshia Cédric Oveneke, H. Sahli
To improve the recognition accuracy in the Depression Classification Sub-Challenge (DCC) of AVEC 2016, this paper proposes a decision tree for depression classification. The decision tree is constructed according to the distribution of the multimodal predictions of PHQ-8 scores and the participants' characteristics (PTSD/depression diagnosis, sleep status, feeling, and personality) obtained via analysis of the participants' transcript files. The proposed gender-specific decision tree provides a way of fusing the upper-level language information with the results obtained using low-level audio and visual features. Experiments are carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) database; results show that the proposed depression classification schemes obtain very promising results on the development set, with the F1 score reaching 0.857 for the class depressed and 0.964 for the class not depressed. Despite the over-fitting problem in training the models that predict the PHQ-8 scores, the classification schemes still obtain satisfactory performance on the test set. The F1 score reaches 0.571 for the class depressed and 0.877 for the class not depressed, with an average of 0.724, which is higher than the baseline result of 0.700.
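The DCC numbers above are per-class F1 scores and their unweighted average. A small sketch of that evaluation with scikit-learn, using made-up labels rather than the DAIC-WOZ data:

```python
from sklearn.metrics import f1_score

# hypothetical ground-truth and predicted labels (1 = depressed, 0 = not depressed)
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1]

f1_depressed = f1_score(y_true, y_pred, pos_label=1)
f1_not_depressed = f1_score(y_true, y_pred, pos_label=0)
# averaged as in the abstract, e.g. (0.571 + 0.877) / 2 = 0.724 on the test set
average_f1 = (f1_depressed + f1_not_depressed) / 2
print(f1_depressed, f1_not_depressed, average_f1)
```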
Citations: 106
Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988266
A. Pampouchidou, Olympia Simantiraki, Amir Fazlollahi, M. Pediaditis, D. Manousos, A. Roniotis, G. Giannakakis, F. Mériaudeau, P. Simos, K. Marias, Fan Yang, M. Tsiknakis
Depression is a major cause of disability world-wide. The present paper reports the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high- and low-level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistically derived features from pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Proposed approaches that outperform the reference classification accuracy include the one utilizing statistical descriptors of low-level audio features. This approach achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying not-depressed individuals on the development set, and 0.52 and 0.81, respectively, on the test set.
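Two of the four classification schemes above differ only in where the modalities are combined: feature fusion concatenates the per-modality features before training a single classifier, while decision fusion trains one classifier per modality and combines their outputs. A schematic sketch with placeholder features and an off-the-shelf SVM, not the authors' pipeline:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# hypothetical per-interview features for three modalities and binary depression labels
audio = rng.normal(size=(100, 20))
video = rng.normal(size=(100, 30))
text = rng.normal(size=(100, 10))
labels = rng.integers(0, 2, size=100)

# feature (early) fusion: one classifier over the concatenated feature vector
early_model = SVC(probability=True).fit(np.hstack([audio, video, text]), labels)

# decision (late) fusion: one classifier per modality, posteriors averaged
unimodal = [SVC(probability=True).fit(X, labels) for X in (audio, video, text)]
posterior = np.mean([m.predict_proba(X)[:, 1] for m, X in zip(unimodal, (audio, video, text))], axis=0)
late_predictions = (posterior > 0.5).astype(int)
```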
Citations: 83
Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988264
K. Brady, Youngjune Gwon, Pooya Khorrami, Elizabeth Godoy, W. Campbell, Charlie K. Dagli, Thomas S. Huang
The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications, including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity to investigate multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, among them the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state-space estimation approach is applied for score fusion, which demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set, with an achieved Concordance Correlation Coefficient (CCC) for arousal of 0.770 vs 0.702 (baseline) and for valence of 0.687 vs 0.638. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
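Among the high-level features mentioned above are unsupervised representations based on sparse coding. One common way to obtain such features is to learn a dictionary over low-level descriptor frames and use the sparse activation codes as the new features; the sketch below uses scikit-learn and random placeholder frames, and illustrates the general idea rather than the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 40))   # hypothetical low-level audio descriptors, one row per frame

# learn a 64-atom dictionary; the sparse codes become high-level features for each frame
dictionary = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, random_state=1).fit(frames)
codes = dictionary.transform(frames)  # shape (500, 64), mostly zeros
print(codes.shape)
```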
Citations: 90
High-Level Geometry-based Features of Video Modality for Emotion Prediction
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988262
Raphaël Weber, Vincent Barrielle, Catherine Soladié, R. Séguier
The automatic analysis of emotion remains a challenging task in unconstrained experimental conditions. In this paper, we present our contribution to the 6th Audio/Visual Emotion Challenge (AVEC 2016), which aims at predicting the continuous emotional dimensions of arousal and valence. First, we propose to improve the performance of multimodal prediction with low-level features by adding high-level geometry-based features, namely head pose and an expression signature. The head pose is estimated by fitting a reference 3D mesh to the 2D facial landmarks. The expression signature is the projection of the facial landmarks in an unsupervised person-specific model. Second, we propose to fuse the unimodal predictions trained on each training subject before performing the multimodal fusion. The results show that our high-level features improve the performance of the multimodal prediction of arousal, and that the subject fusion works well in unimodal prediction but generalizes poorly in multimodal prediction, particularly on valence.
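The head-pose feature above comes from fitting a reference 3D mesh to the 2D facial landmarks. A standard way to pose such a fit is a perspective-n-point solve; the snippet below sketches this with OpenCV on a handful of hypothetical landmark correspondences and an assumed pinhole camera, and is not the authors' implementation.

```python
import numpy as np
import cv2

# hypothetical 3-D reference points (e.g. nose tip, chin, eye and mouth corners) in mesh coordinates
model_points = np.array([[0.0, 0.0, 0.0], [0.0, -63.0, -12.0],
                         [-43.0, 32.0, -26.0], [43.0, 32.0, -26.0],
                         [-28.0, -28.0, -24.0], [28.0, -28.0, -24.0]])
# their detected 2-D landmark positions in the image (pixels)
image_points = np.array([[320.0, 240.0], [318.0, 300.0], [280.0, 210.0],
                         [360.0, 210.0], [295.0, 275.0], [345.0, 275.0]])

# assumed pinhole camera with no lens distortion
focal, cx, cy = 640.0, 320.0, 240.0
camera_matrix = np.array([[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, None)
rotation, _ = cv2.Rodrigues(rvec)  # 3x3 rotation; head yaw/pitch/roll can be derived from it
```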
Citations: 18
Online Affect Tracking with Multimodal Kalman Filters
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988259
Krishna Somandepalli, Rahul Gupta, Md. Nasir, Brandon M. Booth, Sungbok Lee, Shrikanth S. Narayanan
Arousal and valence have been widely used to represent emotions dimensionally and measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end, single-modality predictions are modeled as observations in a Kalman filter formulation in order to continuously track each affective dimension. Leveraging the inter-correlations between arousal and valence, we use the predicted arousal as an additional feature to improve valence predictions. Furthermore, we propose a conditional framework to select Kalman filters of different modalities while tracking. This framework employs voicing probability and facial posture cues to detect the absence or presence of each input modality. Our multimodal fusion results on the development and the test set provide a statistically significant improvement over the baseline system from AVEC2016. The proposed approach can be potentially extended to other multimodal tasks with inter-correlated behavioral dimensions.
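A minimal illustration of the kind of per-dimension tracking described above: a scalar random-walk Kalman filter that smooths a fused sequence of per-frame predictions. It is a sketch under simplified assumptions (single state, fixed noise variances), not the authors' formulation, and the input arrays are made up.

```python
import numpy as np

def kalman_track(observations, q=1e-3, r=1e-2):
    """Track one affective dimension (arousal or valence) from noisy per-frame observations."""
    x, p = 0.0, 1.0              # state estimate and its variance
    track = []
    for z in observations:
        p = p + q                # predict step: random-walk transition adds process noise
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update step: blend prediction and observation
        p = (1.0 - k) * p
        track.append(x)
    return np.array(track)

# e.g. fuse hypothetical audio- and video-based arousal predictions, then track them over time
audio_pred = np.array([0.10, 0.30, 0.20, 0.40, 0.35])
video_pred = np.array([0.20, 0.20, 0.30, 0.50, 0.45])
smoothed = kalman_track((audio_pred + video_pred) / 2.0)
```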
Citations: 24
AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge
Pub Date : 2016-05-05 DOI: 10.1145/2988257.2988258
M. Valstar, J. Gratch, Björn Schuller, F. Ringeval, D. Lalanne, M. Torres, Stefan Scherer, Giota Stratou, R. Cowie, M. Pantic
The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) "Depression, Mood and Emotion" will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multi-modal information processing and to bring together the depression and emotion recognition communities, as well as the audio, video and physiological processing communities, to compare the relative merits of the various approaches to depression and emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
Citations: 522
Session details: Keynote Address
M. Valstar
{"title":"Session details: Keynote Address","authors":"M. Valstar","doi":"10.1145/3255910","DOIUrl":"https://doi.org/10.1145/3255910","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124031827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge
M. Valstar, J. Gratch, Björn Schuller, F. Ringeval, R. Cowie, M. Pantic
It is our great pleasure to welcome you to the 5th Audio-Visual Emotion recognition Challenge (AVEC 2015), held in conjunction with the ACM Multimedia 2015. This year's challenge and associated workshop continues to push the boundaries of audio-visual emotion recognition. The first AVEC challenge posed the problem of detecting discrete emotion classes on an extremely large set of natural behaviour data. The second AVEC extended this problem to the prediction of continuous valued dimensional affect on the same set of challenging data. In its third edition, we enlarged the problem even further to include the prediction of self-reported severity of depression. The fourth edition of AVEC focused on the study of depression and affect by narrowing down the number of tasks to be used, and enriching the annotation. Finally, this year we've focused the study of affect by including physiology, along with audio-visual data, in the dataset, making the very first emotion recognition challenge that bridges across audio, video and physiological data. The mission of AVEC challenge and workshop series is to provide a common benchmark test set for individual multimodal information processing and to bring together the audio, video and -- for the first time ever -- physiological emotion recognition communities, to compare the relative merits of the three approaches to emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. A second motivation is the need to advance emotion recognition systems to be able to deal with naturalistic behaviour in large volumes of un-segmented, non-prototypical and non-preselected data. As you will see, these goals have been reached with the selection of this year's data and the challenge contributions. The call for participation attracted 15 submissions from Asia, Europe, Oceania and North America. The programme committee accepted 9 papers in addition to the baseline paper for oral presentation. For the challenge, no less than 48 results submissions were made by 13 teams! We hope that these proceedings will serve as a valuable reference for researchers and developers in the area of audio-visual-physiological emotion recognition and analysis. We also encourage attendees to attend the keynote presentation. This valuable and insightful talk can and will guide us to a better understanding of the state of the field, and future direction: AVEC'15 Keynote Talk -- From Facial Expression Analysis to Multimodal Mood Analysis, Pr. Roland Goecke (University of Canberra, Australia)
Citations: 23