
Latest Publications from AVEC '14

Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661811
Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen
Understanding nonverbal behaviors in human-machine interaction is a complex and challenging task. One of the key aspects is to recognize human emotion states accurately. This paper presents our contribution to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict the continuous values of the emotion dimensions arousal, valence and dominance at each moment in time. The proposed method utilizes deep belief network based models to recognize emotion states from audio and visual modalities. Firstly, we employ temporal pooling functions in the deep neural network to encode dynamic information in the features, which achieves temporal modeling at the first time scale. Secondly, we simultaneously combine the predicted results from different modalities with emotion temporal context information. The proposed multimodal-temporal fusion achieves temporal modeling of the emotion states at the second time scale. Experimental results demonstrate the effectiveness of each key component of the proposed method, and competitive results are obtained.
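The abstract leaves the pooling functions unspecified; as a rough illustration of the first time scale, the numpy sketch below pools frame-level network activations over sliding windows (the window length, hop size, feature dimensionality, and the mean/max choice are all invented for the example, not taken from the paper):

```python
import numpy as np

def temporal_pool(frame_feats, win=25, hop=10):
    """Pool frame-level features over sliding windows (mean + max),
    giving one descriptor per window -- a first, short time scale."""
    T, D = frame_feats.shape
    pooled = []
    for start in range(0, T - win + 1, hop):
        chunk = frame_feats[start:start + win]
        pooled.append(np.concatenate([chunk.mean(axis=0), chunk.max(axis=0)]))
    return np.asarray(pooled)          # shape: (num_windows, 2 * D)

# Example: 1000 frames of 64-dim activations from some feature extractor.
feats = np.random.randn(1000, 64)
print(temporal_pool(feats).shape)      # (98, 128)
```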
Citations: 41
Automatic Depression Scale Prediction using Facial Expression Dynamics and Regression
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661812
Asim Jan, H. Meng, Y. F. A. Gaus, Fan Zhang, Saeed Turabzadeh
Depression is a state of low mood and aversion to activity that can affect a person's thoughts, behavior, feelings and sense of well-being. In such a low mood, both the facial expression and the voice appear different from those in normal states. In this paper, an automatic system is proposed to predict Beck Depression Inventory scores from the naturalistic facial expressions of patients with depression. Firstly, features are extracted from corresponding video and audio signals to represent the characteristics of facial and vocal expression under depression. Secondly, a dynamic feature generation method is proposed in the extracted video feature space, based on the idea of the Motion History Histogram (MHH) for 2-D video motion extraction. Thirdly, Partial Least Squares (PLS) and linear regression are applied to learn the relationship between the dynamic features and depression scores using training data, and then to predict the depression score for unseen samples. Finally, decision-level fusion is performed to combine the predictions from the video and audio modalities. The proposed approach is evaluated on the AVEC2014 dataset and the experimental results demonstrate its effectiveness.
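A minimal sketch of the PLS-plus-linear-regression step, using scikit-learn's PLSRegression on synthetic stand-ins for the dynamic video features (the feature dimensionality, sample counts, component count, and the clipping to the BDI-II range 0-63 are assumptions for illustration, not details from the paper):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(50, 200)), rng.uniform(0, 45, size=50)
X_test = rng.normal(size=(10, 200))

# PLS projects the high-dimensional dynamic features onto a few latent
# components that covary with the depression score; linear regression
# then maps the latent scores to a depression-scale prediction.
pls = PLSRegression(n_components=8).fit(X_train, y_train)
reg = LinearRegression().fit(pls.transform(X_train), y_train)
pred = reg.predict(pls.transform(X_test))
print(np.clip(pred, 0, 63))   # BDI-II scores lie in [0, 63]
```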
Citations: 80
Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661810
Rahul Gupta, Nikos Malandrakis, Bo Xiao, T. Guha, Maarten Van Segbroeck, M. Black, A. Potamianos, Shrikanth S. Narayanan
Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another challenge which stands to benefit from modeling such cues. The Audio/Visual Emotion Challenge (AVEC) aims toward understanding the two phenomena and modeling their correlation with observable cues across several modalities. In this paper, we use multimodal signal processing methodologies to address the two problems using data from human-computer interactions. We develop separate systems for predicting depression levels and affective dimensions, experimenting with several methods for combining the multimodal information. The proposed depression prediction system uses a feature selection approach based on audio, visual, and linguistic cues to predict depression scores for each session. Similarly, we use multiple systems trained on audio and visual cues to predict the affective dimensions in continuous time. Our affect recognition system accounts for context during frame-wise inference and performs a linear fusion of outcomes from the audio-visual systems. For both problems, our proposed systems outperform the video-feature based baseline systems. As part of this work, we analyze the role played by each modality in predicting the target variable and provide analytical insights.
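As a toy illustration of the linear fusion step, the sketch below learns fusion weights for two synthetic single-modality prediction streams on a development portion and applies them to the held-out rest (all signals, noise levels, and split sizes are fabricated; the paper's actual fusion details may differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Frame-wise arousal predictions from two single-modality systems
# (synthetic stand-ins for the audio and video system outputs).
rng = np.random.default_rng(1)
truth = np.sin(np.linspace(0, 6, 300))
audio_pred = truth + rng.normal(0, 0.3, 300)
video_pred = truth + rng.normal(0, 0.5, 300)

# Linear fusion: regress the development-set ground truth on the
# stacked single-modality outputs, then apply the weights at test time.
X = np.column_stack([audio_pred, video_pred])
fusion = LinearRegression().fit(X[:200], truth[:200])
fused = fusion.predict(X[200:])
print(np.corrcoef(fused, truth[200:])[0, 1])
```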
Citations: 94
Building a Database of Political Speech: Does Culture Matter in Charisma Annotations?
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661808
Ailbhe Cullen, Andrew Hines, N. Harte
For both individual politicians and political parties, the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed by both native listeners and Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.
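One common way to quantify such a reliability claim is Cronbach's alpha over the clip-by-rater matrix; the abstract does not say which statistic the authors use, so the sketch below is purely illustrative, with simulated native-listener and AMT ratings:

```python
import numpy as np

def cronbach_alpha(ratings):
    """ratings: (n_clips, n_raters) matrix of charisma scores."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    clip_totals = ratings.sum(axis=1)
    return k / (k - 1) * (1 - ratings.var(axis=0, ddof=1).sum()
                          / clip_totals.var(ddof=1))

rng = np.random.default_rng(2)
base = rng.uniform(1, 5, size=40)                      # latent charisma per clip
native = base[:, None] + rng.normal(0, 0.5, (40, 3))   # 3 native listeners
amt = base[:, None] + rng.normal(0, 0.5, (40, 3))      # 3 AMT workers
print(cronbach_alpha(native), cronbach_alpha(amt))
```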
Citations: 8
AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661807
M. Valstar, Björn Schuller, Kirsty Smith, Timur R. Almaev, F. Eyben, J. Krajewski, R. Cowie, M. Pantic
Mood disorders are inherently related to emotion. In particular, the behaviour of people suffering from mood disorders such as unipolar depression shows a strong temporal correlation with the affective dimensions valence, arousal and dominance. In addition to structured self-report questionnaires, psychologists and psychiatrists also draw on observations of facial expressions and vocal cues in their evaluation of a patient's level of depression. It is in this context that we present the fourth Audio-Visual Emotion Recognition Challenge (AVEC 2014). This edition of the challenge uses a subset of the tasks used in a previous challenge, allowing for more focussed studies. In addition, labels for a third dimension (Dominance) have been added, and the number of annotators per clip has been increased to a minimum of three, with most clips annotated by five. The challenge has two goals logically organised as sub-challenges: the first is to predict the continuous values of the affective dimensions valence, arousal and dominance at each moment in time. The second is to predict the value of a single self-reported severity-of-depression indicator for each recording in the dataset. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
Citations: 354
The SRI AVEC-2014 Evaluation System
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661818
V. Mitra, Elizabeth Shriberg, Mitchell McLaren, A. Kathol, Colleen Richey, D. Vergyri, M. Graciarena
Though depression is a common mental health problem with significant impact on human society, it often goes undetected. We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale. These features, many of which are novel for this task, include (1) estimated articulatory trajectories during speech production, (2) acoustic characteristics, (3) acoustic-phonetic characteristics and (4) prosodic features. Features are modeled using a variety of approaches, including support vector regression, a Gaussian backend and decision trees. We report results on the AVEC-2014 depression dataset and find that individual systems range from 9.18 to 11.87 in root mean squared error (RMSE), and from 7.68 to 9.99 in mean absolute error (MAE). Initial fusion brings further improvement; fusion and feature selection work is still in progress.
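For reference, the two reported error measures are straightforward to compute; a minimal sketch with made-up BDI predictions and gold scores (the numbers below are illustrative, not results from the paper):

```python
import numpy as np

def rmse(pred, gold):
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    return float(np.sqrt(np.mean((pred - gold) ** 2)))

def mae(pred, gold):
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    return float(np.mean(np.abs(pred - gold)))

# One hypothetical system's BDI predictions vs. self-reported scores.
gold = [14, 3, 28, 41, 9]
pred = [17, 6, 22, 35, 12]
print(rmse(pred, gold), mae(pred, gold))
```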
Citations: 53
Model Fusion for Multimodal Depression Classification and Level Detection
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661819
Mohammed Senoussaoui, Milton Orlando Sarria Paja, J. F. Santos, T. Falk
Audio-visual emotion and mood disorder cues have recently been explored to develop tools that assist psychologists and psychiatrists in evaluating a patient's level of depression. In this paper, we present a number of different multimodal depression level predictors using a model fusion approach, in the context of the AVEC14 challenge. We show that an i-vector based representation of short-term audio features contains useful information for depression classification and prediction. We also employ a classification step prior to regression, allowing different regression models depending on the presence or absence of depression. Our experiments show that a combination of our audio-based model and two other models based on LGBP-TOP video features leads to an improvement of 4% over the baseline model proposed by the challenge organizers.
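A toy version of the classify-then-regress idea: a logistic-regression gate routes each sample to a regressor trained on the corresponding subset of the training data. The features, labels, model choices, and the depression cut-off of 14 are all synthetic assumptions; the paper's actual classifiers and regressors likely differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 40))                     # e.g. i-vector features
y = np.abs(X[:, 0] * 10 + rng.normal(0, 2, 120))   # synthetic BDI scores
depressed = (y >= 14).astype(int)                  # hypothetical cut-off

# Gate: classify presence of depression, then apply the regression
# model trained on the matching subset of the training data.
gate = LogisticRegression(max_iter=1000).fit(X, depressed)
reg_lo = Ridge().fit(X[depressed == 0], y[depressed == 0])
reg_hi = Ridge().fit(X[depressed == 1], y[depressed == 1])

def predict(x):
    x = x.reshape(1, -1)
    return (reg_hi if gate.predict(x)[0] else reg_lo).predict(x)[0]

print(predict(X[0]))
```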
Citations: 55
Fusing Affective Dimensions and Audio-Visual Features from Segmented Video for Depression Recognition: INAOE-BUAP's Participation at AVEC'14 Challenge
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661815
Humberto Pérez Espinosa, H. Escalante, Luis Villaseñor-Pineda, M. Montes-y-Gómez, David Pinto, Verónica Reyes-Meza
Depression is a disease that affects a considerable portion of the world population. Severe cases of depression interfere with the daily life of patients; for these patients, strict monitoring is necessary in order to control the progress of the disease and to prevent undesired side effects. One way to keep track of patients with depression is online monitoring via human-computer interaction. The AVEC'14 challenge aims at developing technology towards the online monitoring of depression patients. This paper describes an approach to depression recognition from audiovisual information in the context of the AVEC'14 challenge. The proposed method relies on an effective voice segmentation procedure, followed by segment-level feature extraction and aggregation. Finally, a meta-model is trained to fuse mono-modal information. The main novel features of our proposal are that (1) we use affective dimensions for building depression recognition models; (2) we extract visual information from voice and silence segments separately; and (3) we consolidate features and use a meta-model for fusion. The proposed methodology is evaluated, and the experimental results show that it is competitive.
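A hedged sketch of two of the listed ingredients, segment-level feature aggregation and a fused meta-model, here approximated with scikit-learn's StackingRegressor on synthetic data (in the paper each base learner would see its own modality's features rather than the shared matrix used below):

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(4)

def aggregate(segment_feats):
    """Collapse per-segment descriptors (e.g. from voice or silence
    segments) into one session-level vector via mean and std pooling."""
    f = np.asarray(segment_feats)
    return np.concatenate([f.mean(axis=0), f.std(axis=0)])

# Synthetic session-level features and depression scores.
X = np.stack([aggregate(rng.normal(size=(30, 20))) for _ in range(60)])
y = rng.uniform(0, 45, size=60)

# Meta-model: a Ridge regressor fuses the outputs of the base learners,
# which stand in for the per-modality models of the paper.
meta = StackingRegressor(
    estimators=[("audio", SVR()), ("video", Ridge())],
    final_estimator=Ridge(),
).fit(X, y)
print(meta.predict(X[:3]))
```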
Citations: 43
Vocal and Facial Biomarkers of Depression based on Motor Incoordination and Timing
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661809
J. Williamson, T. Quatieri, Brian S. Helfer, G. Ciccarelli, D. Mehta
In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms controlling speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that is behaviorally manifested as altered coordination and timing across multiple motor-based properties. Changes in motor outputs can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale correlation structure and timing feature sets from audio-based vocal features and video-based facial action units from recordings provided by the 4th International Audio/Video Emotion Challenge (AVEC). The feature sets enable detection of changes in coordination, movement, and timing of vocal and facial gestures that are potentially symptomatic of depression. Combining complementary features in Gaussian mixture model and extreme learning machine classifiers, our multivariate regression scheme predicts Beck depression inventory ratings on the AVEC test set with a root-mean-square error of 8.12 and mean absolute error of 6.31. Future work calls for continued study into detection of neurological disorders based on altered coordination and timing across audio and video modalities.
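The exact feature construction is detailed in the paper; the sketch below is only loosely inspired by it, computing eigenvalue spectra of channel-delay correlation matrices at a few invented delays as a stand-in for multi-scale coordination features:

```python
import numpy as np

def correlation_structure(signals, delays=(1, 3, 7)):
    """Eigenvalue spectra of channel-delay correlation matrices, one per
    time scale -- a proxy for cross-channel coordination features."""
    feats = []
    for d in delays:
        stacked = np.vstack([signals[:, :-d], signals[:, d:]])  # (2C, T-d)
        eigvals = np.linalg.eigvalsh(np.corrcoef(stacked))
        feats.append(np.log(np.clip(eigvals, 1e-8, None)))
    return np.concatenate(feats)

# 6 channels (e.g. formant tracks or facial action unit intensities).
sig = np.cumsum(np.random.randn(6, 500), axis=1)
print(correlation_structure(sig).shape)   # (36,) = 3 delays x 12 eigenvalues
```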
Citations: 178
Inferring Depression and Affect from Application Dependent Meta Knowledge
Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661813
Markus Kächele, Martin Schels, F. Schwenker
This paper outlines our contribution to the 2014 edition of the AVEC competition. It comprises classification results and considerations for both the continuous affect recognition sub-challenge and the depression recognition sub-challenge. Rather than relying on statistical features that are normally extracted from the raw audio-visual data, we propose an approach based on abstract meta information about individual subjects as well as prototypical task- and label-dependent templates to infer the respective emotional states. The results of the approach, which was submitted to both parts of the challenge, significantly outperformed the baseline approaches. Further, we elaborate on several issues concerning the labeling of affective corpora and the choice of appropriate performance measures.
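As a toy rendering of the meta-information idea, the sketch below predicts a depression score from subject- and task-keyed templates of training labels, backing off to a subject-level template and then to a made-up global prior (the data, key scheme, and back-off order are all hypothetical, not the paper's method):

```python
import numpy as np

# Hypothetical training metadata: (subject_id, task_id) -> observed labels.
train = {
    ("s01", "freeform"): [24.0, 22.0],
    ("s01", "northwind"): [23.0],
    ("s02", "freeform"): [5.0, 8.0],
}

def template_predict(subject, task, fallback=15.0):
    """Predict from label templates tied to subject/task meta information;
    back off to other recordings of the same subject, then to a global
    prior (the fallback value here is made up)."""
    labels = train.get((subject, task))
    if labels is None:
        labels = [v for (s, _), vals in train.items() if s == subject for v in vals]
    return float(np.mean(labels)) if labels else fallback

print(template_predict("s01", "freeform"), template_predict("s99", "freeform"))
```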
Citations: 51