
Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge: Latest Publications

Session details: Emotion Detection
Yorgos Tzimiropoulos
{"title":"Session details: Emotion Detection","authors":"Yorgos Tzimiropoulos","doi":"10.1145/3255912","DOIUrl":"https://doi.org/10.1145/3255912","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123592661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Session details: Sub-Challenge Winners
M. Valstar
{"title":"Session details: Sub-Challenge Winners","authors":"M. Valstar","doi":"10.1145/3255913","DOIUrl":"https://doi.org/10.1145/3255913","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128825675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988261
Md. Nasir, Arindam Jati, P. G. Shivakumar, Sandeep Nallan Chakravarthula, P. Georgiou
Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal depression classification system as a part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We investigate a number of audio and video features for classification with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the best accuracy is achieved with i-vector modelling based on MFCC features. On the other hand, polynomial parameterization of facial landmark features achieves the best performance among all systems and outperforms the best baseline system as well.
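The Teager energy operator at the core of TECC features has a simple discrete form, Psi[x](n) = x(n)^2 - x(n-1)*x(n+1); a minimal NumPy sketch of that operator (the full TECC pipeline, with its filterbank and cepstral stages, is not reproduced here):

```python
import numpy as np

def teager_energy(x: np.ndarray) -> np.ndarray:
    """Discrete Teager energy operator: Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns one value per interior sample of the input signal.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Toy usage on a synthetic tone; a real system would apply this per
# filterbank channel before cepstral analysis.
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 220.0 * t)
print(teager_energy(signal)[:5])
```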
Citations: 110
Staircase Regression in OA RVM, Data Selection and Gender Dependency in AVEC 2016
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988265
Zhaocheng Huang, Brian Stasak, T. Dang, Kalani Wataraka Gamage, P. Le, V. Sethu, J. Epps
Within the field of affective computing, human emotion and disorder/disease recognition have progressively attracted more interest in multimodal analysis. This submission to the Depression Classification and Continuous Emotion Prediction challenges for AVEC2016 investigates both, with a focus on audio subsystems. For depression classification, we investigate token word selection, vocal tract coordination parameters computed from spectral centroid features, and gender-dependent classification systems. Token word selection performed very well on the development set. For emotion prediction, we investigate emotionally salient data selection based on emotion change, an output-associative regression approach based on the probabilistic outputs of relevance vector machine classifiers operating on low-high class pairs (OA RVM-SR), and gender-dependent systems. Experimental results from both the development and test sets show that the RVM-SR method under the OA framework can improve on OA RVM, which performed very well in the AV+EC2015 challenge.
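As a rough sketch of the output-associative (OA) idea described above, the snippet below trains per-dimension first-stage regressors and then a second stage that also sees both dimensions' first-stage predictions; scikit-learn's SVR stands in for the authors' relevance vector machine, and all data here are synthetic placeholders:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))        # stand-in acoustic feature frames
y_arousal = rng.normal(size=500)      # stand-in gold-standard traces
y_valence = rng.normal(size=500)

# Stage 1: one regressor per affect dimension.
p_arousal = SVR().fit(X, y_arousal).predict(X)
p_valence = SVR().fit(X, y_valence).predict(X)

# Stage 2 (output-associative): the final regressor for each dimension
# also receives the first-stage predictions of *both* dimensions,
# exploiting correlations between arousal and valence.
X_oa = np.column_stack([X, p_arousal, p_valence])
final_arousal = SVR().fit(X_oa, y_arousal)
final_valence = SVR().fit(X_oa, y_valence)
```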
Citations: 26
Exploring Multimodal Visual Features for Continuous Affect Recognition
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988270
Bo Sun, Siming Cao, Liandong Li, Jun He, Lejun Yu
This paper presents our work in the Emotion Sub-Challenge of the 6th Audio/Visual Emotion Challenge and Workshop (AVEC 2016), whose goal is to explore utilizing audio, visual and physiological signals to continuously predict the value of the emotion dimensions (arousal and valence). As visual features are very important in emotion recognition, we try a variety of handcrafted and deep visual features. For each video clip, besides the baseline features, we extract multi-scale Dense SIFT features (MSDF) and several types of convolutional neural network (CNN) features to recognize the expression phases of the current frame. We train linear Support Vector Regression (SVR) for every kind of feature on the RECOLA dataset. Multimodal fusion of these modalities is then performed with a multiple linear regression model. The final Concordance Correlation Coefficient (CCC) scores we obtained on the development set are 0.824 for arousal and 0.718 for valence; on the test set they are 0.683 for arousal and 0.642 for valence.
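The CCC metric reported above has a closed form, CCC = 2*cov(y, y_hat) / (var(y) + var(y_hat) + (mean(y) - mean(y_hat))^2); a minimal NumPy implementation:

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient, the AVEC emotion metric.

    Unlike Pearson correlation, CCC also penalizes differences in the
    mean and variance of the two traces.
    """
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

# Perfect agreement gives 1.0; a constant prediction scores near 0.
y = np.array([0.1, 0.3, 0.2, 0.5])
print(ccc(y, y))  # 1.0
```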
Citations: 28
Session details: Depression recognition
H. Gunes
{"title":"Session details: Depression recognition","authors":"H. Gunes","doi":"10.1145/3255911","DOIUrl":"https://doi.org/10.1145/3255911","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"342 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116481053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Continuous Multimodal Human Affect Estimation using Echo State Networks
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988260
Mohammadreza Amirian, Markus Kächele, Patrick Thiam, Viktor Kessler, F. Schwenker
Continuous multimodal human affect recognition for both the arousal and valence dimensions in a non-acted, spontaneous scenario is investigated in this paper. Different regression models based on Random Forests and Echo State Networks are evaluated and compared in terms of robustness and accuracy. Moreover, an extension of Echo State Networks to a bi-directional model is introduced to improve the regression accuracy. A hybrid method using Random Forests, Echo State Networks and linear regression fusion is developed and applied on the test subset of the AVEC16 challenge. Finally, the label shift and prediction delay are discussed, and an annotator-specific regression model, as well as a fusion architecture, is proposed for future work.
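An Echo State Network keeps a fixed random recurrent reservoir and trains only a linear readout; a minimal sketch of that construction (reservoir size, scaling and the ridge penalty are illustrative choices, and the data are synthetic placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 10, 200
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 gives the "echo state" property

def reservoir_states(U: np.ndarray) -> np.ndarray:
    """Run the fixed reservoir over an input sequence U of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = np.empty((len(U), n_res))
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)  # leaky integration omitted for brevity
        states[t] = x
    return states

# Only the linear readout is trained, here by ridge regression.
U = rng.normal(size=(1000, n_in))  # stand-in feature sequence
y = rng.normal(size=1000)          # stand-in affect annotation trace
S = reservoir_states(U)
lam = 1e-2
W_out = np.linalg.solve(S.T @ S + lam * np.eye(n_res), S.T @ y)
y_hat = S @ W_out
```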
Citations: 12
Multimodal Analysis of Impressions and Personality in Human-Computer and Human-Robot Interactions
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988271
H. Gunes
This talk will focus on automatic prediction of impressions and inferences about traits and characteristics of people based on their multimodal observable behaviours in the context of human-virtual character and human-robot interactions. The first part of the talk will introduce and describe the creation and evaluation of the MAPTRAITS system that enables on-the-fly prediction of the widely used Big Five personality dimensions (i.e., agreeableness, openness, neuroticism, conscientiousness and extroversion) from a third-person vision perspective. A novel approach for sensing and interpreting personality is through a wearable camera that provides a first-person vision (FPV) perspective and therefore enables the acquisition of information about the users' true behaviours and intentions. Accordingly, the second part of the talk will introduce computational analysis of personality traits and interaction experience through first-person vision features in a human-robot interaction context. The perception of personality is also crucial when the interaction takes place over distance. Tele-operated robot avatars, in which an operator's behaviours are portrayed by a robot proxy, have the potential to improve interactions over distance by transforming the perception of physical and social presence, and trust. However, having communication mediated by a robot changes the perception of the operator's appearance, behaviour and personality. The third and last part of the talk will therefore present a study on how robot mediation affects the way the personality of the operator is perceived, analysed and classified, and will discuss the implications our research findings have for autonomous and tele-operated robot design.
Citations: 0
Detecting Depression using Vocal, Facial and Semantic Communication Cues
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988263
J. Williamson, Elizabeth Godoy, Miriam Cha, Adrianne Schwarzentruber, Pooya Khorrami, Youngjune Gwon, Hsiang-Tsung Kung, Charlie K. Dagli, T. Quatieri
Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically-motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with the human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
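The "Gaussian staircase" fusion mentioned above can be read as an ensemble of binary Gaussian classifiers at successive score thresholds whose posteriors are summed into a regression output; the sketch below follows that reading, with scikit-learn's Gaussian (QDA) classifier and synthetic placeholder data, and is not the authors' exact estimator:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))        # stand-in per-subject features
phq = rng.integers(0, 25, size=300)  # stand-in PHQ questionnaire scores

# One Gaussian classifier per threshold; summing the "above threshold"
# posteriors yields a monotone, staircase-shaped score estimate.
thresholds = np.arange(2, 23, 2)
models = [
    QuadraticDiscriminantAnalysis().fit(X, (phq > t).astype(int))
    for t in thresholds
]
phq_hat = sum(m.predict_proba(X)[:, 1] for m in models) * 2  # step size 2

print(phq_hat[:5])
```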
Citations: 136
DepAudioNet: An Efficient Deep Model for Audio based Depression Classification
Pub Date : 2016-10-16 DOI: 10.1145/2988257.2988267
Xingchen Ma, Hongyu Yang, Qiang Chen, Di Huang, Yunhong Wang
This paper presents a novel and effective audio-based method for depression classification. It focuses on two important issues, i.e., data representation and sample imbalance, which are not well addressed in the literature. For the former, in contrast to traditional shallow hand-crafted features, we propose a deep model, namely DepAudioNet, to encode the depression-related characteristics in the vocal channel, combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation. For the latter, we introduce a random sampling strategy in the model training phase to balance the positive and negative samples, which largely alleviates the bias caused by uneven sample distribution. Evaluations are carried out on the DAIC-WOZ dataset for the Depression Classification Sub-challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC), and the experimental results achieved clearly demonstrate the effectiveness of the proposed approach.
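A rough PyTorch sketch of a CNN-plus-LSTM audio classifier in the spirit of DepAudioNet; the layer sizes, input representation (log-mel frames) and pooling choices here are illustrative assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class AudioCnnLstm(nn.Module):
    """1-D CNN front end over spectral frames, LSTM over time, binary head."""

    def __init__(self, n_mels: int = 40, hidden: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),              # halve the time axis
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # depressed / not-depressed logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mels, frames) log-mel spectrogram segments
        h = self.conv(x).transpose(1, 2)  # -> (batch, frames // 2, 64)
        _, (h_n, _) = self.lstm(h)
        return self.head(h_n[-1]).squeeze(-1)

model = AudioCnnLstm()
logits = model(torch.randn(4, 40, 200))  # 4 random placeholder segments
print(logits.shape)                      # torch.Size([4])
```

The paper's class-balancing random sampling would sit in the data loader rather than the model, drawing comparable numbers of positive and negative segments per training pass.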
Citations: 185