
2015 International Conference on Affective Computing and Intelligent Interaction (ACII): Latest Publications

Harmony search for feature selection in speech emotion recognition
Yongsen Tao, Kunxia Wang, Jing Yang, Ning An, Lian Li
Feature selection is a significant aspect of speech emotion recognition systems. Selecting a small subset from thousands of speech features is important for accurate classification of speech emotion. In this paper we investigate the heuristic Harmony Search (HS) algorithm for feature selection. We extract three feature sets from the Berlin German emotion database (EMODB) and the Chinese Elderly emotion database (EESDB): MFCC, Fourier Parameters (FP), and features extracted with the Munich open Speech and Music Interpretation by Large Space Extraction (openSMILE) toolkit. We also combine MFCC with FP as a fourth feature set. We use Harmony Search to select feature subsets and reduce the dimensionality of the feature space, and employ 10-fold cross validation in LIBSVM to evaluate the change in accuracy between the selected subsets and the original sets. Experimental results show that each subset's size is reduced by about 50%, yet there is no sharp degradation in accuracy: the selected subsets nearly match the accuracy of the original sets.
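
The abstract does not include code, so the following is a minimal sketch of one plausible wrapper-style setup: each harmony is a binary feature mask, and its fitness is the 10-fold cross-validated accuracy of an SVM (scikit-learn's LIBSVM-backed SVC standing in for the paper's LIBSVM). The harmony-search parameters (memory size, HMCR, PAR, iteration count) are hypothetical defaults, not values reported in the paper.

```python
# Sketch only: binary harmony search for feature selection with an SVM fitness.
# X (n_samples x n_features) and y are assumed to hold the extracted features and labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """10-fold CV accuracy of an SVM on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    clf = SVC(kernel="rbf")  # LIBSVM-backed classifier
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=10).mean()

def harmony_search(X, y, hms=20, hmcr=0.9, par=0.3, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    memory = rng.integers(0, 2, size=(hms, n))           # harmony memory of binary masks
    scores = np.array([fitness(m, X, y) for m in memory])
    for _ in range(iters):
        new = np.empty(n, dtype=int)
        for j in range(n):
            if rng.random() < hmcr:                      # memory consideration
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:                   # pitch adjustment: flip the bit
                    new[j] = 1 - new[j]
            else:                                        # random selection
                new[j] = rng.integers(0, 2)
        s = fitness(new, X, y)
        worst = scores.argmin()
        if s > scores[worst]:                            # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = scores.argmax()
    return memory[best].astype(bool), scores[best]       # selected mask and its CV accuracy
```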
{"title":"Harmony search for feature selection in speech emotion recognition","authors":"Yongsen Tao, Kunxia Wang, Jing Yang, Ning An, Lian Li","doi":"10.1109/ACII.2015.7344596","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344596","url":null,"abstract":"Feature selection is a significant aspect of speech emotion recognition system. How to select a small subset out of the thousands of speech data is important for accurate classification of speech emotion. In this paper we investigate heuristic algorithm Harmony search (HS) for feature selection. We extract 3 feature sets, including MFCC, Fourier Parameters (FP), and features extracted with The Munich open Speech and Music Interpretation by Large Space Extraction (openSMILE) toolkit, from Berlin German emotion database (EMODB) and Chinese Elderly emotion database (EESDB). And combine MFCC with FP as the fourth feature set. We use Harmony search to select subsets and decrease the dimension space, and employ 10-fold cross validation in LIBSVM to evaluate the change of accuracy between selected subsets and original sets. Experimental results show that each subset's size reduced by about 50%, however, there is no sharp degeneration on accuracy and the accuracy almost maintains the original ones.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"42 1","pages":"362-367"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72636119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Emotion recognition in spontaneous and acted dialogues
Leimin Tian, Johanna D. Moore, Catherine Lai
In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of a LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.
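
As a rough illustration of the two model families compared above, the sketch below pairs an utterance-level SVM baseline with a word-level LSTM-RNN classifier. PyTorch is an assumed implementation choice, and the feature dimension (e.g. a small DIS-NV vector per word), hidden size, and number of emotion classes are placeholders rather than the paper's settings.

```python
# Sketch only: the two classifier types compared in the paper, with placeholder sizes.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class EmotionLSTM(nn.Module):
    """Word-level sequence model: one feature vector per word, one label per utterance."""
    def __init__(self, feat_dim=5, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, time, feat_dim), e.g. DIS-NV features
        _, (h, _) = self.lstm(x)     # final hidden state summarizes the word sequence
        return self.out(h[-1])       # utterance-level emotion logits

# Utterance-level baseline on fixed-length statistics (e.g. LLD functionals):
svm_baseline = SVC(kernel="rbf")
# svm_baseline.fit(X_train, y_train); the LSTM would be trained with cross-entropy loss.
```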
{"title":"Emotion recognition in spontaneous and acted dialogues","authors":"Leimin Tian, Johanna D. Moore, Catherine Lai","doi":"10.1109/ACII.2015.7344645","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344645","url":null,"abstract":"In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of a LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"25 1","pages":"698-704"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73834455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
PhysSigTK: Enabling engagement experiments with physiological signals for game design
Stefan Rank, Cathy Lu
We demonstrate PhysSigTK, a physiological signals toolkit for making low-cost hardware accessible in the Unity3D game development environment so that designers of affective games can experiment with how engagement can be captured in their games. Rather than proposing a context-free way of measuring engagement, we enable designers to test how affordable hardware could fit into the assessment of players' states and progress in their particular game using a range of tools.
{"title":"PhysSigTK: Enabling engagement experiments with physiological signals for game design","authors":"Stefan Rank, Cathy Lu","doi":"10.1109/ACII.2015.7344692","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344692","url":null,"abstract":"We demonstrate PhysSigTK, a physiological signals toolkit for making low-cost hardware accessible in the Unity3D game development environment so that designers of affective games can experiment with how engagement can be captured in their games. Rather than proposing a context-free way of measuring engagement, we enable designers to test how affordable hardware could fit into the assessment of players' states and progress in their particular game using a range of tools.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"138 1","pages":"968-969"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73967226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Utilizing multimodal cues to automatically evaluate public speaking performance
L. Chen, C. W. Leong, G. Feng, Chong Min Lee, Swapna Somasundaran
Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, there is a lack of tools to efficiently and economically evaluate presenters' verbal and nonverbal behaviors. The recent advancements in automated scoring and multimodal sensing technologies may address this issue. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal presentation corpus containing 14 subjects' 56 presentations has been recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating were conducted according to standards in educational assessment. A rich set of multimodal features has been extracted from head poses, eye gazes, facial expressions, motion traces, speech signal, and transcripts. The model building experiment shows that jointly using both lexical/speech and visual features achieves more accurate scoring, which suggests the feasibility of using multimodal technologies in the assessment of public speaking skills.
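
As a minimal illustration of the model-building step described above, the sketch below fuses speech/lexical and visual feature vectors by concatenation and fits a regressor against human rubric scores. The choice of regressor (support vector regression) and the concatenation fusion are assumptions for illustration, not necessarily the paper's scoring model.

```python
# Sketch only: feature-level fusion of multimodal descriptors plus a score regressor.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def fuse(speech_feats, visual_feats):
    """Early fusion: concatenate per-presentation feature vectors from both modalities."""
    return np.hstack([speech_feats, visual_feats])

# X_speech: lexical/speech functionals; X_visual: head pose, gaze, motion statistics;
# y: human rubric scores, one per presentation.
# X = fuse(X_speech, X_visual)
# scorer = SVR(kernel="rbf")
# print(cross_val_score(scorer, X, y, cv=5, scoring="r2").mean())
```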
{"title":"Utilizing multimodal cues to automatically evaluate public speaking performance","authors":"L. Chen, C. W. Leong, G. Feng, Chong Min Lee, Swapna Somasundaran","doi":"10.1109/ACII.2015.7344601","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344601","url":null,"abstract":"Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, there is a lack of tools to efficiently and economically evaluate presenters' verbal and nonverbal behaviors. The recent advancements in automated scoring and multimodal sensing technologies may address this issue. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal presentation corpus containing 14 subjects' 56 presentations has been recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating were conducted according to standards in educational assessment. A rich set of multimodal features has been extracted from head poses, eye gazes, facial expressions, motion traces, speech signal, and transcripts. The model building experiment shows that jointly using both lexical/speech and visual features achieves more accurate scoring, which suggests the feasibility of using multimodal technologies in the assessment of public speaking skills.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"110 1","pages":"394-400"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81753144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Learning speech emotion features by joint disentangling-discrimination
W. Xue, Zhengwei Huang, Xin Luo, Qi-rong Mao
Speech plays an important part in human-computer interaction. As a major branch of speech processing, speech emotion recognition (SER) has drawn much attention from researchers. Strongly discriminative features are of great importance in SER; however, emotion-specific features are commonly mixed with other features. In this paper, we introduce an approach that separates these two kinds of features as much as possible. First we employ an unsupervised feature learning framework to obtain rough features. These rough features are then fed into a semi-supervised feature learning framework. In this phase, the emotion-specific features are disentangled from the other features by using a novel loss function that combines a reconstruction penalty, an orthogonal penalty, a discriminative penalty, and a verification penalty. The orthogonal penalty disentangles emotion-specific features from other features, the discriminative penalty enlarges inter-emotion variation, and the verification penalty reduces intra-emotion variation. Evaluations on the FAU Aibo emotion database show that our approach can improve speech emotion classification performance.
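
The abstract names four penalty terms but not their exact forms; the sketch below gives one plausible PyTorch reading of the combined loss, where the orthogonal penalty decorrelates the emotion-specific code from the residual code, the discriminative penalty is a cross-entropy over emotion classes, and the verification penalty compacts same-emotion codes around their class mean. The penalty weights and concrete formulations are assumptions, not the authors' definitions.

```python
# Sketch only: one possible form of the joint disentangling-discrimination loss.
import torch
import torch.nn.functional as F

def joint_loss(x, x_hat, z_emo, z_other, logits, labels,
               w_rec=1.0, w_orth=0.1, w_disc=1.0, w_ver=0.1):
    rec = F.mse_loss(x_hat, x)                           # reconstruction penalty
    orth = (z_emo * z_other).sum(dim=1).pow(2).mean()    # orthogonal penalty between codes
    disc = F.cross_entropy(logits, labels)               # discriminative penalty
    ver = x.new_zeros(())                                # verification penalty (intra-class)
    for c in labels.unique():
        zc = z_emo[labels == c]
        ver = ver + (zc - zc.mean(dim=0)).pow(2).sum(dim=1).mean()
    return w_rec * rec + w_orth * orth + w_disc * disc + w_ver * ver
```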
{"title":"Learning speech emotion features by joint disentangling-discrimination","authors":"W. Xue, Zhengwei Huang, Xin Luo, Qi-rong Mao","doi":"10.1109/ACII.2015.7344598","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344598","url":null,"abstract":"Speech plays an important part in human-computer interaction. As a major branch of speech processing, speech emotion recognition (SER) has drawn much attention of researchers. Excellent discriminant features are of great importance in SER. However, emotion-specific features are commonly mixed with some other features. In this paper, we introduce an approach to pull apart these two parts of features as much as possible. First we employ an unsupervised feature learning framework to achieve some rough features. Then these rough features are further fed into a semi-supervised feature learning framework. In this phase, efforts are made to disentangle the emotion-specific features and some other features by using a novel loss function, which combines reconstruction penalty, orthogonal penalty, discriminative penalty and verification penalty. Orthogonal penalty is utilized to disentangle emotion-specific features and other features. The discriminative penalty enlarges inter-emotion variations, while the verification penalty reduces the intra-emotion variations. Evaluations on the FAU Aibo emotion database show that our approach can improve the speech emotion classification performance.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"29 1","pages":"374-379"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83079266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
3D emotional facial animation synthesis with factored conditional Restricted Boltzmann Machines
Yong Zhao, D. Jiang, H. Sahli
This paper presents a 3D emotional facial animation synthesis approach based on Factored Conditional Restricted Boltzmann Machines (FCRBM). Facial Action Parameters (FAPs) extracted from 2D face image sequences are used to train the FCRBM model parameters. Based on the trained model, given an emotion label sequence and several initial frames of FAPs, the corresponding FAP sequence is generated via Gibbs sampling and then used to construct an MPEG-4 compliant 3D facial animation. Emotion recognition and subjective evaluation of the synthesized animations show that the proposed method obtains natural facial animations that represent the dynamic process of emotions well. In addition, facial animation with smooth emotion transitions can be obtained by blending the emotion labels.
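
The FCRBM itself is not spelled out in the abstract, so the sketch below only outlines the generation loop: given seed FAP frames and a per-frame emotion label sequence, each new frame is drawn by alternating Gibbs steps conditioned on the recent past and the label. The model interface (an `order` attribute and the two conditional sampling helpers) is hypothetical, so treat this as a schematic outline rather than a working FCRBM.

```python
# Schematic sketch of the generation phase; the FCRBM conditionals are hypothetical helpers.
import numpy as np

def generate_faps(model, seed_frames, emotion_labels, n_frames, k_gibbs=30):
    history = list(seed_frames)                           # initial FAP frames (seed context)
    for t in range(n_frames):
        context = np.concatenate(history[-model.order:])  # past frames used as condition
        label = emotion_labels[t]                         # emotion label for this frame
        v = history[-1].copy()                            # start visible units at last frame
        for _ in range(k_gibbs):                          # alternating Gibbs sampling
            h = model.sample_hidden_given_visible(v, context, label)
            v = model.sample_visible_given_hidden(h, context, label)
        history.append(v)
    return np.array(history[len(seed_frames):])           # generated FAP sequence
```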
{"title":"3D emotional facial animation synthesis with factored conditional Restricted Boltzmann Machines","authors":"Yong Zhao, D. Jiang, H. Sahli","doi":"10.1109/ACII.2015.7344664","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344664","url":null,"abstract":"This paper presents a 3D emotional facial animation synthesis approach based on the Factored Conditional Restricted Boltzmann Machines (FCRBM). Facial Action Parameters (FAPs) extracted from 2D face image sequences, are adopted to train the FCRBM model parameters. Based on the trained model, given an emotion label sequence and several initial frames of FAPs, the corresponding FAP sequence is generated via the Gibbs sampling, and then used to construct the MPEG-4 compliant 3D facial animation. Emotion recognition and subjective evaluation on the synthesized animations show that the proposed method can obtain natural facial animations representing well the dynamic process of emotions. Besides, facial animation with smooth emotion transitions can be obtained by blending the emotion labels.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"42 1","pages":"797-803"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81366722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Emotion, voices and musical instruments: Repeated exposure to angry vocal sounds makes instrumental sounds angrier
Casady Bowman, T. Yamauchi, Kunchen Xiao
The perception of emotion is critical for social interactions. Nonlinguistic signals such as those in the human voice and musical instruments are used for communicating emotion. Using an adaptation paradigm, this study examines the extent to which common mental mechanisms are applied to the emotion processing of instrumental and vocal sounds. In two experiments we show that prolonged exposure to affective non-linguistic vocalizations elicits auditory aftereffects when participants are tested on instrumental morphs (Experiment 1a), yet no aftereffects are apparent when participants are exposed to affective instrumental sounds and tested on non-linguistic voices (Experiment 1b). Specifically, results indicate that exposure to angry vocal sounds made participants perceive instrumental sounds as angrier and less fearful, but not vice versa. These findings suggest that there is a directionality to emotion perception in vocal and instrumental sounds. Significantly, this unidirectional relationship reveals that the mechanisms used for emotion processing are likely to be shared from vocal sounds to instrumental sounds, but not vice versa.
{"title":"Emotion, voices and musical instruments: Repeated exposure to angry vocal sounds makes instrumental sounds angrier","authors":"Casady Bowman, T. Yamauchi, Kunchen Xiao","doi":"10.1109/ACII.2015.7344641","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344641","url":null,"abstract":"The perception of emotion is critical for social interactions. Nonlinguistic signals such as those in the human voice and musical instruments are used for communicating emotion. Using an adaptation paradigm, this study examines the extent to which common mental mechanisms are applied for emotion processing of instrumental and vocal sounds. In two experiments we show that prolonged exposure to affective non-linguistic vocalizations elicits auditory after effects when participants are tested on instrumental morphs (Experiment 1a), yet no aftereffects are apparent when participants are exposed to affective instrumental sounds and tested on non-linguistic voices (Experiment 1b). Specifically, results indicate that exposure to angry vocal sounds made participants perceive instrumental sounds as angrier and less fearful, but not vice versa. These findings suggest that there is a directionality for emotion perception in vocal and instrumental sounds. Significantly, this unidirectional relationship reveals that mechanisms used for emotion processing is likely to be shared from vocal sounds to instrumental sounds, but not vice versa.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"2012 1","pages":"670-676"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82621663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Multimodal emotion recognition in response to videos (Extended abstract)
M. Soleymani, M. Pantic, T. Pun
We present a user-independent emotion recognition method with the goal of detecting expected emotions or affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. EEG responses and eye gaze data were then recorded from 24 participants while they watched the emotional video clips. Ground truth was defined based on the median arousal and valence scores given to the clips in a preliminary study. The arousal classes were calm, medium aroused and activated, and the valence classes were unpleasant, neutral and pleasant. A leave-one-participant-out cross-validation was employed to evaluate the classification performance in a user-independent manner. The best classification accuracies, 68.5% for the three valence labels and 76.4% for the three arousal labels, were obtained using a modality fusion strategy and a support vector machine. The results over a population of 24 participants demonstrate that user-independent emotion recognition can outperform individual self-reports for arousal assessments and does not underperform for valence assessments.
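
A minimal sketch of the user-independent evaluation described above, assuming concatenation as the modality fusion step: EEG and gaze features are joined per trial, and a LIBSVM-backed SVC is scored with leave-one-participant-out cross-validation via scikit-learn's LeaveOneGroupOut. The paper's actual fusion strategy and feature definitions may differ.

```python
# Sketch only: modality fusion by concatenation plus leave-one-participant-out evaluation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def user_independent_accuracy(eeg_feats, gaze_feats, labels, participant_ids):
    X = np.hstack([eeg_feats, gaze_feats])     # fuse modalities per video-watching trial
    logo = LeaveOneGroupOut()                  # each fold holds out one participant
    clf = SVC(kernel="rbf")
    scores = cross_val_score(clf, X, labels, groups=participant_ids, cv=logo)
    return scores.mean()                       # mean accuracy across held-out participants
```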
{"title":"Multimodal emotion recognition in response to videos (Extended abstract)","authors":"M. Soleymani, M. Pantic, T. Pun","doi":"10.1109/ACII.2015.7344615","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344615","url":null,"abstract":"We present a user-independent emotion recognition method with the goal of detecting expected emotions or affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. Then EEG responses and eye gaze data were recorded from 24 participants while watching emotional video clips. Ground truth was defined based on the median arousal and valence scores given to clips in a preliminary study. The arousal classes were calm, medium aroused and activated and the valence classes were unpleasant, neutral and pleasant. A one-participant-out cross validation was employed to evaluate the classification performance in a user-independent approach. The best classification accuracy of 68.5% for three labels of valence and 76.4% for three labels of arousal were obtained using a modality fusion strategy and a support vector machine. The results over a population of 24 participants demonstrate that user-independent emotion recognition can outperform individual self-reports for arousal assessments and do not underperform for valence assessments.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"45 1","pages":"491-497"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88150252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
A multi-label convolutional neural network approach to cross-domain action unit detection
Sayan Ghosh, Eugene Laksana, Stefan Scherer, Louis-Philippe Morency
Action Unit (AU) detection from facial images is an important classification task in affective computing. However, most existing approaches use carefully engineered feature extractors along with off-the-shelf classifiers. There has also been less focus on how well classifiers generalize when tested on different datasets. In our paper, we propose a multi-label convolutional neural network approach to learn a shared representation between multiple AUs directly from the input image. Experiments on three AU datasets (CK+, DISFA and BP4D) indicate that our approach obtains competitive results on all datasets. Cross-dataset experiments also indicate that the network generalizes well to other datasets, even under different training and testing conditions.
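
A minimal PyTorch sketch of the multi-label formulation: a shared convolutional representation feeds one logit per AU, trained with a per-AU binary cross-entropy so several AUs can be active in the same face image. Layer sizes, the assumed 96x96 grayscale input, and the number of AUs are placeholders, not the paper's architecture.

```python
# Sketch only: multi-label CNN with a shared representation and independent AU outputs.
import torch
import torch.nn as nn

class MultiLabelAUNet(nn.Module):
    def __init__(self, n_aus=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 24 * 24, n_aus)   # one logit per AU

    def forward(self, x):                    # x: (batch, 1, 96, 96) face crops
        z = self.features(x).flatten(1)      # shared representation across all AUs
        return self.classifier(z)            # multi-label logits (sigmoid applied in loss)

# Training uses per-AU binary cross-entropy:
# loss = nn.BCEWithLogitsLoss()(model(images), au_targets.float())
```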
{"title":"A multi-label convolutional neural network approach to cross-domain action unit detection","authors":"Sayan Ghosh, Eugene Laksana, Stefan Scherer, Louis-Philippe Morency","doi":"10.1109/ACII.2015.7344632","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344632","url":null,"abstract":"Action Unit (AU) detection from facial images is an important classification task in affective computing. However most existing approaches use carefully engineered feature extractors along with off-the-shelf classifiers. There has also been less focus on how well classifiers generalize when tested on different datasets. In our paper, we propose a multi-label convolutional neural network approach to learn a shared representation between multiple AUs directly from the input image. Experiments on three AU datasets- CK+, DISFA and BP4D indicate that our approach obtains competitive results on all datasets. Cross-dataset experiments also indicate that the network generalizes well to other datasets, even when under different training and testing conditions.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"24 2 1","pages":"609-615"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88675540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 62
An investigation of emotion changes from speech
Zhaocheng Huang
Emotion recognition based on speech plays an important role in Human-Computer Interaction (HCI), which has motivated extensive recent investigation into this area. However, current research on emotion recognition focuses on recognizing emotion on a per-file basis and mostly does not provide insight into emotion changes. In my research, the emotion transition problem will be investigated, including localizing emotion change points, recognizing emotion transition patterns, and predicting or recognizing emotion changes. As well as being potentially important in applications, research that delves into emotion changes paves the way towards a better understanding of emotions from engineering and potentially psychological perspectives.
{"title":"An investigation of emotion changes from speech","authors":"Zhaocheng Huang","doi":"10.1109/ACII.2015.7344650","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344650","url":null,"abstract":"Emotion recognition based on speech plays an important role in Human Computer Interaction (HCI), which has motivated extensive recent investigation into this area. However, current research on emotion recognition is focused on recognizing emotion on a per-file basis and mostly does not provide insight into emotion changes. In my research, emotion transition problem will be investigated, including localizing emotion change points, recognizing emotion transition patterns and predicting or recognizing emotion changes. As well as being potentially important in applications, the research delving into emotion changes paves the way towards a better understanding of emotions from engineering and potentially psychological perspectives.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"474 1","pages":"733-736"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79930998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5