Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344596
Yongsen Tao, Kunxia Wang, Jing Yang, Ning An, Lian Li
Feature selection is a significant aspect of a speech emotion recognition system. Selecting a small subset out of the thousands of features extracted from speech data is important for accurate classification of speech emotion. In this paper we investigate the heuristic Harmony Search (HS) algorithm for feature selection. We extract three feature sets, namely MFCC, Fourier Parameters (FP), and features extracted with the Munich open Speech and Music Interpretation by Large Space Extraction (openSMILE) toolkit, from the Berlin German emotion database (EMODB) and the Chinese Elderly emotion database (EESDB), and combine MFCC with FP as a fourth feature set. We use Harmony Search to select feature subsets and reduce the dimensionality, and employ 10-fold cross-validation in LIBSVM to evaluate the change in accuracy between the selected subsets and the original sets. Experimental results show that each subset's size is reduced by about 50%, yet there is no sharp degradation in accuracy, which remains close to that of the original feature sets.
"Harmony search for feature selection in speech emotion recognition," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 362-367.
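As an illustration of the search loop described in the abstract above, the following is a minimal sketch of binary harmony-search feature selection with a cross-validated SVM fitness function. It uses scikit-learn's SVC (a LIBSVM wrapper) rather than LIBSVM directly, and the harmony memory size, HMCR, PAR, and iteration count are illustrative placeholders rather than the authors' settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """10-fold cross-validated SVM accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask == 1], y, cv=10).mean()

def harmony_search(X, y, hm_size=20, hmcr=0.9, par=0.3, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n_feats = X.shape[1]
    # Harmony memory: each row is a binary mask over the feature set.
    memory = rng.integers(0, 2, size=(hm_size, n_feats))
    scores = np.array([fitness(m, X, y) for m in memory])
    for _ in range(iters):
        new = np.empty(n_feats, dtype=int)
        for j in range(n_feats):
            if rng.random() < hmcr:                      # memory consideration
                new[j] = memory[rng.integers(hm_size), j]
                if rng.random() < par:                   # pitch adjustment: flip the bit
                    new[j] = 1 - new[j]
            else:                                        # random consideration
                new[j] = rng.integers(0, 2)
        score = fitness(new, X, y)
        worst = scores.argmin()
        if score > scores[worst]:                        # replace the worst harmony
            memory[worst], scores[worst] = new, score
    best = scores.argmax()
    return memory[best], scores[best]
```

In this encoding each harmony is a 0/1 mask over the feature set, so the returned mask directly gives the reduced subset whose size and accuracy can be compared against the full set.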
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344645
Leimin Tian, Johanna D. Moore, Catherine Lai
In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of an LSTM-RNN model may limit its performance when less training data is available, and may also risk over-fitting. Additionally, we find that long-distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.
"Emotion recognition in spontaneous and acted dialogues," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 698-704.
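As a rough sketch of the sequence-model side of this comparison, the following PyTorch snippet builds an LSTM over per-word feature vectors (e.g. DIS-NV or LLD-based features) and predicts an utterance-level emotion from the final hidden state. The feature dimension, hidden size, and number of emotion classes are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

class UtteranceLSTM(nn.Module):
    def __init__(self, feat_dim=5, hidden=64, n_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_emotions)

    def forward(self, x):                  # x: (batch, time, feat_dim), one vector per word
        _, (h, _) = self.lstm(x)           # h: (num_layers, batch, hidden)
        return self.out(h[-1])             # emotion logits per utterance

model = UtteranceLSTM()
logits = model(torch.randn(8, 20, 5))      # 8 utterances of 20 words each
```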
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344692
Stefan Rank, Cathy Lu
We demonstrate PhysSigTK, a physiological signals toolkit for making low-cost hardware accessible in the Unity3D game development environment so that designers of affective games can experiment with how engagement can be captured in their games. Rather than proposing a context-free way of measuring engagement, we enable designers to test how affordable hardware could fit into the assessment of players' states and progress in their particular game using a range of tools.
"PhysSigTK: Enabling engagement experiments with physiological signals for game design," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 968-969.
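PhysSigTK's own API is not shown in the abstract; purely as a hedged illustration of the kind of bridge such a toolkit provides, the snippet below reads a low-cost sensor over a serial port (pyserial assumed) and streams samples over UDP, where a Unity3D-side listener could consume them. The port name, baud rate, and packet format are invented for the example.

```python
import json
import socket
import serial  # pyserial, assumed installed

sensor = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)     # hypothetical low-cost GSR/heart-rate board
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
UNITY_ADDR = ("127.0.0.1", 5005)                            # UDP listener inside the game

while True:
    line = sensor.readline().decode(errors="ignore").strip()
    if not line:
        continue
    try:
        value = float(line)                                  # one sample per line from the board
    except ValueError:
        continue
    # Forward the sample as a small JSON packet the game can parse.
    sock.sendto(json.dumps({"signal": "gsr", "value": value}).encode(), UNITY_ADDR)
```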
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344601
L. Chen, C. W. Leong, G. Feng, Chong Min Lee, Swapna Somasundaran
Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, there is a lack of tools to efficiently and economically evaluate presenters' verbal and nonverbal behaviors. Recent advances in automated scoring and multimodal sensing technologies may address this issue. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal presentation corpus containing 56 presentations from 14 subjects was recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating were conducted according to standards in educational assessment. A rich set of multimodal features was extracted from head poses, eye gazes, facial expressions, motion traces, the speech signal, and transcripts. The model-building experiments show that jointly using lexical/speech and visual features achieves more accurate scoring, which suggests the feasibility of using multimodal technologies in the assessment of public speaking skills.
"Utilizing multimodal cues to automatically evaluate public speaking performance," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 394-400.
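The following sketch illustrates the general score-prediction setup described above: feature-level fusion of speech/lexical and visual descriptors followed by a regressor evaluated against human ratings. The arrays are random placeholders and the regressor choice is not the authors' exact model.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 56                                            # 56 presentations in the corpus
speech_feats = rng.normal(size=(n, 30))           # placeholder lexical/speech features
visual_feats = rng.normal(size=(n, 40))           # placeholder head-pose/gaze/expression features
scores = rng.uniform(1, 5, size=n)                # placeholder human rubric scores

X = np.hstack([speech_feats, visual_feats])       # feature-level (early) fusion of both modalities
pred = cross_val_predict(RandomForestRegressor(random_state=0), X, scores, cv=5)
print("correlation with human scores:", pearsonr(pred, scores)[0])
```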
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344598
W. Xue, Zhengwei Huang, Xin Luo, Qi-rong Mao
Speech plays an important part in human-computer interaction. As a major branch of speech processing, speech emotion recognition (SER) has drawn much attention from researchers. Highly discriminative features are of great importance in SER. However, emotion-specific features are commonly mixed with other features. In this paper, we introduce an approach to separate these two kinds of features as much as possible. First, we employ an unsupervised feature learning framework to obtain rough features. These rough features are then fed into a semi-supervised feature learning framework. In this phase, the emotion-specific features are disentangled from the other features using a novel loss function that combines a reconstruction penalty, an orthogonal penalty, a discriminative penalty, and a verification penalty. The orthogonal penalty is used to disentangle emotion-specific features from the other features. The discriminative penalty enlarges inter-emotion variation, while the verification penalty reduces intra-emotion variation. Evaluations on the FAU Aibo emotion database show that our approach can improve speech emotion classification performance.
"Learning speech emotion features by joint disentangling-discrimination," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 374-379.
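A minimal PyTorch sketch of a combined loss of the kind described above is given below, assuming an encoder that splits its representation into an emotion-specific part and an "other" part, plus a decoder for reconstruction. The specific penalty forms and weights are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(x, x_hat, z_emo, z_other, logits, labels,
               z_emo_a, z_emo_b, same_emotion,
               w_rec=1.0, w_orth=0.1, w_disc=1.0, w_ver=0.5):
    # Reconstruction penalty: the decoder output should match the input features.
    rec = F.mse_loss(x_hat, x)
    # Orthogonal penalty: discourage overlap between the emotion-specific part
    # and the "other" part of the representation (per-sample inner product).
    orth = (z_emo * z_other).sum(dim=1).pow(2).mean()
    # Discriminative penalty: enlarge inter-emotion variation via classification.
    disc = F.cross_entropy(logits, labels)
    # Verification penalty (contrastive form): pull same-emotion pairs together,
    # push different-emotion pairs apart.
    d = F.pairwise_distance(z_emo_a, z_emo_b)
    margin = 1.0
    ver = torch.where(same_emotion.bool(), d.pow(2),
                      F.relu(margin - d).pow(2)).mean()
    return w_rec * rec + w_orth * orth + w_disc * disc + w_ver * ver
```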
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344664
Yong Zhao, D. Jiang, H. Sahli
This paper presents a 3D emotional facial animation synthesis approach based on Factored Conditional Restricted Boltzmann Machines (FCRBM). Facial Animation Parameters (FAPs) extracted from 2D face image sequences are used to train the FCRBM model parameters. Based on the trained model, given an emotion label sequence and several initial frames of FAPs, the corresponding FAP sequence is generated via Gibbs sampling and then used to construct an MPEG-4 compliant 3D facial animation. Emotion recognition and subjective evaluation of the synthesized animations show that the proposed method produces natural facial animations that represent the dynamic process of emotions well. In addition, facial animations with smooth emotion transitions can be obtained by blending the emotion labels.
"3D emotional facial animation synthesis with factored conditional Restricted Boltzmann Machines," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 797-803.
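The factored, emotion-conditioned model used in the paper is more involved; purely as a hedged illustration of the generation step, the sketch below runs Gibbs sampling in a plain (unfactored) conditional RBM with Gaussian visible units, assuming the weight matrix W, autoregressive matrices A and B, and biases b_v and b_h have already been trained. Names and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate(history, W, A, B, b_v, b_h, n_frames=100, gibbs_steps=30, rng=None):
    # history: (order, n_vis) past FAP frames used as the conditioning context.
    # W: (n_hid, n_vis); A: (n_vis, order*n_vis); B: (n_hid, order*n_vis).
    rng = np.random.default_rng() if rng is None else rng
    order = len(history)
    frames = list(history)
    for _ in range(n_frames):
        ctx = np.concatenate(frames[-order:])               # flattened conditioning context
        v = frames[-1].copy()                               # start the chain at the last frame
        dyn_h = b_h + B @ ctx                               # dynamic hidden biases from the context
        dyn_v = b_v + A @ ctx                               # dynamic visible biases from the context
        for _ in range(gibbs_steps):
            h = rng.binomial(1, sigmoid(W @ v + dyn_h))     # sample binary hidden units
            v = dyn_v + W.T @ h                             # mean of Gaussian visible units
        frames.append(v)                                    # emit one new FAP frame
    return np.array(frames[order:])
```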
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344641
Casady Bowman, T. Yamauchi, Kunchen Xiao
The perception of emotion is critical for social interactions. Nonlinguistic signals, such as those in the human voice and musical instruments, are used to communicate emotion. Using an adaptation paradigm, this study examines the extent to which common mental mechanisms are applied to emotion processing of instrumental and vocal sounds. In two experiments we show that prolonged exposure to affective non-linguistic vocalizations elicits auditory aftereffects when participants are tested on instrumental morphs (Experiment 1a), yet no aftereffects are apparent when participants are exposed to affective instrumental sounds and tested on non-linguistic voices (Experiment 1b). Specifically, the results indicate that exposure to angry vocal sounds made participants perceive instrumental sounds as angrier and less fearful, but not vice versa. These findings suggest that there is a directionality to emotion perception in vocal and instrumental sounds. Significantly, this unidirectional relationship reveals that the mechanisms used for emotion processing are likely shared from vocal sounds to instrumental sounds, but not vice versa.
"Emotion, voices and musical instruments: Repeated exposure to angry vocal sounds makes instrumental sounds angrier," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 670-676.
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344615
M. Soleymani, M. Pantic, T. Pun
We present a user-independent emotion recognition method with the goal of detecting expected emotions or affective tags for videos using electroencephalogram (EEG), pupillary response, and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. EEG responses and eye gaze data were then recorded from 24 participants while they watched the emotional video clips. Ground truth was defined based on the median arousal and valence scores given to the clips in a preliminary study. The arousal classes were calm, medium aroused, and activated, and the valence classes were unpleasant, neutral, and pleasant. Leave-one-participant-out cross-validation was employed to evaluate the classification performance in a user-independent approach. The best classification accuracies of 68.5% for the three valence labels and 76.4% for the three arousal labels were obtained using a modality fusion strategy and a support vector machine. The results over a population of 24 participants demonstrate that user-independent emotion recognition can outperform individual self-reports for arousal assessments and does not underperform for valence assessments.
"Multimodal emotion recognition in response to videos (Extended abstract)," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 491-497.
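A minimal sketch of the user-independent evaluation protocol described above, with feature-level fusion of EEG and gaze/pupil features and leave-one-participant-out cross-validation of an SVM, is shown below. The feature arrays and label assignments are random placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_trials = 24 * 20                                   # 24 participants x 20 video clips
eeg = rng.normal(size=(n_trials, 32))                # placeholder EEG features
gaze = rng.normal(size=(n_trials, 8))                # placeholder pupil/gaze features
arousal = rng.integers(0, 3, size=n_trials)          # calm / medium aroused / activated
participants = np.repeat(np.arange(24), 20)          # group id per trial

X = np.hstack([eeg, gaze])                           # feature-level fusion of modalities
acc = cross_val_score(SVC(kernel="rbf"), X, arousal, groups=participants,
                      cv=LeaveOneGroupOut()).mean()  # each fold holds out one participant
print("leave-one-participant-out accuracy:", acc)
```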
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344632
Sayan Ghosh, Eugene Laksana, Stefan Scherer, Louis-Philippe Morency
Action Unit (AU) detection from facial images is an important classification task in affective computing. However, most existing approaches use carefully engineered feature extractors along with off-the-shelf classifiers. There has also been less focus on how well classifiers generalize when tested on different datasets. In this paper, we propose a multi-label convolutional neural network approach to learn a shared representation between multiple AUs directly from the input image. Experiments on three AU datasets (CK+, DISFA, and BP4D) indicate that our approach obtains competitive results on all of them. Cross-dataset experiments also indicate that the network generalizes well to other datasets, even under different training and testing conditions.
"A multi-label convolutional neural network approach to cross-domain action unit detection," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 609-615.
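As a rough illustration of the multi-label setup, the following PyTorch sketch defines a small CNN whose shared convolutional representation feeds one sigmoid output per AU, trained with a binary cross-entropy loss so that several AUs can be active in the same image. The architecture and input size are illustrative, not the authors' network.

```python
import torch
import torch.nn as nn

class MultiLabelAUNet(nn.Module):
    def __init__(self, n_aus=12):
        super().__init__()
        self.features = nn.Sequential(                 # shared representation across all AUs
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, n_aus)     # one logit per AU

    def forward(self, x):                              # x: (batch, 1, 64, 64) face crops
        z = self.features(x).flatten(1)
        return self.head(z)

model = MultiLabelAUNet()
images = torch.randn(4, 1, 64, 64)
targets = torch.randint(0, 2, (4, 12)).float()          # several AUs may be active per image
loss = nn.BCEWithLogitsLoss()(model(images), targets)   # independent sigmoid per AU
```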
Pub Date: 2015-09-21 | DOI: 10.1109/ACII.2015.7344650
Zhaocheng Huang
Emotion recognition based on speech plays an important role in Human Computer Interaction (HCI), which has motivated extensive recent investigation in this area. However, current research on emotion recognition focuses on recognizing emotion on a per-file basis and mostly does not provide insight into emotion changes. In my research, the emotion transition problem will be investigated, including localizing emotion change points, recognizing emotion transition patterns, and predicting or recognizing emotion changes. As well as being potentially important in applications, research delving into emotion changes paves the way towards a better understanding of emotions from engineering and, potentially, psychological perspectives.
"An investigation of emotion changes from speech," 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 733-736.