Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344694
Isabel Pfab, Christian J. A. M. Willemse
Social touches are essential in interpersonal communication, for instance to show affect. Despite this importance, mediated interpersonal communication often lacks the possibility to touch. A human touch is a complex composition of several physical qualities and parameters, but different haptic technologies allow us to isolate such parameters and to investigate their opportunities and limitations for affective communication devices. In our research, we focus on the role that temperature may play in affective mediated communication. In the current paper, we describe the design of a wearable 'research tool' that will facilitate systematic research on the possibilities of temperature in affective communication. We present use cases and define a list of requirements accordingly. Based on a requirement fulfillment analysis, we conclude that our research tool can be of value for research on new forms of affective mediated communication.
{"title":"Design of a wearable research tool for warm mediated social touches","authors":"Isabel Pfab, Christian J. A. M. Willemse","doi":"10.1109/ACII.2015.7344694","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344694","url":null,"abstract":"Social touches are essential in interpersonal communication, for instance to show affect. Despite this importance, mediated interpersonal communication oftentimes lacks the possibility to touch. A human touch is a complex composition of several physical qualities and parameters, but different haptic technologies allow us to isolate such parameters and to investigate their opportunities and limitations for affective communication devices. In our research, we focus on the role that temperature may play in affective mediated communication. In the current paper, we describe the design of a wearable `research tool' that will facilitate systematic research on the possibilities of temperature in affective communication. We present use cases, and define a list of requirements accordingly. Based on a requirement fulfillment analysis, we conclude that our research tool can be of value for research on new forms of affective mediated communication.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"550 1","pages":"976-981"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77140928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344691
Sabrina Campano, Caroline Langlet, N. Glas, C. Clavel, C. Pelachaud
In this paper, we propose a computational model that provides an Embodied Conversational Agent (ECA) with the ability to generate verbal other-repetition (repetitions of some of the words uttered in the previous user speaker turn) when interacting with a user in a museum setting. We focus on the generation of other-repetitions expressing emotional stances in appreciation sentences. Emotional stances and their semantic features are selected according to the user's verbal input, and the ECA's utterance is generated according to these features. We present an evaluation of this model through users' subjective reports. Results indicate that the expression of emotional stances by the ECA has a positive effect on user engagement, and that the ECA's behaviours are rated as more believable by users when the ECA utters other-repetitions.
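As a rough illustration of the kind of other-repetition generation described in this abstract, the following Python sketch picks an appreciation word from the user's turn, maps it to an emotional stance, and echoes the word back in a stance-bearing template. The lexicon, stance labels, and templates are invented for illustration and do not reproduce the authors' model.

```python
from typing import Optional

# Hypothetical appreciation lexicon mapped to emotional stances (illustrative only).
APPRECIATION_STANCES = {
    "beautiful": "enthusiastic",
    "amazing": "enthusiastic",
    "interesting": "interested",
    "strange": "doubtful",
}

# Hypothetical response templates, one per stance.
STANCE_TEMPLATES = {
    "enthusiastic": "Oh yes, {word}! I find it truly {word} as well.",
    "interested": "{word}, indeed. Tell me more about what makes it {word} for you.",
    "doubtful": "{word}? I am not sure I would call it {word} myself.",
}

def generate_other_repetition(user_utterance: str) -> Optional[str]:
    """Return an other-repetition expressing an emotional stance, or None
    if the user's turn contains no known appreciation word."""
    for token in user_utterance.lower().split():
        word = token.strip(".,!?")
        if word in APPRECIATION_STANCES:
            stance = APPRECIATION_STANCES[word]
            return STANCE_TEMPLATES[stance].format(word=word)
    return None

print(generate_other_repetition("This painting is really beautiful!"))
# -> "Oh yes, beautiful! I find it truly beautiful as well."
```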
{"title":"An ECA expressing appreciations","authors":"Sabrina Campano, Caroline Langlet, N. Glas, C. Clavel, C. Pelachaud","doi":"10.1109/ACII.2015.7344691","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344691","url":null,"abstract":"In this paper, we propose a computational model that provides an Embodied Conversational Agent (ECA) with the ability to generate verbal other-repetition (repetitions of some of the words uttered in the previous user speaker turn) when interacting with a user in a museum setting. We focus on the generation of other-repetitions expressing emotional stances in appreciation sentences. Emotional stances and their semantic features are selected according to the user's verbal input, and ECA's utterance is generated according to these features. We present an evaluation of this model through users' subjective reports. Results indicate that the expression of emotional stances by the ECA has a positive effect oIn this paper, we propose a computational model that provides an Embodied Conversational Agent (ECA) with the ability to generate verbal other-repetition (repetitions of some of the words uttered in the previous user speaker turn) when interacting with a user in a museum setting. We focus on the generation of other-repetitions expressing emotional stances in appreciation sentences. Emotional stances and their semantic features are selected according to the user's verbal input, and ECA's utterance is generated according to these features. We present an evaluation of this model through users' subjective reports. Results indicate that the expression of emotional stances by the ECA has a positive effect on user engagement, and that ECA's behaviours are rated as more believable by users when the ECA utters other-repetitions.n user engagement, and that ECA's behaviours are rated as more believable by users when the ECA utters other-repetitions.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"110 1","pages":"962-967"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85273604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344634
Meshia Cédric Oveneke, Isabel Gonzalez, Weiyi Wang, D. Jiang, H. Sahli
Understanding social signals is a very important aspect of human communication and interaction and has therefore attracted increased attention from various research areas. Among the different types of social signals, particular attention has been paid to the facial expression of emotions and its automated analysis from image sequences. Automated facial expression analysis is a very challenging task due to the complex three-dimensional deformation and motion of the face associated with facial expressions and the loss of 3D information during the image formation process. As a consequence, retrieving 3D spatio-temporal facial information from image sequences is essential for automated facial expression analysis. In this paper, we propose a framework for retrieving three-dimensional facial structure, motion and spatio-temporal features from monocular image sequences. First, we estimate monocular 3D scene flow by retrieving the facial structure using shape-from-shading (SFS) and combining it with 2D optical flow. Second, based on the retrieved structure and motion of the face, we extract spatio-temporal features for automated facial expression analysis. Experimental results illustrate the potential of the proposed 3D facial information retrieval framework for facial expression analysis, i.e. facial expression recognition and facial action-unit recognition on a benchmark dataset. This paves the way for future research on monocular 3D facial expression analysis.
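The abstract's first step (depth from shape-from-shading combined with 2D optical flow to obtain 3D scene flow) could be approximated along the lines of the sketch below. The depth maps are assumed to come from an SFS stage that is not shown, the pinhole intrinsics (fx, fy, cx, cy) are assumed known, and OpenCV's Farneback estimator stands in for whatever optical flow method the authors used.

```python
import cv2
import numpy as np

def scene_flow_from_depth_and_flow(prev_gray, next_gray, depth_prev, depth_next,
                                   fx, fy, cx, cy):
    """Lift dense 2D optical flow to a per-pixel 3D scene-flow estimate,
    given depth maps for both frames (assumed here to come from SFS)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-project pixels of frame t to 3D using the first depth map.
    X1 = (u - cx) * depth_prev / fx
    Y1 = (v - cy) * depth_prev / fy
    Z1 = depth_prev

    # Destination pixels in frame t+1 according to the optical flow.
    u2 = np.clip(u + flow[..., 0], 0, w - 1)
    v2 = np.clip(v + flow[..., 1], 0, h - 1)
    depth2 = depth_next[v2.round().astype(int), u2.round().astype(int)]
    X2 = (u2 - cx) * depth2 / fx
    Y2 = (v2 - cy) * depth2 / fy
    Z2 = depth2

    # Scene flow: 3D displacement of each back-projected point.
    return np.stack([X2 - X1, Y2 - Y1, Z2 - Z1], axis=-1)
```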
{"title":"Monocular 3D facial information retrieval for automated facial expression analysis","authors":"Meshia Cédric Oveneke, Isabel Gonzalez, Weiyi Wang, D. Jiang, H. Sahli","doi":"10.1109/ACII.2015.7344634","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344634","url":null,"abstract":"Understanding social signals is a very important aspect of human communication and interaction and has therefore attracted increased attention from various research areas. Among the different types of social signals, particular attention has been paid to facial expression of emotions and its automated analysis from image sequences. Automated facial expression analysis is a very challenging task due to the complex three-dimensional deformation and motion of the face associated to the facial expressions and the loss of 3D information during the image formation process. As a consequence, retrieving 3D spatio-temporal facial information from image sequences is essential for automated facial expression analysis. In this paper, we propose a framework for retrieving three-dimensional facial structure, motion and spatio-temporal features from monocular image sequences. First, we estimate monocular 3D scene flow by retrieving the facial structure using shape-from-shading (SFS) and combine it with 2D optical flow. Secondly, based on the retrieved structure and motion of the face, we extract spatio-temporal features for automated facial expression analysis. Experimental results illustrate the potential of the proposed 3D facial information retrieval framework for facial expression analysis, i.e. facial expression recognition and facial action-unit recognition on a benchmark dataset. This paves the way for future research on monocular 3D facial expression analysis.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"15 1","pages":"623-629"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77674028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344600
Ingo Siegert, Ronald Böck, A. Wendemuth, Bogdan Vlasenko
In emotion recognition from speech, several well-established corpora are used to date for the development of classification engines. The data is annotated differently, and the community in the field uses a variety of feature extraction schemes. The aim of this paper is to investigate promising features for individual corpora and then compare the results in order to propose optimal features across data sets, introducing a new ranking method. Further, this enables us to present a method for the automatic identification of groups of corpora with similar characteristics. This answers an urgent question in classifier development, namely whether data from different corpora is similar enough to be used jointly as training material, overcoming a shortage of material in matching domains. We compare the results of this method with manual groupings of corpora. We consider the established emotional speech corpora AVIC, ABC, DES, EMO-DB, ENTERFACE, SAL, SMARTKOM, SUSAS and VAM; however, our approach is general.
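A minimal sketch of the general idea, not the authors' exact ranking method: rank features by their loadings on each corpus's leading principal components, then compare the rankings across corpora to judge whether two corpora are similar enough to pool as training material. The number of components and the use of Spearman correlation are assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA

def feature_ranking(X, n_components=5):
    """Rank features by their summed absolute loadings on the leading
    principal components, weighted by explained variance ratio."""
    pca = PCA(n_components=n_components).fit(X)
    importance = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    return np.argsort(-importance)          # feature indices, most important first

def corpus_similarity(X_a, X_b, n_components=5):
    """Spearman correlation between two corpora's feature rankings; high
    values suggest the corpora could be grouped and used jointly."""
    rank_a = np.argsort(feature_ranking(X_a, n_components))   # rank position per feature
    rank_b = np.argsort(feature_ranking(X_b, n_components))
    rho, _ = spearmanr(rank_a, rank_b)
    return rho
```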
{"title":"Exploring dataset similarities using PCA-based feature selection","authors":"Ingo Siegert, Ronald Böck, A. Wendemuth, Bogdan Vlasenko","doi":"10.1109/ACII.2015.7344600","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344600","url":null,"abstract":"In emotion recognition from speech, several well-established corpora are used to date for the development of classification engines. The data is annotated differently, and the community in the field uses a variety of feature extraction schemes. The aim of this paper is to investigate promising features for individual corpora and then compare the results for proposing optimal features across data sets, introducing a new ranking method. Further, this enables us to present a method for automatic identification of groups of corpora with similar characteristics. This answers an urgent question in classifier development, namely whether data from different corpora is similar enough to jointly be used as training material, overcoming shortage of material in matching domains. We compare the results of this method with manual groupings of corpora. We consider the established emotional speech corpora AVIC, ABC, DES, EMO-DB, ENTERFACE, SAL, SMARTKOM, SUSAS and VAM, however our approach is general.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"62 1","pages":"387-393"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90738063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344557
R. Cowie
It matters for affective computing to have a framework that brings key points about human emotion to mind in an orderly way. A natural option builds on the ancient view that overt emotion arises from interactions between rational awareness and systems of a different type whose functions are ongoing, but not obvious. Key ideas from modern research can be incorporated by assuming that the latter do five broad kinds of work: evaluating states of affairs; preparing us to act accordingly; learning from significant conjunctions; interrupting conscious processes if need be; and aligning us with other people. Multiple structures act as interfaces between those systems and rational awareness. Emotional feelings inform conscious awareness of what they are doing, and emotion words split the space of their activity into discrete regions. The picture is not ideal, but it offers a substantial organising device.
{"title":"The enduring basis of emotional episodes: Towards a capacious overview","authors":"R. Cowie","doi":"10.1109/ACII.2015.7344557","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344557","url":null,"abstract":"It matters for affective computing to have a framework that brings key points about human emotion to mind in an orderly way. A natural option builds on the ancient view that overt emotion arises from interactions between rational awareness and systems of a different type whose functions are ongoing, but not obvious. Key ideas from modern research can be incorporated by assuming that the latter do five broad kinds of work: evaluating states of affairs; preparing us to act accordingly; learning from significant conjunctions; interrupting conscious processes if need be; and aligning us with other people. Multiple structures act as interfaces between those systems and rational awareness. Emotional feelings inform conscious awareness of what they are doing, and emotion words split the space of their activity into discrete regions. The picture is not ideal, but it offers a substantial organising device.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"30 1","pages":"98-104"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85733448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344670
Yun Zhang, Wei Xin, D. Miao
This paper presents original research on an eye-tracking-based personality test. To deal with the unavoidable human deception and inaccurate self-assessment in subjective psychological tests, eye tracking techniques are utilized to reveal the participant's cognitive process during the test. A non-intrusive, real-time, eye-tracking-based questionnaire system is developed for the Chinese military recruitment personality test. A pilot study was carried out on 12 qualified samples. The preliminary experimental results indicate a strong correlation between the participants' fixation features and their test results. Such a relationship can be developed into an assistive indicator or a predictive parameter complementing traditional psychological test results, improving their reliability and validity in future applications.
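The reported fixation-to-score relationship is essentially a correlation analysis; a minimal sketch follows, with the fixation durations and personality scores invented purely for illustration (they are not the study's data).

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical pilot data for 12 participants: mean fixation duration (ms) on
# questionnaire items and the corresponding personality-scale scores.
fixation_duration_ms = np.array([312, 287, 355, 298, 401, 376, 289, 342, 330, 365, 310, 295])
personality_score    = np.array([ 54,  48,  61,  50,  66,  63,  47,  58,  57,  62,  52,  49])

# Correlation between a fixation feature and the test result.
r, p_value = pearsonr(fixation_duration_ms, personality_score)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```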
{"title":"Personality test based on eye tracking techniques","authors":"Yun Zhang, Wei Xin, D. Miao","doi":"10.1109/ACII.2015.7344670","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344670","url":null,"abstract":"This paper presents an original research on eye tracking based personality test. To deal with the unavoidable human deception and inaccurate self-assessment during subjective psychological test, eye tracking techniques are utilized to reveal the participant's cognitive procedure during test. A non-intrusive real-time eye tracking based questionnaire system is developed for Chinese military recruitment personality test. A pilot study is carried out on 12 qualified samples. The preliminary result of experiment indicates a strong correlation between the participant's fixation features and test results. And such kind of relationship can be developed as an assistive indicator or a predictive parameter to traditional psychological test result to highly improve its reliability and validity in future applications.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"91 1","pages":"832-837"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86028647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344637
Quan Gan, Chongliang Wu, Shangfei Wang, Q. Ji
Current work on differentiating between posed and spontaneous facial expressions usually uses features that are handcrafted for expression category recognition. To date, no features have been specifically designed for differentiating between posed and spontaneous facial expressions. Recently, deep learning models have proven to be efficient for many challenging computer vision tasks, and therefore in this paper we propose using the deep Boltzmann machine to learn representations of facial images and to differentiate between posed and spontaneous facial expressions. First, faces are located in the images. Then, a two-layer deep Boltzmann machine is trained to distinguish posed and spontaneous expressions. Experimental results on two benchmark datasets, i.e. the SPOS and USTC-NVIE datasets, demonstrate that the deep Boltzmann machine performs well on posed and spontaneous expression differentiation tasks. Comparison results on both datasets show that our method has an advantage over the other methods.
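As a rough stand-in for the two-layer deep Boltzmann machine (which scikit-learn does not provide), the sketch below greedily stacks two restricted Boltzmann machines and adds a logistic-regression read-out. This is a simplification: a true DBM is trained jointly rather than layer by layer, and the layer sizes here are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Greedy stack of two RBMs followed by a logistic-regression classifier.
# X: flattened face crops; y: 0 = posed, 1 = spontaneous.
model = Pipeline([
    ("scale", MinMaxScaler()),   # BernoulliRBM expects inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=512, learning_rate=0.01, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.01, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# model.fit(X_train, y_train)
# accuracy = model.score(X_test, y_test)
```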
{"title":"Posed and spontaneous facial expression differentiation using deep Boltzmann machines","authors":"Quan Gan, Chongliang Wu, Shangfei Wang, Q. Ji","doi":"10.1109/ACII.2015.7344637","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344637","url":null,"abstract":"Current works on differentiating between posed and spontaneous facial expressions usually use features that are handcrafted for expression category recognition. Till now, no features have been specifically designed for differentiating between posed and spontaneous facial expressions. Recently, deep learning models have been proven to be efficient for many challenging computer vision tasks, and therefore in this paper we propose using the deep Boltzmann machine to learn representations of facial images and to differentiate between posed and spontaneous facial expressions. First, faces are located from images. Then, a two-layer deep Boltzmann machine is trained to distinguish posed and spon-tanous expressions. Experimental results on two benchmark datasets, i.e. the SPOS and USTC-NVIE datasets, demonstrate that the deep Boltzmann machine performs well on posed and spontaneous expression differentiation tasks. Comparison results on both datasets show that our method has an advantage over the other methods.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"116 1","pages":"643-648"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86065151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344580
Abhinav Dhall, Roland Göcke
Depression and other mood disorders are common, disabling disorders with a profound impact on individuals and families. Despite its high prevalence, depression is easily missed during its early stages. Automatic depression analysis has become a very active field of research in the affective computing community in the past few years. This paper presents a framework for depression analysis based on unimodal visual cues. Temporally piece-wise Fisher Vectors (FV) are computed on temporal segments. As a low-level feature, block-wise Local Binary Pattern-Three Orthogonal Planes descriptors are computed. Statistical aggregation techniques are analysed and compared for creating a discriminative representation of a video sample. The paper explores the strength of FV in representing temporal segments in spontaneous clinical data, creating a meaningful representation of the facial dynamics in a temporal segment. The experiments are conducted on the Audio Video Emotion Challenge (AVEC) 2014 German-speaking depression database. The superior results of the proposed framework show the effectiveness of the technique as compared to the current state of the art.
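A simplified sketch of the Fisher Vector encoding applied per temporal segment, keeping only the gradients with respect to the GMM means (the full FV also includes variance terms); the 16 diagonal-covariance components are an assumption for illustration, and LBP-TOP extraction is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Simplified Fisher Vector of one temporal segment: gradients of the GMM
    log-likelihood w.r.t. the component means, then power and L2 normalisation."""
    q = gmm.predict_proba(descriptors)                       # (N, K) soft assignments
    n, k = q.shape
    fv = []
    for j in range(k):
        diff = (descriptors - gmm.means_[j]) / np.sqrt(gmm.covariances_[j])  # diag covariances
        grad = (q[:, j, None] * diff).sum(axis=0) / (n * np.sqrt(gmm.weights_[j]))
        fv.append(grad)
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                   # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)                 # L2 normalisation

# Usage sketch (training descriptors and per-segment descriptors assumed available):
# gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(all_training_descriptors)
# video_repr = np.mean([fisher_vector(d, gmm) for d in per_segment_descriptors], axis=0)
```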
{"title":"A temporally piece-wise fisher vector approach for depression analysis","authors":"Abhinav Dhall, Roland Göcke","doi":"10.1109/ACII.2015.7344580","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344580","url":null,"abstract":"Depression and other mood disorders are common, disabling disorders with a profound impact on individuals and families. Inspite of its high prevalence, it is easily missed during the early stages. Automatic depression analysis has become a very active field of research in the affective computing community in the past few years. This paper presents a framework for depression analysis based on unimodal visual cues. Temporally piece-wise Fisher Vectors (FV) are computed on temporal segments. As a low-level feature, block-wise Local Binary Pattern-Three Orthogonal Planes descriptors are computed. Statistical aggregation techniques are analysed and compared for creating a discriminative representative for a video sample. The paper explores the strength of FV in representing temporal segments in a spontaneous clinical data. This creates a meaningful representation of the facial dynamics in a temporal segment. The experiments are conducted on the Audio Video Emotion Challenge (AVEC) 2014 German speaking depression database. The superior results of the proposed framework show the effectiveness of the technique as compared to the current state-of-art.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"23 1","pages":"255-259"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83298186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344689
G. McKeown
This theoretical paper attempts to define some of the key components and challenges required to create embodied conversational agents that can be genuinely interesting conversational partners. Wittgenstein's argument concerning talking lions emphasizes the importance of having a shared common ground as a basis for conversational interactions. Virtual bats suggests that, for some people at least, it is important that there be a feeling of authenticity concerning a subjectively experiencing entity that can convey what it is like to be that entity. Electric sheep reminds us of the importance of empathy in human conversational interaction and that we should provide a full communicative repertoire of both verbal and non-verbal components if we are to create genuinely engaging interactions; we may also be making the task more difficult, rather than easier, if we leave out non-verbal aspects of communication. Finally, analogical peacocks highlights the importance of between-minds alignment and establishes the longer-term goal of being interesting, creative, and humorous if an embodied conversational agent is to be a truly engaging conversational partner. Some potential directions and solutions for addressing these issues are suggested.
{"title":"Turing's menagerie: Talking lions, virtual bats, electric sheep and analogical peacocks: Common ground and common interest are necessary components of engagement","authors":"G. McKeown","doi":"10.1109/ACII.2015.7344689","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344689","url":null,"abstract":"This theoretical paper attempts to define some of the key components and challenges required to create embodied conversational agents that can be genuinely interesting conversational partners. Wittgenstein's argument concerning talking lions emphasizes the importance of having a shared common ground as a basis for conversational interactions. Virtual bats suggests that-for some people at least-it is important that there be a feeling of authenticity concerning a subjectively experiencing entity that can convey what it is like to be that entity. Electric sheep reminds us of the importance of empathy in human conversational interaction and that we should provide a full communicative repertoire of both verbal and non-verbal components if we are to create genuinely engaging interactions. Also we may be making the task more difficult rather than easy if we leave out non-verbal aspects of communication. Finally, analogical peacocks highlights the importance of between minds alignment and establishes a longer term goal of being interesting, creative, and humorous if an embodied conversational agent is to be truly an engaging conversational partner. Some potential directions and solutions to addressing these issues are suggested.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"20 1","pages":"950-955"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88076923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344675
Yu-Hao Chin, Po-Chuan Lin, Tzu-Chiang Tai, Jia-Ching Wang
Music listened to by humans is sometimes exposed to noise. For example, background noise usually exists when listening to music in broadcasts or live performances. This noise degrades the performance of music emotion recognition systems. To address this problem, this work constructs a robust system for music emotion classification in noisy environments. Furthermore, the genre is considered when determining the emotional labels of a song. The proposed system consists of three major parts: subspace-based noise suppression, genre index computation, and a support vector machine (SVM). First, the system uses noise suppression to remove the noise content in the signal. After that, acoustic features are extracted from each music clip. Next, a dictionary is constructed using songs that cover a wide range of genres, and it is adopted to implement sparse coding. Via sparse coding, the data can be transformed into sparse coefficient vectors, and genre indexes are computed for the music genres from these sparse coefficient vectors. The genre indexes serve as combination weights in a later phase. At the training stage of the SVM, emotion models are trained for each genre. At the prediction stage, the predictions obtained from the emotion models of each genre are combined across all genres, weighted by the genre indexes. Finally, the proposed system annotates a song with multiple emotional labels based on the combined prediction. Experimental results show that the system achieves good performance in both normal and noisy environments.
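A condensed sketch of the genre-index and weighted-combination steps described above, with noise suppression and feature extraction omitted. The dictionary, the per-atom genre labels, and the per-genre SVMs (fitted with probability estimates enabled) are assumed to have been prepared beforehand, and the sparse-coding settings are illustrative rather than the authors' choices.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def genre_indexes(clip_feature, dictionary, atom_genres, genres):
    """Sparse-code a clip's feature vector against a dictionary whose atoms come
    from songs of known genres; the genre index is the normalised sum of the
    absolute coefficients of that genre's atoms."""
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="lasso_lars", transform_alpha=0.1)
    codes = coder.transform(clip_feature.reshape(1, -1))[0]          # (n_atoms,)
    weights = np.array([np.abs(codes[atom_genres == g]).sum() for g in genres])
    total = weights.sum()
    return weights / total if total > 0 else np.full(len(genres), 1.0 / len(genres))

def combined_emotion_scores(clip_feature, dictionary, atom_genres, genres, genre_svms):
    """Weighted combination of per-genre SVM emotion predictions, using the
    genre indexes as combination weights; threshold the result for multi-label output."""
    w = genre_indexes(clip_feature, dictionary, atom_genres, genres)
    preds = np.array([genre_svms[g].predict_proba(clip_feature.reshape(1, -1))[0]
                      for g in genres])                              # (n_genres, n_emotions)
    return w @ preds                                                 # (n_emotions,)
```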
{"title":"Genre based emotion annotation for music in noisy environment","authors":"Yu-Hao Chin, Po-Chuan Lin, Tzu-Chiang Tai, Jia-Ching Wang","doi":"10.1109/ACII.2015.7344675","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344675","url":null,"abstract":"The music listened by human is sometimes exposed to noise. For example, background noise usually exists when listening to music in broadcasts or lives. The noise will worsen the performance in various music emotion recognition systems. To solve the problem, this work constructs a robust system for music emotion classification in a noisy environment. Furthermore, the genre is considered when determining the emotional label for the song. The proposed system consists of three major parts, i.e. subspace based noise suppression, genre index computation, and support vector machine (SVM). Firstly, the system uses noise suppression to remove the noise content in the signal. After that, acoustical features are extracted from each music clip. Next, a dictionary is constructed by using songs that cover a wide range of genres, and it is adopted to implement sparse coding. Via sparse coding, data can be transformed to sparse coefficient vectors, and this paper computes genre indexes for the music genres based on the sparse coefficient vector. The genre indexes are regarded as combination weights in the latter phase. At the training stage of the SVM, this paper train emotional models for each genre. At the prediction stage, the predictions that obtained by emotional models in each genre are weighted combined across all genres using the genre indexes. Finally, the proposed system annotates multiple emotional labels for a song based on the combined prediction. The experimental result shows that the system can achieve a good performance in both normal and noisy environments.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"29 1","pages":"863-866"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83086079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}