Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344621
Lian Zhang, Joshua W. Wade, A. Swanson, A. Weitlauf, Z. Warren, N. Sarkar
Autism Spectrum Disorder (ASD) is a group of neurodevelopmental disabilities with a high prevalence rate. While much research has focused on improving social communication deficits in ASD populations, less emphasis has been devoted to improving skills relevant for adult independent living, such as driving. In this paper, a novel virtual reality (VR)-based driving system with tasks at different difficulty levels is presented to train and improve the driving skills of teenagers with ASD. The goal of this paper is to measure the cognitive load experienced by an individual with ASD while driving in the VR-based driving system. Several eye gaze features that varied with cognitive load are identified in an experiment with 12 teenage participants with ASD. Several machine learning methods were compared, and their ability to accurately measure cognitive load was validated against the subjective rating of a therapist. The results will be used to build models for an intelligent VR-based driving system that can sense a participant's real-time cognitive load and offer driving tasks at an appropriate difficulty level in order to maximize the participant's long-term performance.
{"title":"Cognitive state measurement from eye gaze analysis in an intelligent virtual reality driving system for autism intervention","authors":"Lian Zhang, Joshua W. Wade, A. Swanson, A. Weitlauf, Z. Warren, N. Sarkar","doi":"10.1109/ACII.2015.7344621","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344621","url":null,"abstract":"Autism Spectrum Disorder (ASD) is a group of neurodevelopmental disabilities with a high prevalence rate. While much research has focused on improving social communication deficits in ASD populations, less emphasis has been devoted to improving skills relevant for adult independent living, such as driving. In this paper, a novel virtual reality (VR)-based driving system with different difficulty levels of tasks is presented to train and improve driving skills of teenagers with ASD. The goal of this paper is to measure the cognitive load experienced by an individual with ASD while he is driving in the VR-based driving system. Several eye gaze features are identified that varied with cognitive load in an experiment participated by 12 teenagers with ASD. Several machine learning methods were compared and the ability of these methods to accurately measure cognitive load was validated with respect to the subjective rating of a therapist. Results will be used to build models in an intelligent VR-based driving system that can sense a participant's real-time cognitive load and offer driving tasks at an appropriate difficulty level in order to maximize the participant's long-term performance.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"51 1","pages":"532-538"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79395135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344597
Ya Li, Linlin Chao, Yazhu Liu, Wei Bao, J. Tao
The earliest research on emotion recognition started with simulated/acted stereotypical emotional corpora and then extended to elicited corpora. Recently, the demands of real applications have forced research to shift to natural and spontaneous corpora. Previous research shows that emotion recognition accuracies gradually decline from simulated speech, to elicited speech, to fully natural speech. This paper investigates the effects of the commonly used spectral, prosodic, and voice quality features on emotion recognition across the three types of corpora, and identifies which features remain robust for emotion recognition on natural speech. Emotion recognition with several common machine learning methods is carried out and thoroughly compared. Three feature selection methods are applied to find the robust features. The results on six commonly used corpora confirm that recognition accuracy decreases as the corpus changes from simulated to natural. In addition, prosodic and voice quality features are robust for emotion recognition on simulated corpora, while spectral features are robust on elicited and natural corpora.
{"title":"From simulated speech to natural speech, what are the robust features for emotion recognition?","authors":"Ya Li, Linlin Chao, Yazhu Liu, Wei Bao, J. Tao","doi":"10.1109/ACII.2015.7344597","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344597","url":null,"abstract":"The earliest research on emotion recognition starts with simulated/acted stereotypical emotional corpus, and then extends to elicited corpus. Recently, the demanding for real application forces the research shift to natural and spontaneous corpus. Previous research shows that accuracies of emotion recognition are gradual decline from simulated speech, to elicited and totally natural speech. This paper aims to investigate the effects of the common utilized spectral, prosody and voice quality features in emotion recognition with the three types of corpus, and finds out the robust feature for emotion recognition with natural speech. Emotion recognition by several common machine learning methods are carried out and thoroughly compared. Three feature selection methods are performed to find the robust features. The results on six common used corpora confirm that recognition accuracies decrease when the corpus changing from simulated to natural corpus. In addition, prosody and voice quality features are robust for emotion recognition on simulated corpus, while spectral feature is robust in elicited and natural corpus.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"64 1","pages":"368-373"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79580455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344603
Xiao Sun, Fei Gao, Chengcheng Li, F. Ren
Related research on sentiment analysis of Chinese microblogs aims at analyzing the emotion of posters. This paper presents a content extension method that combines a post with its comments into a microblog conversation for sentiment analysis. A new convolutional autoencoder that can extract contextual sentiment information from the post's microblog conversation is proposed. Furthermore, a DBN model, composed of several stacked RBM (Restricted Boltzmann Machine) layers, is implemented to extract higher-level features from the short text of a post. These RBM layers encode the observed short text to learn hidden structures or semantic information for better feature representation. A ClassRBM (Classification RBM) layer, stacked on top of the RBM layers, is used to perform the final sentiment classification. The experimental results demonstrate that, with a proper structure and parameters, the proposed deep learning method outperforms state-of-the-art shallow learning models such as SVM and NB on sentiment classification, which also shows that a DBN is suitable for short-text classification with the proposed feature dimensionality extension method.
{"title":"Chinese microblog sentiment classification based on convolution neural network with content extension method","authors":"Xiao Sun, Fei Gao, Chengcheng Li, F. Ren","doi":"10.1109/ACII.2015.7344603","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344603","url":null,"abstract":"Related research for sentiment analysis on Chinese microblog is aiming at analyzing the emotion of posters. This paper presents a content extension method that combines post with its' comments into a microblog conversation for sentiment analysis. A new convolutional auto encoder which can extract contextual sentiment information from microblog conversation of the post is proposed. Furthermore, a DBN model, which is composed by several layers of RBM(Restricted Boltzmann Machine) stacked together, is implemented to extract some higher level feature for short text of a post. These RBM layers can encoder observed short text to learn hidden structures or semantics information for better feature representation. A ClassRBM (Classification RBM) layer, which is stacked on top of RBM layers, is adapted to achieve the final sentiment classification. The experiment results demonstrate that, with proper structure and parameter, the performance of the proposed deep learning method on sentiment classification is better than state-of-the-art surface learning models such as SVM or NB, which also proves that DBN is suitable for short-length document classification with the proposed feature dimensionality extension method.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"38 1","pages":"408-414"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78422426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344696
Yoren Gaffary, David Antonio Gómez Jáuregui, Jean-Claude Martin, M. Ammi
Previous studies of kinesthetic expressions of emotion are mainly based on acted expressions of affective states, which may differ considerably from spontaneous expressions. In a previous study, we proposed a task to collect haptic expressions of spontaneous stress. In this paper, we explore the effectiveness of this task at inducing spontaneous stress in two ways: through subjective feedback, and through a more objective approach-avoidance behavior.
{"title":"Gestural and Postural Reactions to Stressful Event: Design of a Haptic Stressful Stimulus","authors":"Yoren Gaffary, David Antonio Gómez Jáuregui, Jean-Claude Martin, M. Ammi","doi":"10.1109/ACII.2015.7344696","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344696","url":null,"abstract":"Previous studies about kinesthetic expressions of emotions are mainly based on acted expressions of affective states, which might be quite different from spontaneous expressions. In a previous study, we proposed a task to collect haptic expressions of a spontaneous stress. In this paper, we explore the effectiveness of this task to induce a spontaneous stress in two ways: a subjective feedback, and a more objective approach-avoidance behavior.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"116 1","pages":"988-992"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76872659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344631
Isabel Gonzalez, W. Verhelst, Meshia Cédric Oveneke, H. Sahli, D. Jiang
We present a framework for combination-aware AU intensity recognition. It includes a feature extraction approach that can handle small head movements and does not require face alignment. A three-layered structure is used for AU classification. The first layer is dedicated to independent AU recognition, and the second layer incorporates AU combination knowledge. At the third layer, AU dynamics are handled with a variable-duration semi-Markov model. The first two layers are modeled using extreme learning machines (ELMs). ELMs perform comparably to support vector machines but are computationally more efficient and can handle multi-class classification directly. Moreover, they include feature selection via manifold regularization. We show that the proposed layered classification scheme improves results by considering AU combinations as well as intensity recognition.
{"title":"Framework for combination aware AU intensity recognition","authors":"Isabel Gonzalez, W. Verhelst, Meshia Cédric Oveneke, H. Sahli, D. Jiang","doi":"10.1109/ACII.2015.7344631","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344631","url":null,"abstract":"We present a framework for combination aware AU intensity recognition. It includes a feature extraction approach that can handle small head movements which does not require face alignment. A three layered structure is used for the AU classification. The first layer is dedicated to independent AU recognition, and the second layer incorporates AU combination knowledge. At a third layer, AU dynamics are handled based on variable duration semi-Markov model. The first two layers are modeled using extreme learning machines (ELMs). ELMs have equal performance to support vector machines but are computationally more efficient, and can handle multi-class classification directly. Moreover, they include feature selection via manifold regularization. We show that the proposed layered classification scheme can improve results by considering AU combinations as well as intensity recognition.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"21 1","pages":"602-608"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77729160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344545
Caroline Langlet, C. Clavel
This paper introduces a sentiment analysis method suited to face-to-face human-agent interactions. We position our system and its evaluation protocol with respect to the existing sentiment analysis literature and detail how the proposed system addresses the issues specific to human-agent interaction. Finally, we provide an in-depth analysis of the evaluation results, opening the discussion on the difficulties and remaining challenges of sentiment analysis in human-agent interactions.
{"title":"Adapting sentiment analysis to face-to-face human-agent interactions: From the detection to the evaluation issues","authors":"Caroline Langlet, C. Clavel","doi":"10.1109/ACII.2015.7344545","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344545","url":null,"abstract":"This paper introduces a sentiment analysis method suitable to the human-agent and face-to-face interactions. We present the positioning of our system and its evaluation protocol according to the existing sentiment analysis literature and detail how the proposed system integrates the human-agent interaction issues. Finally, we provide an in-depth analysis of the results obtained by the evaluation, opening the discussion on the different difficulties and the remaining challenges of sentiment analysis in human-agent interactions.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"12 1","pages":"14-20"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81535062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344693
S. Bhattacharya
Human emotion plays a significant role in our reasoning, learning, cognition, and decision making, which in turn may affect the usability of interactive systems. Detecting the emotions of interactive system users is therefore important, as it can inform designs that improve user experience. In this work, we propose a model to detect the emotional state of users of touch screen devices. Although a number of methods have been developed to detect human emotion, they are computationally intensive and incur setup costs. The proposed model aims to avoid these limitations and make the detection process viable on mobile platforms. We assume three emotional states of a user: positive, negative, and neutral. The touch interaction is characterized by a set of seven features derived from finger strokes and taps. Our proposed model is a linear combination of these features. The model is developed and validated with empirical data from 57 participants performing 7 touch input tasks. The validation study demonstrates a high prediction accuracy of 90.47%.
{"title":"A linear regression model to detect user emotion for touch input interactive systems","authors":"S. Bhattacharya","doi":"10.1109/ACII.2015.7344693","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344693","url":null,"abstract":"Human emotion plays significant role is affecting our reasoning, learning, cognition and decision making, which in turn may affect usability of interactive systems. Detection of emotion of interactive system users is therefore important, as it can help design for improved user experience. In this work, we propose a model to detect the emotional state of the users of touch screen devices. Although a number of methods were developed to detect human emotion, those are computationally intensive and require setup cost. The model we propose aims to avoid these limitations and make the detection process viable for mobile platforms. We assume three emotional states of a user: positive, negative and neutral. The touch interaction is characterized by a set of seven features, derived from the finger strokes and taps. Our proposed model is a linear combination of these features. The model is developed and validated with empirical data involving 57 participants performing 7 touch input tasks. The validation study demonstrates a high prediction accuracy of 90.47%.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"51 1","pages":"970-975"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83189095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344678
Florian B. Pokorny, F. Graf, F. Pernkopf, Björn Schuller
Boosted by a wide spectrum of potential applications, emotional speech recognition, i.e., the automatic computer-aided identification of human emotional states from speech signals, is currently a popular field of research. However, many studies, especially those concentrating on the recognition of negative emotions, have neglected the specific requirements of real-world scenarios, such as robustness, real-time capability, and realistic speech corpora. Motivated by these facts, a robust, low-complexity classification system for the detection of negative emotions in speech signals was implemented on the basis of a spontaneous, strongly emotionally colored speech corpus. As the core of the system, an approach that is innovative in the field of emotion recognition was applied: the bag-of-words approach, originally known from text and image document retrieval. Thorough performance evaluations were carried out, and a promising recognition accuracy of 65.6% for the two-class paradigm of negative versus non-negative emotional states attests to the potential of bags-of-words for speech emotion recognition in the wild.
{"title":"Detection of negative emotions in speech signals using bags-of-audio-words","authors":"Florian B. Pokorny, F. Graf, F. Pernkopf, Björn Schuller","doi":"10.1109/ACII.2015.7344678","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344678","url":null,"abstract":"Boosted by a wide potential application spectrum, emotional speech recognition, i.e., the automatic computer-aided identification of human emotional states based on speech signals, currently describes a popular field of research. However, a variety of studies especially concentrating on the recognition of negative emotions often neglected the specific requirements of real-world scenarios, for example, robustness, real-time capability, and realistic speech corpora. Motivated by these facts, a robust, low-complex classification system for the detection of negative emotions in speech signals was implemented on the basis of a spontaneous, strongly emotionally colored speech corpus. Therefore, an innovative approach in the field of emotion recognition was applied as the core of the system - the bag-of-words approach that is originally known from text and image document retrieval applications. Thorough performance evaluations were carried out and a promising recognition accuracy of 65.6 % for the 2-class paradigm negative versus non-negative emotional states attests to the potential of bags-of-words in speech emotion recognition in the wild.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"4 1","pages":"879-884"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88698604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344674
Tao Zhuo, Peng Zhang, Kangli Chen, Yanning Zhang
The goal of video summarization is to turn a large volume of video data into a compact visual summary that users can interpret quickly. Existing summarization strategies employ point-based feature correspondence for superframe segmentation. Unfortunately, the information carried by these sparse points is far from sufficient or stable enough to describe the changes in the interesting regions of each frame. Therefore, to overcome the limitations of point features, we propose a region-correspondence-based superframe segmentation to achieve more effective video summarization. Instead of using the motion of feature points, we calculate the content-motion similarity to obtain the strength of change between consecutive frames. With the help of the circulant structure kernel, the proposed method performs more accurate motion estimation efficiently. Experimental testing on videos from a benchmark database demonstrates the effectiveness of the proposed method.
{"title":"Superframe segmentation based on content-motion correspondence for social video summarization","authors":"Tao Zhuo, Peng Zhang, Kangli Chen, Yanning Zhang","doi":"10.1109/ACII.2015.7344674","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344674","url":null,"abstract":"The goal of video summarization is to turn large volume of video data into a compact visual summary that can be easily interpreted by users in a while. Existing summarization strategies employed the point based feature correspondence for the superframe segmentation. Unfortunately, the information carried by those sparse points is far from sufficiency and stability to describe the change of interesting regions of each frame. Therefore, in order to overcome the limitations of point feature, we propose a region correspondence based superframe segmentation to achieve more effective video summarization. Instead of utilizing the motion of feature points, we calculate the similarity of content-motion to obtain the strength of change between the consecutive frames. With the help of circulant structure kernel, the proposed method is able to perform more accurate motion estimation efficiently. Experimental testing on the videos from benchmark database has demonstrate the effectiveness of the proposed method.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"24 1","pages":"857-862"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87790330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344610
M. Schröder, Elisabetta Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. Maat, G. McKeown, Sathish Pammi, M. Pantic, C. Pelachaud, Björn Schuller, E. D. Sevin, M. Valstar, M. Wöllmer
This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and non-verbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and non-verbal behaviours required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and non-verbal behaviour, since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on non-verbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling etc. We first summarise three prototype versions of the SAL scenario, in which the behaviour of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analysing and synthesising the respective behaviours. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behaviour, dialogue management, and synthesis of speaker and listener behaviour of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse, and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.
{"title":"Building autonomous sensitive artificial listeners (Extended abstract)","authors":"M. Schröder, Elisabetta Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. Maat, G. McKeown, Sathish Pammi, M. Pantic, C. Pelachaud, Björn Schuller, E. D. Sevin, M. Valstar, M. Wöllmer","doi":"10.1109/ACII.2015.7344610","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344610","url":null,"abstract":"This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and non-verbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and non-verbal behaviours required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and non-verbal behaviour, since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on non-verbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling etc. We first summarise three prototype versions of the SAL scenario, in which the behaviour of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analysing and synthesising the respective behaviours. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behaviour, dialogue management, and synthesis of speaker and listener behaviour of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse, and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"282 1","pages":"456-462"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88057551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}