Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344577
Sharath Chandra Guntuku, Weisi Lin, M. A. Scott, G. Ghinea
Affect is evoked through an intricate relationship between the characteristics of stimuli, individuals, and systems of perception. While affect is widely researched, few studies consider multimedia system characteristics and human factors together. As such, this paper explores the influence of personality (Five-Factor Model) and cultural traits (Hofstede Model) on the intensity of multimedia-evoked positive and negative affects (emotions). A set of 144 video sequences (from 12 short movie clips) was evaluated by 114 participants from a cross-cultural population, producing 1232 ratings. On this data, three multilevel regression models are compared: a baseline model that only considers system factors; an extended model that includes personality and culture; and an optimistic model in which each participant is modelled. An analysis shows that personal and cultural traits represent 5.6% of the variance in positive affect and 13.6% of the variance in negative affect. In addition, the affect-enjoyment correlation varied across the clips. This suggests that personality and culture play a key role in predicting the intensity of negative affect and whether or not it is enjoyed, but a more sophisticated set of predictors is needed to model positive affect with the same efficacy.
{"title":"Modelling the influence of personality and culture on affect and enjoyment in multimedia","authors":"Sharath Chandra Guntuku, Weisi Lin, M. A. Scott, G. Ghinea","doi":"10.1109/ACII.2015.7344577","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344577","url":null,"abstract":"Affect is evoked through an intricate relationship between the characteristics of stimuli, individuals, and systems of perception. While affect is widely researched, few studies consider the combination of multimedia system characteristics and human factors together. As such, this paper explores tpersonality (Five-Factor Model) and cultural traits (Hofstede Model) on the intensity of multimedia-evoked positive and negative affects (emotions). A set of 144 video sequences (from 12 short movie clips) were evaluated by 114 participants from a cross-cultural population, producing 1232 ratings. On this data, threehe influence of personality (Five-Factor Model) and cultural traits (Hofstede Model) on the intensity of multimedia-evoked positive and negative affects (emotions). A set of 144 video sequences (from 12 short movie clips) were evaluated by 114 participants from a cross-cultural population, producing 1232 ratings. On this data, three multilevel regression models are compared: a baseline model that only considers system factors; an extended model that includes personality and culture; and an optimistic model in which each participant is modelled. An analysis shows that personal and cultural traits represent 5.6% of the variance in positive affect and 13.6% of the variance in negative affect. In addition, the affect-enjoyment correlation varied across the clips. This suggests that personality and culture play a key role in predicting the intensity of negative affect and whether or not it is enjoyed, but a more sophisticated set of predictors is needed to model positive affect with the same efficacy.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"16 1","pages":"236-242"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85818646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344571
Johnathan Mell, Gale M. Lucas, J. Gratch, A. Rosenfeld
Negotiation between virtual agents and humans is a complex field that requires system designers to be aware not only of the efficient solutions to a given game, but also of the mechanisms by which humans create value over multiple negotiations. One way to account for the agent's impact beyond a single negotiation session is to use external “ledgers” that persist across multiple sessions. We present results that describe the effects of favor exchange on negotiation outcomes, fairness, and trust for two distinct cross-cultural populations, and illustrate the ramifications of their similarities and differences on virtual agent design.
{"title":"Saying YES! The cross-cultural complexities of favors and trust in human-agent negotiation","authors":"Johnathan Mell, Gale M. Lucas, J. Gratch, A. Rosenfeld","doi":"10.1109/ACII.2015.7344571","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344571","url":null,"abstract":"Negotiation between virtual agents and humans is a complex field that requires designers of systems to be aware not only of the efficient solutions to a given game, but also the mechanisms by which humans create value over multiple negotiations. One way of considering the agent's impact beyond a single negotiation session is by considering the use of external “ledgers” across multiple sessions. We present results that describe the effects of favor exchange on negotiation outcomes, fairness, and trust for two distinct cross-cultural populations, and illustrate the ramifications of their similarities and differences on virtual agent design.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"1 1","pages":"194-200"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79315145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344581
Lang He, D. Jiang, H. Sahli
In this paper, we present our system design for audio-visual multi-modal depression recognition. To improve the estimation accuracy of the Beck Depression Inventory (BDI) score, besides the Low-Level Descriptor (LLD) features and the Local Gabor Binary Patterns from Three Orthogonal Planes (LGBP-TOP) features provided by the 2014 Audio/Visual Emotion Challenge and Workshop (AVEC2014), we extract extra features to capture key behavioural changes associated with depression. From audio we extract the speaking rate, and from video the head pose features, the Spatio-Temporal Interest Point (STIP) features, and local kinematic features via the Divergence-Curl-Shear descriptors. These features describe body movements and spatio-temporal changes within the image sequence. We also consider global dynamic features, obtained using the motion history histogram (MHH), bag-of-words (BOW) features and the vector of locally aggregated descriptors (VLAD). To capture the complementary information within the used features, we evaluate two fusion systems: a feature fusion scheme, and a model fusion scheme via local linear regression (LLR). Experiments are carried out on the training and development sets of the Depression Recognition Sub-Challenge (DSC) of AVEC2014; we obtain a root mean square error (RMSE) of 7.6697 and a mean absolute error (MAE) of 6.1683 on the development set, which are better than or comparable with the state-of-the-art results of the AVEC2014 challenge.
{"title":"Multimodal depression recognition with dynamic visual and audio cues","authors":"Lang He, D. Jiang, H. Sahli","doi":"10.1109/ACII.2015.7344581","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344581","url":null,"abstract":"In this paper, we present our system design for audio visual multi-modal depression recognition. To improve the estimation accuracy of the Beck Depression Inventory (BDI) score, besides the Low Level Descriptors (LLD) features and the Local Gabor Binary Pattern-Three Orthogonal Planes (LGBP-TOP) features provided by the 2014 Audio/Visual Emotion Challenge and Workshop (AVEC2014), we extract extra features to capture key behavioural changes associated with depression. From audio we extract the speaking rate, and from video, the head pose features, the Space-Temporal Interesting Point (STIP) features, and local kinematic features via the Divergence-Curl-Shear descriptors. These features describe body movements, and spatio-temporal changes within the image sequence. We also consider global dynamic features, obtained using motion history histogram (MHH), bag of words (BOW) features and vector of local aggregated descriptors (VLAD). To capture the complementary information within the used features, we evaluate two fusion systems - the feature fusion scheme, and the model fusion scheme via local linear regression (LLR). Experiments are carried out on the training set and development set of the Depression Recognition Sub-Challenge (DSC) of AVEC2014, we obtain root mean square error (RMSE) of 7.6697, and mean absolute error (MAE) of 6.1683 on the development set, which are better or comparable with the state of the art results of the AVEC2014 challenge.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"17 1","pages":"260-266"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81031062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344579
Junkai Chen, Z. Chi, Hong Fu
A new approach for pain event detection in video is presented in this paper. Unlike previous works that focused on frame-based detection, we aim to detect pain events at the video level. In this work, we explore the spatial information of video frames and the dynamic textures of video sequences, and propose two different types of features. HOG of fiducial points (P-HOG) is employed to extract spatial features from video frames, and HOG from Three Orthogonal Planes (HOG-TOP) is used to represent dynamic textures of video subsequences. After that, we apply max pooling to represent a video sequence as a global feature vector. Multiple Kernel Learning (MKL) is utilized to find an optimal fusion of the two types of features, and an SVM with multiple kernels is trained to perform the final classification. We conduct our experiments on the UNBC-McMaster Shoulder Pain dataset and achieve promising results, showing the effectiveness of our approach.
{"title":"A new approach for pain event detection in video","authors":"Junkai Chen, Z. Chi, Hong Fu","doi":"10.1109/ACII.2015.7344579","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344579","url":null,"abstract":"A new approach for pain event detection in video is presented in this paper. Different from some previous works which focused on frame-based detection, we target in detecting pain events at video level. In this work, we explore the spatial information of video frames and dynamic textures of video sequences, and propose two different types of features. HOG of fiducial points (P-HOG) is employed to extract spatial features from video frames and HOG from Three Orthogonal Planes (HOG-TOP) is used to represent dynamic textures of video subsequences. After that, we apply max pooling to represent a video sequence as a global feature vector. Multiple Kernel Learning (MKL) is utilized to find an optimal fusion of the two types of features. And an SVM with multiple kernels is trained to perform the final classification. We conduct our experiments on the UNBC-McMaster Shoulder Pain dataset and achieve promising results, showing the effectiveness of our approach.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"71 1","pages":"250-254"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83229565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344555
Na Li, Yong Xia, Yuwei Xia
Classifying images according to the feelings they evoke in viewers is becoming more and more popular. Due to the difficulty of gathering training data, this task is intrinsically a small-sample learning problem; hence, the results produced by most existing solutions are less accurate. In this paper, we propose the semi-supervised hierarchical classification (SSHC) algorithm for emotional classification of color images. We extract three groups of features for each classification task and use those features in a two-level classification model based on the support vector machine (SVM) and the AdaBoost technique. To enlarge the training dataset, we use each training image to retrieve similar images from the Internet cloud, and jointly use the small manually labeled dataset and the large retrieved but unlabeled dataset to train a classifier via semi-supervised learning. We have evaluated the proposed algorithm against the fuzzy similarity-based emotional classification (FSBEC) algorithm and another supervised hierarchical classification algorithm that does not learn from online images in three bi-class classification tasks: “warm vs. cool”, “light vs. heavy” and “static vs. dynamic”. Our pilot results suggest that, by learning from similar images archived in the Internet cloud, the proposed SSHC algorithm can produce more accurate emotional classification of color images.
{"title":"Semi-supervised emotional classification of color images by learning from cloud","authors":"Na Li, Yong Xia, Yuwei Xia","doi":"10.1109/ACII.2015.7344555","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344555","url":null,"abstract":"Classification of images based on the feelings generated by each image in its reviewers is becoming more and more popular. Due to the difficulty of gathering training data, this task is intrinsically a small-sample learning problem. Hence, the results produced by most existing solutions are less accurate. In this paper, we propose the semi-supervised hierarchical classification (SSHC) algorithm for emotional classification of color images. We extract three groups of features for each classification task and use those features in a two-level classification model that is based on the support vector machine (SVM) and Adaboost technique. To enlarge the training dataset, we employ each training image to retrieve similar images from the Internet cloud and jointly use the manually labeled small dataset and retrieved large but unlabeled dataset to train a classifier via semi-supervised learning. We have evaluated the proposed algorithm against the fuzzy similarity-based emotional classification (FSBEC) algorithm and another supervised hierarchical classification algorithm that does not learn from online images in three bi-class classification tasks, including “warm vs. cool”, “light vs. heavy” and “static vs. dynamic”. Our pilot results suggest that, by learning from the similar images archived in the Internet cloud, the proposed SSHC algorithm can produce more accurate emotional classification of color images.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"1 1","pages":"84-90"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83096511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344666
Chung-Hsien Wu, Wei-Bin Liang, Kuan-Chun Cheng, Jen-Chun Lin
This paper presents an approach to hierarchical modeling of the temporal course of emotional expression for speech emotion recognition. In the proposed approach, a segmentation algorithm is employed to hierarchically chunk an input utterance into three levels of temporal units: a low-level descriptor (LLD)-based sub-utterance level, an emotion profile (EP)-based sub-utterance level, and the utterance level. An emotion-oriented hierarchical structure is constructed based on the three-level units to describe the temporal emotion expression in an utterance. A hierarchical correlation model is also proposed to fuse the three-level outputs from the corresponding emotion recognizers and further model the correlation among them to determine the emotional state of the utterance. The EMO-DB corpus was used to evaluate the performance on speech emotion recognition. Experimental results show that the proposed method, by considering the temporal course of emotional expression, has the potential to improve speech emotion recognition performance.
{"title":"Hierarchical modeling of temporal course in emotional expression for speech emotion recognition","authors":"Chung-Hsien Wu, Wei-Bin Liang, Kuan-Chun Cheng, Jen-Chun Lin","doi":"10.1109/ACII.2015.7344666","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344666","url":null,"abstract":"This paper presents an approach to hierarchical modeling of temporal course in emotional expression for speech emotion recognition. In the proposed approach, a segmentation algorithm is employed to hierarchically chunk an input utterance into three-level temporal units, including low-level descriptors (LLDs)-based sub-utterance level, emotion profile (EP)-based sub-utterance level and utterance level. An emotion-oriented hierarchical structure is constructed based on the three-level units to describe the temporal emotion expression in an utterance. A hierarchical correlation model is also proposed to fuse the three-level outputs from the corresponding emotion recognizers and further model the correlation among them to determine the emotional state of the utterance. The EMO-DB corpus was used to evaluate the performance on speech emotion recognition. Experimental results show that the proposed method considering the temporal course in emotional expression provides the potential to improve the speech emotion recognition performance.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"27 1","pages":"810-814"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88475715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344648
Christoffer Holmgård, Georgios N. Yannakakis, H. P. Martínez, Karen-Inge Karstoft
In this paper we profile the stress responses of patients diagnosed with post-traumatic stress disorder (PTSD) to individual events in the game-based PTSD stress inoculation and exposure virtual environment StartleMart. Thirteen veterans suffering from PTSD play the game while we record their skin conductance. Game logs are used to identify individual events, and continuous decomposition analysis is applied to the skin conductance signals to derive event-related stress responses. The skin conductance features extracted from this analysis are used to profile each individual player in terms of stress response. We observe a large degree of variation across the 13 veterans, which further validates the idiosyncratic nature of PTSD physiological manifestations. In addition to game data and skin conductance signals, we ask PTSD patients to indicate the most stressful event experienced (class-based annotation) and also to compare the stress level of all events in a pairwise preference manner (rank-based annotation). We compare the two stress annotation schemes by correlating the self-reports to individual event-based stress manifestations. The self-reports collected through class-based annotation exhibit no correlation to physiological responses, whereas the pairwise preferences yield significant correlations to all skin conductance features extracted via continuous decomposition analysis. The core findings of the paper suggest that reporting stress preferences across events yields more reliable data that capture aspects of the stress experienced, and that features extracted from skin conductance via continuous decomposition analysis offer appropriate predictors of stress manifestation across PTSD patients.
{"title":"To rank or to classify? Annotating stress for reliable PTSD profiling","authors":"Christoffer Holmgård, Georgios N. Yannakakis, H. P. Martínez, Karen-Inge Karstoft","doi":"10.1109/ACII.2015.7344648","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344648","url":null,"abstract":"In this paper we profile the stress responses of patients diagnosed with post-traumatic stress disorder (PTSD) to individual events in the game-based PTSD stress inoculation and exposure virtual environment StartleMart. Thirteen veterans suffering from PTSD play the game while we record their skin conductance. Game logs are used to identify individual events, and continuous decomposition analysis is applied to the skin conductance signals to derive event-related stress responses. The extracted skin conductance features from this analysis are used to profile each individual player in terms of stress response. We observe a large degree of variation across the 13 veterans which further validates the idiosyncratic nature of PTSD physiological manifestations. Further to game data and skin conductance signals we ask PTSD patients to indicate the most stressful event experienced (class-based annotation) and also compare the stress level of all events in a pairwise preference manner (rank-based annotation). We compare the two annotation stress schemes by correlating the self-reports to individual event-based stress manifestations. The self-reports collected through class-based annotation exhibit no correlation to physiological responses, whereas, the pairwise preferences yield significant correlations to all skin conductance features extracted via continuous decomposition analysis. The core findings of the paper suggest that reporting of stress preferences across events yields more reliable data that capture aspects of the stress experienced and that features extracted from skin conductance via continuous decomposition analysis offer appropriate predictors of stress manifestation across PTSD patients.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"21 1","pages":"719-725"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87920872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344595
Andra Adams, P. Robinson
Classifying complex categorical emotions has been a relatively unexplored area of affective computing. We present a classifier trained to recognize 18 complex emotion categories. A leave-one-out training approach was used on 181 acted videos from the EU-Emotion Stimulus Set. Performance scores for the 18-choice classification problem were AROC = 0.84, 2AFC = 0.84, F1 = 0.33, Accuracy = 0.47. On a simplified 6-choice classification problem, the classifier had an accuracy of 0.64 compared with the validated human accuracy of 0.74. The classifier has been integrated into an expression training interface which gives meaningful feedback to humans on their portrayal of complex emotions through face and head movements. This work has applications as an intervention for Autism Spectrum Conditions.
{"title":"Automated recognition of complex categorical emotions from facial expressions and head motions","authors":"Andra Adams, P. Robinson","doi":"10.1109/ACII.2015.7344595","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344595","url":null,"abstract":"Classifying complex categorical emotions has been a relatively unexplored area of affective computing. We present a classifier trained to recognize 18 complex emotion categories. A leave-one-out training approach was used on 181 acted videos from the EU-Emotion Stimulus Set. Performance scores for the 18-choice classification problem were AROC = 0.84, 2AFC = 0.84, F1 = 0.33, Accuracy = 0.47. On a simplified 6-choice classification problem, the classifier had an accuracy of 0.64 compared with the validated human accuracy of 0.74. The classifier has been integrated into an expression training interface which gives meaningful feedback to humans on their portrayal of complex emotions through face and head movements. This work has applications as an intervention for Autism Spectrum Conditions.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"29 1","pages":"355-361"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74225878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344589
C. D. Melo, J. Gratch
Research shows that people consistently reach more efficient solutions than those predicted by standard economic models, which assume people are selfish. Artificial intelligence, in turn, seeks to create machines that can achieve these levels of efficiency in human-machine interaction. However, as reinforced in this paper, people's decisions are systematically less efficient - i.e., less fair and favorable - with machines than with humans. To understand the cause of this bias, we resort to a well-known experimental economics model: Fehr and Schmidt's inequity aversion model. This model accounts for people's aversion to disadvantageous outcome inequality (envy) and to advantageous outcome inequality (guilt). We present an experiment in which participants engaged in the ultimatum and dictator games with human or machine counterparts. By fitting this data to Fehr and Schmidt's model, we show that people acted as if they were just as envious of humans as of machines; in contrast, people showed less guilt when making unfavorable decisions to machines. This result thus provides critical insight into the bias people show in favor of humans in economic settings. We discuss implications for the design of machines that engage in social decision making with humans.
{"title":"People show envy, not guilt, when making decisions with machines","authors":"C. D. Melo, J. Gratch","doi":"10.1109/ACII.2015.7344589","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344589","url":null,"abstract":"Research shows that people consistently reach more efficient solutions than those predicted by standard economic models, which assume people are selfish. Artificial intelligence, in turn, seeks to create machines that can achieve these levels of efficiency in human-machine interaction. However, as reinforced in this paper, people's decisions are systematically less efficient - i.e., less fair and favorable - with machines than with humans. To understand the cause of this bias, we resort to a well-known experimental economics model: Fehr and Schmidt's inequity aversion model. This model accounts for people's aversion to disadvantageous outcome inequality (envy) and aversion to advantageous outcome inequality (guilt). We present an experiment where participants engaged in the ultimatum and dictator games with human or machine counterparts. By fitting this data to Fehr and Schmidt's model, we show that people acted as if they were just as envious of humans as of machines; but, in contrast, people showed less guilt when making unfavorable decisions to machines. This result, thus, provides critical insight into this bias people show, in economic settings, in favor of humans. We discuss implications for the design of machines that engage in social decision making with humans.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"56 1","pages":"315-321"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76160958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344562
I. Lefter, H. Nefs, C. Jonker, L. Rothkrantz
Recent years have witnessed a growing interest in recognizing emotions and events based on speech. One application of such systems is automatically detecting when a situation gets out of hand and human intervention is needed. Most studies have focused on increasing recognition accuracies using parts of the same dataset for training and testing. However, this says little about how such a trained system is expected to perform `in the wild'. In this paper we present a cross-corpus study using the audio part of three multimodal datasets containing negative human-human interactions. We present intra- and cross-corpus accuracies whilst manipulating the acoustic features, normalization schemes, and oversampling of the least represented class to alleviate the negative effects of data imbalance. We observe a decrease in performance when disjoint corpora are used for training and testing. Merging two datasets for training results in slightly lower performance than the best obtained by using only one corpus for training. A hand-crafted low-dimensional feature set shows competitive behavior when compared to a brute-force high-dimensional feature vector. Corpus normalization and artificially creating samples of the sparsest class have a positive effect.
{"title":"Cross-corpus analysis for acoustic recognition of negative interactions","authors":"I. Lefter, H. Nefs, C. Jonker, L. Rothkrantz","doi":"10.1109/ACII.2015.7344562","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344562","url":null,"abstract":"Recent years have witnessed a growing interest in recognizing emotions and events based on speech. One of the applications of such systems is automatically detecting when a situations gets out of hand and human intervention is needed. Most studies have focused on increasing recognition accuracies using parts of the same dataset for training and testing. However, this says little about how such a trained system is expected to perform `in the wild'. In this paper we present a cross-corpus study using the audio part of three multimodal datasets containing negative human-human interactions. We present intra- and cross-corpus accuracies whilst manipulating the acoustic features, normalization schemes, and oversampling of the least represented class to alleviate the negative effects of data unbalance. We observe a decrease in performance when disjunct corpora are used for training and testing. Merging two datasets for training results in a slightly lower performance than the best one obtained by using only one corpus for training. A hand crafted low dimensional feature set shows competitive behavior when compared to a brute force high dimensional features vector. Corpus normalization and artificially creating samples of the sparsest class have a positive effect.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"48 1","pages":"132-138"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78730588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}