Estimating mental load in passive and active tasks from pupil and gaze changes using Bayesian surprise
E. Wolf, Manuel Martínez, Alina Roitberg, R. Stiefelhagen, B. Deml
DOI: 10.1145/3279810.3279852
Eye-based monitoring has been suggested as a means of measuring mental load non-intrusively. In most cases, however, the experiments have been conducted in settings where the user is mainly passive. This constraint does not reflect applications in which we want to identify the mental load of an active user, e.g. during surgery. The main objective of our work is to investigate the potential of an eye-tracking device for measuring mental load in realistic active situations. In our first experiments we calibrate our setup against a well-established passive paradigm. There, we confirm that our setup can reliably recover pupil width in real time and that we can observe the previously reported relationship between pupil width and cognitive load; however, we also observe very high variance between test subjects. In a follow-up active-task experiment, neither pupil width nor eye gaze showed significant predictive power for workflow disruptions. To address this, we present an approach for estimating the likelihood of workflow disruptions during active fine-motor tasks. Our method combines the eye-based data with Bayesian surprise theory and is able to successfully predict the user's struggle, with correlations of 35% and 75%, respectively.
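The abstract does not spell out how the surprise signal is computed; as a minimal sketch of the general idea, assuming a Gaussian belief over pupil width with known observation noise (all parameter values and names below are illustrative, not the authors' implementation), Bayesian surprise can be taken as the KL divergence between the belief before and after each new pupil sample:

    import numpy as np

    def kl_gaussian(mu_p, var_p, mu_q, var_q):
        """KL(posterior || prior) between two univariate Gaussians."""
        return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

    def bayesian_surprise(pupil_widths_mm, prior_mu=3.5, prior_var=0.25, obs_var=0.05):
        """For each pupil-width sample, update a Gaussian belief about the current
        width (conjugate update, known observation noise) and report how far the
        belief moved, i.e. the Bayesian surprise of that sample."""
        mu, var = prior_mu, prior_var
        surprises = []
        for x in pupil_widths_mm:
            post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
            post_mu = post_var * (mu / var + x / obs_var)
            surprises.append(kl_gaussian(post_mu, post_var, mu, var))
            mu, var = post_mu, post_var
        return np.array(surprises)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        widths = np.concatenate([rng.normal(3.5, 0.1, 200),   # calm baseline
                                 rng.normal(4.2, 0.1, 50)])   # sudden dilation
        s = bayesian_surprise(widths)
        print("mean surprise before / after the change:", s[:200].mean(), s[200:].mean())

Spikes in such a signal would then be candidate markers for workflow disruptions; how they are thresholded and correlated with observed disruptions is specific to the study.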
{"title":"Estimating mental load in passive and active tasks from pupil and gaze changes using bayesian surprise","authors":"E. Wolf, Manuel Martínez, Alina Roitberg, R. Stiefelhagen, B. Deml","doi":"10.1145/3279810.3279852","DOIUrl":"https://doi.org/10.1145/3279810.3279852","url":null,"abstract":"Eye-based monitoring has been suggested as a means to measure mental load in a non-intrusive way. In most cases, the experiments have been conducted in a setting where the user has been mainly passive. This constraint does not reflect applications where we want to identify mental load of an active user, e.g. during surgery. The main objective of our work is to investigate the potential of an eye tracking device for measuring the mental load in realistic active situations. In our first experiments we calibrate our setup by using a well established passive setup. There, we confirm that our setup can recover reliably eye width in real time, and we can observe the previously reported relationship between pupil width and cognitive load, however, we also observe a very high variance between different test subjects. In a follow up active task experiment, neither pupil width nor eye gaze showed a significant predictive power over workflow disruptions. To address this, we present an approach for estimating the likelihood of workflow disruptions during active fine-motor tasks. Our method combines the eye-based data with the Bayesian Surprise theory and is able to successfully predict user's struggle with correlations of 35% and 75% respectively.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127323803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overlooking
Nora Castner, Solveig Klepper, Lena Kopnarski, F. Hüttig, C. Keutel, K. Scheiter, Juliane Richter, Thérése F. Eder, Enkelejda Kasneci
DOI: 10.1145/3279810.3279845
The cognitive processes that underlie expert decision making in medical image interpretation are crucial to understanding what constitutes optimal performance. Often, if an anomaly goes undetected, the exact nature of the false negative is not fully understood. This work looks at 24 experts' performance (true positives and false negatives) during an anomaly detection task on 13 images and the corresponding gaze behavior. Using a drawing paradigm and an eye-tracking paradigm, we compared experts' target anomaly detection in orthopantomographs (OPTs) against their own gaze behavior. We found a relationship between the number of anomalies detected and the number of anomalies looked at. However, roughly 70% of the anomalies that were not explicitly marked in the drawing paradigm were nonetheless looked at. We therefore examined how often each anomaly was glanced at. When not explicitly marked, target anomalies were most often glanced at only once or twice; in contrast, when targets were marked, the number of glances was higher. Since this behavior was not consistent across images, we attribute these differences to image complexity.
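As an illustration of the kind of gaze analysis described here, the sketch below counts a "glance" as an uninterrupted run of fixations inside an anomaly's region of interest; the data layout, ROI format, and this definition of a glance are assumptions made for the example, not the authors' exact procedure.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Fixation:
        x: float
        y: float

    def in_roi(fix: Fixation, roi: Tuple[float, float, float, float]) -> bool:
        """roi = (x_min, y_min, x_max, y_max) bounding box of an anomaly."""
        x0, y0, x1, y1 = roi
        return x0 <= fix.x <= x1 and y0 <= fix.y <= y1

    def count_glances(fixations: List[Fixation], roi) -> int:
        """A glance = an uninterrupted run of consecutive fixations inside the ROI."""
        glances, inside = 0, False
        for fix in fixations:
            hit = in_roi(fix, roi)
            if hit and not inside:
                glances += 1
            inside = hit
        return glances

    if __name__ == "__main__":
        scanpath = [Fixation(10, 10), Fixation(55, 60), Fixation(58, 62),
                    Fixation(200, 40), Fixation(56, 61)]
        anomaly_roi = (50, 50, 70, 70)
        print(count_glances(scanpath, anomaly_roi))  # -> 2 glances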
{"title":"Overlooking","authors":"Nora Castner, Solveig Klepper, Lena Kopnarski, F. Hüttig, C. Keutel, K. Scheiter, Juliane Richter, Thérése F. Eder, Enkelejda Kasneci","doi":"10.1145/3279810.3279845","DOIUrl":"https://doi.org/10.1145/3279810.3279845","url":null,"abstract":"The cognitive processes that underly expert decision making in medical image interpretation are crucial to the understanding of what constitutes optimal performance. Often, if an anomaly goes undetected, the exact nature of the false negative is not fully understood. This work looks at 24 experts' performance (true positives and false negatives) during an anomaly detection task for 13 images and the corresponding gaze behavior. By using a drawing and an eye-tracking experimental paradigm, we compared expert target anomaly detection in orthopantomographs (OPTs) against their own gaze behavior. We found there was a relationship between the number of anomalies detected and the anomalies looked at. However, roughly 70% of anomalies that were not explicitly marked in the drawing paradigm were looked at. Therefore, we looked how often an anomaly was glanced at. We found that when not explicitly marked, target anomalies were more often glanced at once or twice. In contrast, when targets were marked, the number of glances was higher. Furthermore, since this behavior was not similar over all images, we attribute these differences to image complexity.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128346414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal approach for cognitive task performance prediction from body postures, facial expressions and EEG signal
Ashwin Ramesh Babu, Akilesh Rajavenkatanarayanan, J. Brady, F. Makedon
DOI: 10.1145/3279810.3279849
Recent developments in computer vision and the emergence of wearable sensors have opened opportunities for advanced techniques that enable multimodal user assessment and personalized training, which are important in educational, industrial-training, and rehabilitation applications. They have also paved the way for assistive robots that accurately assess human cognitive and physical skills. Assessment and training cannot be generalized, as the requirements vary for every person and every application; the ability of a system to adapt to the individual's needs and performance is essential for its effectiveness. In this paper, the focus is on task performance prediction, an important parameter for personalization. Several research works address how to predict task performance from physiological and behavioral data. In this work, we follow a multimodal approach in which the system collects information from different modalities to predict performance based on (a) the user's emotional state recognized from facial expressions (behavioral data), (b) the user's emotional state recognized from body postures (behavioral data), and (c) task performance derived from EEG signals (physiological data) while the person performs a robot-based cognitive task. This multimodal combination of physiological and behavioral data produces the highest accuracy, 87.5%, outperforming prediction from any single modality. In particular, the approach is useful for finding associations between facial expressions, body postures, and brain signals while a person performs a cognitive task.
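The abstract does not specify the fusion scheme in detail; the sketch below shows one generic late-fusion pattern (one classifier per modality, a meta-classifier over their probability outputs) on synthetic stand-in features, so every feature matrix, dimension, and model choice here is an assumption.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    n = 200
    y = rng.integers(0, 2, n)                      # task-performance label (good / poor)

    # Hypothetical per-modality features (stand-ins for real extractors).
    modalities = {
        "face": rng.normal(y[:, None], 1.0, (n, 8)),   # facial-expression features
        "body": rng.normal(y[:, None], 1.5, (n, 6)),   # body-posture features
        "eeg":  rng.normal(y[:, None], 2.0, (n, 16)),  # EEG band-power features
    }
    train, test = slice(0, 150), slice(150, n)

    def stacked_probs(fitted, split):
        """Stack each modality classifier's positive-class probability into one matrix."""
        return np.column_stack([clf.predict_proba(modalities[m][split])[:, 1]
                                for m, clf in fitted.items()])

    # One classifier per modality, then a meta-classifier over their outputs.
    fitted = {m: SVC(probability=True).fit(X[train], y[train]) for m, X in modalities.items()}
    fusion = LogisticRegression().fit(stacked_probs(fitted, train), y[train])
    print("fused accuracy:", fusion.score(stacked_probs(fitted, test), y[test]))

Training the meta-classifier on the same split as the base classifiers, as done here for brevity, slightly leaks information; in practice a held-out or cross-validated stacking split would be used.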
{"title":"Multimodal approach for cognitive task performance prediction from body postures, facial expressions and EEG signal","authors":"Ashwin Ramesh Babu, Akilesh Rajavenkatanarayanan, J. Brady, F. Makedon","doi":"10.1145/3279810.3279849","DOIUrl":"https://doi.org/10.1145/3279810.3279849","url":null,"abstract":"Recent developments in computer vision and the emergence of wearable sensors have opened opportunities for the development of advanced and sophisticated techniques to enable multi-modal user assessment and personalized training which is important in educational, industrial training and rehabilitation applications. They have also paved way for the use of assistive robots to accurately assess human cognitive and physical skills. Assessment and training cannot be generalized as the requirement varies for every person and for every application. The ability of the system to adapt to the individual's needs and performance is essential for its effectiveness. In this paper, the focus is on task performance prediction which is an important parameter to consider for personalization. Several research works focus on how to predict task performance based on physiological and behavioral data. In this work, we follow a multi-modal approach where the system collects information from different modalities to predict performance based on (a) User's emotional state recognized from facial expressions(Behavioral data), (b) User's emotional state from body postures(Behavioral data) (c) task performance from EEG signals (Physiological data) while the person performs a robot-based cognitive task. This multi-modal approach of combining physiological data and behavioral data produces the highest accuracy of 87.5 percent, which outperforms the accuracy of prediction extracted from any single modality. In particular, this approach is useful in finding associations between facial expressions, body postures and brain signals while a person performs a cognitive task.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124367570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rule-based learning for eye movement type detection
Wolfgang Fuhl, Nora Castner, Enkelejda Kasneci
DOI: 10.1145/3279810.3279844
Eye movements hold information about human perception, intention, and cognitive state. Various algorithms have been proposed to identify and distinguish eye movements, particularly fixations, saccades, and smooth pursuits. A major drawback of existing algorithms is that they rely on accurate and constant sampling rates and error-free recordings, and, because they are designed to detect specific eye movements, they impede straightforward adaptation to new movement types such as microsaccades. We propose a novel rule-based machine learning approach to create detectors from annotated or simulated data. It is capable of learning diverse types of eye movements as well as automatically detecting pupil detection errors in the raw gaze data. Additionally, our approach can work with any sampling rate, even a fluctuating one. Our approach learns several interdependent thresholds together with the preceding type classifications and combines them automatically into sets of detectors. We evaluated our approach against state-of-the-art algorithms on publicly available datasets. Our approach is integrated into the newest version of EyeTrace, which can be downloaded at http://www.ti.uni-tuebingen.de/Eyetrace.1751.0.html.
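The paper learns its interdependent thresholds from annotated or simulated data; the sketch below is only a fixed-threshold, velocity-based stand-in that illustrates the sample-wise labeling and the timestamp-based handling of uneven sampling rates (the threshold values, units, and the NaN convention for pupil-detection errors are assumptions).

    import numpy as np

    def classify_samples(t, x, y, sacc_vel=240.0, pursuit_vel=20.0):
        """Label each gaze sample as 'fixation', 'pursuit', or 'saccade' from its
        point-to-point velocity (deg/s). Velocities use the actual timestamps, so
        an irregular sampling rate is handled naturally."""
        t, x, y = map(np.asarray, (t, x, y))
        dt = np.diff(t)
        vel = np.hypot(np.diff(x), np.diff(y)) / np.maximum(dt, 1e-6)
        vel = np.concatenate([[0.0], vel])            # first sample gets zero velocity
        labels = np.where(vel >= sacc_vel, "saccade",
                 np.where(vel >= pursuit_vel, "pursuit", "fixation"))
        labels[np.isnan(x) | np.isnan(y)] = "error"   # pupil-detection failures
        return labels

    if __name__ == "__main__":
        t = [0.000, 0.004, 0.009, 0.013, 0.021]       # seconds, uneven spacing
        x = [1.0, 1.1, 5.0, 5.1, np.nan]              # degrees; NaN = lost pupil
        y = [1.0, 1.0, 1.2, 1.2, np.nan]
        print(classify_samples(t, x, y))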
{"title":"Rule-based learning for eye movement type detection","authors":"Wolfgang Fuhl, Nora Castner, Enkelejda Kasneci","doi":"10.1145/3279810.3279844","DOIUrl":"https://doi.org/10.1145/3279810.3279844","url":null,"abstract":"Eye movements hold information about human perception, intention, and cognitive state. Various algorithms have been proposed to identify and distinguish eye movements, particularly fixations, saccades, and smooth pursuits. A major drawback of existing algorithms is that they rely on accurate and constant sampling rates, error free recordings, and impend straightforward adaptation to new movements, such as microsaccades, since they are designed for certain eye movement detection. We propose a novel rule-based machine learning approach to create detectors on annotated or simulated data. It is capable of learning diverse types of eye movements as well as automatically detecting pupil detection errors in the raw gaze data. Additionally, our approach is capable of using any sampling rate, even with fluctuations. Our approach consists of learning several interdependent thresholds and previous type classifications and combines them into sets of detectors automatically. We evaluated our approach against the state-of-the-art algorithms on publicly available datasets. Our approach is integrated in the newest version of EyeTrace which can be downloaded at http://www.ti.uni-tuebingen.de/Eyetrace.1751.0.html.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116430181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovering digital representations for remembered episodes from lifelog data
Bernd Dudzik, J. Broekens, Mark Antonius Neerincx, J. Olenick, C. Chang, S. Kozlowski, H. Hung
DOI: 10.1145/3279810.3279850
Combining self-reports in which individuals reflect on their thoughts and feelings (Experience Samples) with sensor data collected via ubiquitous monitoring can provide researchers and applications with detailed insights into human behavior and psychology. However, meaningfully associating these two sources of data with each other is difficult: while it is natural for human beings to reflect on their experience in terms of remembered episodes, it is an open challenge to retrace this subjective organization in sensor data that references objective time. Lifelogging is a specific approach to the ubiquitous monitoring of individuals that can help overcome this recollection gap: it strives to create a comprehensive timeline of semantic annotations reflecting the impressions of the monitored person from his or her own subjective point of view. In this paper, we describe a novel approach for processing such lifelogs to situate remembered experiences in an objective timeline. It involves computationally modeling individuals' memory processes to estimate segments within a lifelog that act as plausible digital representations of their recollections. We report on an empirical investigation in which we use our approach to discover plausible representations for remembered social interactions between participants in a longitudinal study. In particular, we describe an exploration of the behavior displayed by our model of memory processes in this setting. Finally, we explore the representations discovered in this study and discuss insights that might be gained from them.
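The abstract keeps the matching step at a conceptual level; as one toy way to make it concrete, the sketch below ranks annotated lifelog segments by their temporal overlap with the time window of a self-reported episode. The segment annotations, the overlap score, and all names here are illustrative assumptions, not the authors' memory model.

    from datetime import datetime, timedelta

    def overlap_seconds(a_start, a_end, b_start, b_end):
        """Length of the intersection of two time intervals, in seconds."""
        latest_start = max(a_start, b_start)
        earliest_end = min(a_end, b_end)
        return max((earliest_end - latest_start).total_seconds(), 0.0)

    def rank_segments(report_start, report_end, segments):
        """Rank annotated lifelog segments as candidate representations of a
        remembered episode by their temporal overlap with the self-report."""
        scored = [(overlap_seconds(report_start, report_end, s, e), label)
                  for (s, e, label) in segments]
        return sorted(scored, reverse=True)

    if __name__ == "__main__":
        day = datetime(2017, 9, 14)
        segments = [(day + timedelta(hours=9),  day + timedelta(hours=10), "team meeting"),
                    (day + timedelta(hours=10), day + timedelta(hours=12), "desk work"),
                    (day + timedelta(hours=12), day + timedelta(hours=13), "lunch with J.")]
        # Participant remembers "a conversation late in the morning".
        print(rank_segments(day + timedelta(hours=11, minutes=30),
                            day + timedelta(hours=12, minutes=30), segments))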
{"title":"Discovering digital representations for remembered episodes from lifelog data","authors":"Bernd Dudzik, J. Broekens, Mark Antonius Neerincx, J. Olenick, C. Chang, S. Kozlowski, H. Hung","doi":"10.1145/3279810.3279850","DOIUrl":"https://doi.org/10.1145/3279810.3279850","url":null,"abstract":"Combining self-reports in which individuals reflect on their thoughts and feelings (Experience Samples) with sensor data collected via ubiquitous monitoring can provide researchers and applications with detailed insights about human behavior and psychology. However, meaningfully associating these two sources of data with each other is difficult: while it is natural for human beings to reflect on their experience in terms of remembered episodes, it is an open challenge to retrace this subjective organization in sensor data referencing objective time. Lifelogging is a specific approach to the ubiquitous monitoring of individuals that can contribute to overcoming this recollection gap. It strives to create a comprehensive timeline of semantic annotations that reflect the impressions of the monitored person from his or her own subjective point-of-view. In this paper, we describe a novel approach for processing such lifelogs to situate remembered experiences in an objective timeline. It involves the computational modeling of individuals' memory processes to estimate segments within a lifelog acting as plausible digital representations for their recollections. We report about an empirical investigation in which we use our approach to discover plausible representations for remembered social interactions between participants in a longitudinal study. In particular, we describe an exploration of the behavior displayed by our model for memory processes in this setting. Finally, we explore the representations discovered for this study and discuss insights that might be gained from them.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129155589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimer: validating multimodal, cognitive data in the city: towards a model of how the urban environment influences streetscape users
Arlene Ducao, Ilias Koen, Zhiqi Guo
DOI: 10.1145/3279810.3279853
Multimer is a new technology that aims to provide a data-driven understanding of how humans cognitively and physically experience spatial environments. By multimodally measuring biosensor data to model how the built environment and its uses influence cognitive processes, Multimer aims to help space professionals such as architects, workplace strategists, and urban planners make better design interventions. Multimer is perhaps the first spatial technology that collects biosensor data, such as brainwave and heart rate data, and analyzes it with both spatiotemporal and neurophysiological tools. The Multimer mobile app can record data from several kinds of commonly available, inexpensive, wearable sensors, including EEG, ECG, pedometer, accelerometer, and gyroscope modules. The app also records user-entered information via its user interface and micro-surveys, and combines all of this data with the user's geolocation using GPS, beacons, and other location tools. Multimer's study platform displays all of this data in real time at the individual and aggregate levels. Multimer also validates the data by comparing the collected sensor and sentiment data in spatiotemporal contexts, and it integrates the collected data with other data sets, such as citizen reports, traffic data, and city amenities, to provide actionable insights towards the evaluation and redesign of sites and spaces. This report presents preliminary results from the data validation process of a Multimer study of 101 subjects in New York City from August to October 2017. Ultimately, the aim of this study is to prototype a replicable, scalable model of how the built environment and the movement of traffic influence the neurophysiological state of pedestrians, cyclists, and drivers.
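As a small illustration of the kind of data joining the report describes (biosensor streams attached to geolocation and aggregated spatially), the pandas sketch below matches each sensor reading to its nearest GPS fix and averages readings per coarse grid cell; all column names, tolerances, and values are assumptions made for the example.

    import pandas as pd

    # Hypothetical streams: EEG-derived attention values and GPS fixes, each timestamped.
    eeg = pd.DataFrame({
        "timestamp": pd.date_range("2017-09-14 10:00:00", periods=6, freq="10s"),
        "attention": [0.42, 0.55, 0.61, 0.38, 0.47, 0.70],
    })
    gps = pd.DataFrame({
        "timestamp": pd.date_range("2017-09-14 10:00:02", periods=3, freq="20s"),
        "lat": [40.7301, 40.7303, 40.7306],
        "lon": [-73.9951, -73.9948, -73.9945],
    })

    # Attach the nearest GPS fix (within 15 s) to each sensor reading.
    joined = pd.merge_asof(eeg.sort_values("timestamp"), gps.sort_values("timestamp"),
                           on="timestamp", direction="nearest",
                           tolerance=pd.Timedelta("15s"))

    # Aggregate readings onto a coarse spatial grid (~100 m cells) for mapping.
    joined["cell"] = list(zip(joined["lat"].round(3), joined["lon"].round(3)))
    print(joined.groupby("cell")["attention"].agg(["mean", "count"]))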
{"title":"Multimer: validating multimodal, cognitive data in the city: towards a model of how the urban environment influences streetscape users","authors":"Arlene Ducao, Ilias Koen, Zhiqi Guo","doi":"10.1145/3279810.3279853","DOIUrl":"https://doi.org/10.1145/3279810.3279853","url":null,"abstract":"Multimer is a new technology that aims to provide a data-driven understanding of how humans cognitively and physically experience spatial environments. By multimodally measuring biosensor data to model how the built environment and its uses influence cognitive processes, Multimer aims to help space professionals like architects, workplace strategists, and urban planners make better design interventions. Multimer is perhaps the first spatial technology that collects biosensor data, like brainwave and heart rate data, and analyzes it with both spatiotemporal and neurophysiological tools. The Multimer mobile app can record data from several kinds of commonly available, inexpensive, wearable sensors, including EEG, ECG, pedometer, accelerometer, and gyroscope modules. The Multimer app also records user-entered information via its user interface and micro-surveys, then also combines all this data with a user's geo-location using GPS, beacons, and other location tools. Multimer's study platform displays all of this data in real-time at the individual and aggregate level. Multimer also validates the data by comparing the collected sensor and sentiment data in spatiotemporal contexts, and then it integrates the collected data with other data sets such as citizen reports, traffic data, and city amenities to provide actionable insights towards the evaluation and redesign of sites and spaces. This report presents preliminary results from the data validation process for a Multimer study of 101 subjects in New York City from August to October 2017. Ultimately, the aim of this study is to prototype a replicable, scalable model of how the built environment and the movement of traffic influence the neurophysiological state of pedestrians, cyclists, and drivers.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130392259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating static and sequential models for intervention-free selection using multimodal data of EEG and eye tracking
Mazen Salous, F. Putze, Tanja Schultz, Jutta Hild, J. Beyerer
DOI: 10.1145/3279810.3279841
Multimodal data is increasingly used in cognitive prediction models to better analyze and predict different user cognitive processes. Classifiers based on such data, however, have different performance characteristics. In this paper, we discuss an intervention-free selection task using multimodal EEG and eye-tracking data in three different models. We show that a sequential model, an LSTM, is more sensitive but less precise than a static model, an SVM. Moreover, we introduce a confidence-based Competition-Fusion model that uses both the SVM and the LSTM. The fusion model further improves recall compared to either the SVM or the LSTM alone, without decreasing precision compared to the LSTM. Based on these results, we recommend the SVM for interactive applications that require minimal false positives (high precision), and we recommend the LSTM, and especially the Competition-Fusion model, for applications that handle intervention-free selection requests in an additional post-processing step and therefore require higher recall than precision.
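The exact fusion rule is not given in the abstract; a minimal sketch of one confidence-based competition scheme (per sample, the prediction of whichever model is more confident wins) could look like the following, with the two trained models stubbed out as arrays of positive-class probabilities and the distance-from-0.5 confidence measure chosen as an assumption.

    import numpy as np

    def competition_fusion(p_svm, p_lstm):
        """Per-sample fusion of two binary classifiers' positive-class probabilities:
        the prediction of whichever model is more confident (further from 0.5) wins."""
        p_svm, p_lstm = np.asarray(p_svm), np.asarray(p_lstm)
        conf_svm = np.abs(p_svm - 0.5)
        conf_lstm = np.abs(p_lstm - 0.5)
        fused = np.where(conf_svm >= conf_lstm, p_svm, p_lstm)
        return (fused >= 0.5).astype(int)

    if __name__ == "__main__":
        p_svm  = [0.10, 0.55, 0.48, 0.95]   # precise but conservative model
        p_lstm = [0.30, 0.90, 0.85, 0.60]   # sensitive but noisier model
        print(competition_fusion(p_svm, p_lstm))   # -> [0 1 1 1]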
{"title":"Investigating static and sequential models for intervention-free selection using multimodal data of EEG and eye tracking","authors":"Mazen Salous, F. Putze, Tanja Schultz, Jutta Hild, J. Beyerer","doi":"10.1145/3279810.3279841","DOIUrl":"https://doi.org/10.1145/3279810.3279841","url":null,"abstract":"Multimodal data is increasingly used in cognitive prediction models to better analyze and predict different user cognitive processes. Classifiers based on such data, however, have different performance characteristics. We discuss in this paper an intervention-free selection task using multimodal data of EEG and eye tracking in three different models. We show that a sequential model, LSTM, is more sensitive but less precise than a static model SVM. Moreover, we introduce a confidence-based Competition-Fusion model using both SVM and LSTM. The fusion model further improves the recall compared to either SVM or LSTM alone, without decreasing precision compared to LSTM. According to the results, we recommend SVM for interactive applications which require minimal false positives (high precision), and recommend LSTM and highly recommend Competition-Fusion Model for applications which handle intervention-free selection requests in an additional post-processing step, requiring higher recall than precision.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134009236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Workload-driven modulation of mixed-reality robot-human communication
Leanne M. Hirshfield, T. Williams, Natalie M. Sommer, Trevor Grant, Senem Velipasalar Gursoy
DOI: 10.1145/3279810.3279848
In this work we explore how augmented reality annotations can be used as a form of mixed-reality gesture, how neurophysiological measurements can inform the decision as to whether or not to use such gestures, and whether and how to adapt language when using them. We propose a preliminary investigation of how decisions regarding robot-to-human communication modality in mixed-reality environments might be made on the basis of humans' perceptual and cognitive states. Specifically, we propose to use brain data acquired with high-density functional near-infrared spectroscopy (fNIRS) to measure the neural correlates of cognitive and emotional states with particular relevance to adaptive human-robot interaction (HRI). We describe several states of interest that fNIRS is well suited to measure and that have direct implications for HRI adaptations, and we leverage a framework developed in our prior work to explore how different neurophysiological measures could inform the selection of different communication strategies. We then describe results from a feasibility experiment in which multilabel convolutional long short-term memory (ConvLSTM) networks were trained to classify the target mental states of 10 participants, and we discuss a research agenda for adaptive human-robot teams based on our findings.
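As a rough illustration of what a multilabel ConvLSTM classifier can look like in Keras, the sketch below maps windows of spatially arranged fNIRS channels to independent sigmoid outputs, one per target mental state; the input layout, layer sizes, and number of states are assumptions, not the architecture used in the experiment.

    import numpy as np
    import tensorflow as tf

    # Hypothetical fNIRS input: windows of 20 time steps over an 8x8 optode grid
    # with 2 chromophores (HbO, HbR) as channels; 3 target mental states, multilabel.
    T, ROWS, COLS, CH, N_STATES = 20, 8, 8, 2, 3

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(T, ROWS, COLS, CH)),
        tf.keras.layers.ConvLSTM2D(16, kernel_size=(3, 3), padding="same",
                                   return_sequences=False),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        # Sigmoid (not softmax): each mental state is an independent yes/no label.
        tf.keras.layers.Dense(N_STATES, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["binary_accuracy"])

    # Tiny random batch just to show the expected tensor shapes.
    X = np.random.rand(4, T, ROWS, COLS, CH).astype("float32")
    y = np.random.randint(0, 2, size=(4, N_STATES)).astype("float32")
    model.fit(X, y, epochs=1, verbose=0)
    print(model.predict(X, verbose=0).shape)   # -> (4, 3) per-state probabilities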
{"title":"Workload-driven modulation of mixed-reality robot-human communication","authors":"Leanne M. Hirshfield, T. Williams, Natalie M. Sommer, Trevor Grant, Senem Velipasalar Gursoy","doi":"10.1145/3279810.3279848","DOIUrl":"https://doi.org/10.1145/3279810.3279848","url":null,"abstract":"In this work we explore how Augmented Reality annotations can be used as a form of Mixed Reality gesture, how neurophysiological measurements can inform the decision as to whether or not to use such gestures, and whether and how to adapt language when using such gestures. In this paper, we propose a preliminary investigation of how decisions regarding robot-to-human communication modality in mixed reality environments might be made on the basis of humans' perceptual and cognitive states. Specifically, we propose to use brain data acquired with high-density functional near-infrared spectroscopy (fNIRS) to measure the neural correlates of cognitive and emotional states with particular relevance to adaptive human-robot interaction (HRI). In this paper we describe several states of interest that fNIRS is well suited to measure and that have direct implications to HRI adaptations and we leverage a framework developed in our prior work to explore how different neurophysiological measures could inform the selection of different communication strategies. We then describe results from a feasibility experiment where multilabel Convolutional Long Short Term Memory Networks were trained to classify the target mental states of 10 participants and we discuss a research agenda for adaptive human-robot teams based on our findings.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131124884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal approach to engagement and disengagement detection with highly imbalanced in-the-wild data
D. Fedotov, O. Perepelkina, E. Kazimirova, M. Konstantinova, W. Minker
DOI: 10.1145/3279810.3279842
Engagement/disengagement detection is a challenging task that arises in a range of human-human and human-computer interaction problems. Despite its importance, the problem is still far from solved, and a number of studies involving in-the-wild data have been conducted to date. Ambiguity in the definition of engaged/disengaged states makes such data hard to collect, annotate, and analyze. In this paper we describe different approaches to building engagement/disengagement models that work with highly imbalanced multimodal data from natural conversations. We set a baseline result of 0.695 (unweighted average recall) by direct classification. We then try to detect disengagement by means of engagement regression models, since the two have a strong negative correlation. To deal with imbalanced data, we apply class weighting and data augmentation techniques (SMOTE and mixup). We experiment with combinations of modalities in order to find the most informative ones, using features from both the audio (speech) and video (face, body, lips, eyes) channels. We transform the original features using Principal Component Analysis and experiment with several types of modality fusion. Finally, we combine these approaches and increase performance to 0.715 using four modalities (all channels except face). Audio and lip features appear to contribute the most, which may be tightly connected with speech.
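Class weighting, SMOTE, and mixup are standard techniques; a compact sketch of all three on placeholder feature matrices (the feature dimensions, class ratio, and mixup variant for tabular features are assumptions) might look like this.

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight
    from imblearn.over_sampling import SMOTE   # pip install imbalanced-learn

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 40))                       # placeholder fused features
    y = (rng.random(500) < 0.08).astype(int)             # ~8% "disengaged" frames

    # (1) Class weighting: make the classifier pay more attention to the rare class.
    weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
    print("class weights:", dict(zip(np.unique(y), weights)))

    # (2) SMOTE: synthesize minority-class samples by interpolating between neighbours.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("class counts after SMOTE:", np.bincount(y_res))

    # (3) mixup: blend random pairs of samples and labels with a Beta-distributed weight.
    def mixup(X, y, alpha=0.2, rng=rng):
        lam = rng.beta(alpha, alpha, size=len(X))[:, None]
        idx = rng.permutation(len(X))
        X_mix = lam * X + (1 - lam) * X[idx]
        y_mix = lam[:, 0] * y + (1 - lam[:, 0]) * y[idx]   # soft labels in [0, 1]
        return X_mix, y_mix

    X_mix, y_mix = mixup(X_res, y_res.astype(float))
    print("mixup batch:", X_mix.shape, y_mix[:5].round(2))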
{"title":"Multimodal approach to engagement and disengagement detection with highly imbalanced in-the-wild data","authors":"D. Fedotov, O. Perepelkina, E. Kazimirova, M. Konstantinova, W. Minker","doi":"10.1145/3279810.3279842","DOIUrl":"https://doi.org/10.1145/3279810.3279842","url":null,"abstract":"Engagement/disengagement detection is a challenging task emerging in a range of human-human and human-computer interaction problems. While being important, the issue is still far from being solved and a number of studies involving in-the-wild data have been conducted by now. Disambiguation in the definition of engaged/disengaged states makes it hard to collect, annotate and analyze such data. In this paper we describe different approaches to building engagement/disengagement models working with highly imbalanced multimodal data from natural conversations. We set a baseline result of 0.695 (unweighted average recall) by direct classification. Then we try to detect disengagement by means of engagement regression models, as they have strong negative correlation. To deal with imbalanced data we apply class weighting and data augmentation techniques (SMOTE and mixup). We experiment with combinations of modalities in order to find the most contributing ones. We use features from both audio (speech) and video (face, body, lips, eyes) channels. We transform original features using Principal Component Analysis and experiment with several types of modality fusion. Finally, we combine approaches and increase the performance up to 0.715 using four modalities (all channels except face). Audio and lips features appear to be the most contributing ones, which may be tightly connected with speech.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127521199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The role of emotion in problem solving: first results from observing chess
Thomas Guntz, J. Crowley, D. Vaufreydaz, R. Balzarini, Philippe Dessus
DOI: 10.1145/3279810.3279846
In this paper we present results from recent experiments suggesting that chess players associate emotions with game situations and reactively use these associations to guide search during planning and problem solving. We report on a pilot experiment with multimodal observation of human experts engaged in solving challenging chess problems. Our results confirm that cognitive processes have observable correlates in displays of emotion and fixation, and that these displays can be used to evaluate models of cognitive processes. They also revealed an unexpected observation: rapid changes in emotion as players attempt to solve challenging problems. We propose a cognitive model to explain our observations and describe initial results from a second experiment designed to test this model.
{"title":"The role of emotion in problem solving: first results from observing chess","authors":"Thomas Guntz, J. Crowley, D. Vaufreydaz, R. Balzarini, Philippe Dessus","doi":"10.1145/3279810.3279846","DOIUrl":"https://doi.org/10.1145/3279810.3279846","url":null,"abstract":"In this paper we present results from recent experiments that suggest that chess players associate emotions to game situations and reactively use these associations to guide search for planning and problem solving. We report on a pilot experiment with multi-modal observation of human experts engaged in solving challenging problems in Chess. Our results confirm that cognitive processes have observable correlates in displays of emotion and fixation, and that these displays can be used to evaluate models of cognitive processes. They also revealed an unexpected observation of rapid changes in emotion as players attempt to solve challenging problems. In this paper, we propose a cognitive model to explain our observations, and describe initial results from a second experiment designed to test this model.","PeriodicalId":326513,"journal":{"name":"Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126466606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}