Despite their ability to complete certain tasks, dialog systems still suffer from poor adaptation to users' engagement and attention. We observe human behaviors in different conversational settings to understand human communication dynamics and then transfer the knowledge to multimodal dialog system design. To focus solely on maintaining engaging conversations, we design and implement a non-task oriented multimodal dialog system, which serves as a framework for controlled multimodal conversation analysis. We design computational methods to model user engagement and attention in real time by leveraging automatically harvested multimodal human behaviors, such as smiles and speech volume. We aim to design and implement a multimodal dialog system to coordinate with users' engagement and attention on the fly via techniques such as adaptive conversational strategies and incremental speech production.
{"title":"Attention and Engagement Aware Multimodal Conversational Systems","authors":"Zhou Yu","doi":"10.1145/2818346.2823309","DOIUrl":"https://doi.org/10.1145/2818346.2823309","url":null,"abstract":"Despite their ability to complete certain tasks, dialog systems still suffer from poor adaptation to users' engagement and attention. We observe human behaviors in different conversational settings to understand human communication dynamics and then transfer the knowledge to multimodal dialog system design. To focus solely on maintaining engaging conversations, we design and implement a non-task oriented multimodal dialog system, which serves as a framework for controlled multimodal conversation analysis. We design computational methods to model user engagement and attention in real time by leveraging automatically harvested multimodal human behaviors, such as smiles and speech volume. We aim to design and implement a multimodal dialog system to coordinate with users' engagement and attention on the fly via techniques such as adaptive conversational strategies and incremental speech production.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"185 2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82917580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fred Charles, Florian Pecune, Gabor Aranyi, C. Pelachaud, M. Cavazza
User interaction with Embodied Conversational Agents (ECA) should involve a significant affective component to achieve realism in communication. This aspect has been studied through different frameworks describing the relationship between user and ECA, for instance alignment, rapport, and empathy. We conducted an experiment to explore how an ECA's non-verbal expression can be controlled to respond to a single affective dimension generated by users as input. Our system is based on the mapping of a high-level affective dimension, approach/avoidance, onto a new ECA control mechanism in which Action Units (AU) are activated through a neural network. Since 'approach' has been associated with prefrontal cortex activation, we use a measure of prefrontal cortex left-asymmetry through fNIRS as a single input signal representing the user's attitude towards the ECA. We carried out the experiment with 10 subjects, who were instructed to express a positive mental attitude towards the ECA; in return, the ECA's facial expression reflected the perceived attitude under a neurofeedback paradigm. Our results suggest that users are able to successfully interact with the ECA and perceive its response as consistent and realistic, both in terms of ECA responsiveness and in terms of relevance of facial expressions. From a system perspective, the empirical calibration of the network supports a progressive recruitment of various AUs, which provides a principled description of the ECA response and its intensity. Our findings suggest that complex ECA facial expressions can be successfully aligned with one high-level affective dimension. Furthermore, this use of a single dimension as input could support experiments in the fine-tuning of AU activation or its personalization to users' preferred modalities.
{"title":"ECA Control using a Single Affective User Dimension","authors":"Fred Charles, Florian Pecune, Gabor Aranyi, C. Pelachaud, M. Cavazza","doi":"10.1145/2818346.2820730","DOIUrl":"https://doi.org/10.1145/2818346.2820730","url":null,"abstract":"User interaction with Embodied Conversational Agents (ECA) should involve a significant affective component to achieve realism in communication. This aspect has been studied through different frameworks describing the relationship between user and ECA, for instance alignment, rapport and empathy. We conducted an experiment to explore how an ECA's non-verbal expression can be controlled to respond to a single affective dimension generated by users as input. Our system is based on the mapping of a high-level affective dimension, approach/avoidance, onto a new ECA control mechanism in which Action Units (AU) are activated through a neural network. Since 'approach' has been associated to prefrontal cortex activation, we use a measure of prefrontal cortex left-asymmetry through fNIRS as a single input signal representing the user's attitude towards the ECA. We carried out the experiment with 10 subjects, who have been instructed to express a positive mental attitude towards the ECA. In return, the ECA facial expression would reflect the perceived attitude under a neurofeedback paradigm. Our results suggest that users are able to successfully interact with the ECA and perceive its response as consistent and realistic, both in terms of ECA responsiveness and in terms of relevance of facial expressions. From a system perspective, the empirical calibration of the network supports a progressive recruitment of various AUs, which provides a principled description of the ECA response and its intensity. Our findings suggest that complex ECA facial expressions can be successfully aligned with one high-level affective dimension. Furthermore, this use of a single dimension as input could support experiments in the fine-tuning of AU activation or their personalization to user preferred modalities.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87401564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a novel feature extraction framework for multi-party multimodal conversation, for inference of personality traits and emergent leadership. The proposed framework represents multimodal features as the combination of each participant's nonverbal activity and group activity. This feature representation makes it possible to compare the nonverbal patterns extracted from participants of different groups in a metric space. It captures how the target member exhibits nonverbal behavior in the context of behavior observed across the group (e.g., the member speaks while all members move their bodies), and it can be applied to any kind of multiparty conversation task. Frequent co-occurrent events are discovered using graph clustering over multimodal sequences. The proposed framework is applied to the ELEA corpus, an audiovisual dataset collected from group meetings. We evaluate the framework on the binary classification of 10 personality traits. Experimental results show that the model trained with co-occurrence features obtains higher accuracy than previous related work on 8 out of 10 traits. In addition, the co-occurrence features improve accuracy by 2% to 17%.
{"title":"Personality Trait Classification via Co-Occurrent Multiparty Multimodal Event Discovery","authors":"S. Okada, O. Aran, D. Gática-Pérez","doi":"10.1145/2818346.2820757","DOIUrl":"https://doi.org/10.1145/2818346.2820757","url":null,"abstract":"This paper proposes a novel feature extraction framework from mutli-party multimodal conversation for inference of personality traits and emergent leadership. The proposed framework represents multi modal features as the combination of each participant's nonverbal activity and group activity. This feature representation enables to compare the nonverbal patterns extracted from the participants of different groups in a metric space. It captures how the target member outputs nonverbal behavior observed in a group (e.g. the member speaks while all members move their body), and can be available for any kind of multiparty conversation task. Frequent co-occurrent events are discovered using graph clustering from multimodal sequences. The proposed framework is applied for the ELEA corpus which is an audio visual dataset collected from group meetings. We evaluate the framework for binary classification task of 10 personality traits. Experimental results show that the model trained with co-occurrence features obtained higher accuracy than previously related work in 8 out of 10 traits. In addition, the co-occurrence features improve the accuracy from 2 % up to 17 %.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87982317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past 10 years we have seen immense worldwide growth in research and development on companion robots. These are robots that fulfil particular tasks, but do so in a socially acceptable manner. The companionship aspect reflects the repeated and long-term nature of such interactions, and the potential of people to form relationships with such robots, e.g. as friendly assistants. A number of companion and assistant robots have been entering the market; two of the latest examples are Aldebaran's Pepper robot and Jibo (from Cynthia Breazeal). Companion robots increasingly target particular application areas, e.g. as home assistants or therapeutic tools. Research into companion robots needs to address many fundamental research problems concerning perception, cognition, action and learning, but regardless of how sophisticated our robotic systems may be, the potential users need to be taken into account from the early stages of development. The talk will emphasize the need for a highly user-centred approach towards design, development and evaluation of companion robots. An important challenge is to evaluate robots in realistic and long-term scenarios, in order to capture as closely as possible those key aspects that will play a role when using such robots in the real world. To illustrate these points, my talk will give examples of interaction studies that my research team has been involved in. These include studies into how people perceive robots' non-verbal cues, creating and evaluating realistic scenarios for home companion robots using narrative framing, and verbal and tactile interaction of children with the therapeutic and social robot Kaspar. The talk will highlight the issues we encountered when we proceeded from laboratory-based experiments and prototypes to real-world applications.
{"title":"Interaction Studies with Social Robots","authors":"K. Dautenhahn","doi":"10.1145/2818346.2818347","DOIUrl":"https://doi.org/10.1145/2818346.2818347","url":null,"abstract":"Over the past 10 years we have seen worldwide an immense growth of research and development into companion robots. Those are robots that fulfil particular tasks, but do so in a socially acceptable manner. The companionship aspect reflects the repeated and long-term nature of such interactions, and the potential of people to form relationships with such robots, e.g. as friendly assistants. A number of companion and assistant robots have been entering the market, two of the latest examples are Aldebaran's Pepper robot, or Jibo (Cynthia Breazeal). Companion robots are more and more targeting particular application areas, e.g. as home assistants or therapeutic tools. Research into companion robots needs to address many fundamental research problems concerning perception, cognition, action and learning, but regardless how sophisticated our robotic systems may be, the potential users need to be taken into account from the early stages of development. The talk will emphasize the need for a highly user-centred approach towards design, development and evaluation of companion robots. An important challenge is to evaluate robots in realistic and long-term scenarios, in order to capture as closely as possible those key aspects that will play a role when using such robots in the real world. In order to illustrate these points, my talk will give examples of interaction studies that my research team has been involved in. This includes studies into how people perceive robots' non-verbal cues, creating and evaluating realistic scenarios for home companion robots using narrative framing, and verbal and tactile interaction of children with the therapeutic and social robot Kaspar. The talk will highlight the issues we encountered when we proceeded from laboratory-based experiments and prototypes to real-world applications.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78326714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongwei Ng, Viet Dung Nguyen, Vassilios Vonikakis, Stefan Winkler
This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning on the network in a two-stage process, first on datasets relevant to facial expressions, followed by the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results than a single-stage fine-tuning with the combined datasets. Our best submission achieved an overall accuracy of 48.5% on the validation set and 55.6% on the test set, which compares favorably with the challenge baselines of 35.96% and 39.13%, respectively.
{"title":"Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning","authors":"Hongwei Ng, Viet Dung Nguyen, Vassilios Vonikakis, Stefan Winkler","doi":"10.1145/2818346.2830593","DOIUrl":"https://doi.org/10.1145/2818346.2830593","url":null,"abstract":"This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning on the network in a two-stage process, first on datasets relevant to facial expressions, followed by the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results, compared to a single stage fine-tuning with the combined datasets. Our best submission exhibited an overall accuracy of 48.5% in the validation set and 55.6% in the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88925689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Our community has long pursued principles and methods for enabling fluid and effortless collaborations between people and computing systems. Forging deep connections between people and machines has come into focus over the last 25 years as a grand challenge at the intersection of artificial intelligence, human-computer interaction, and cognitive psychology. I will review experiences and directions with leveraging advances in perception, learning, and reasoning in pursuit of our shared dreams.
{"title":"Connections: 2015 ICMI Sustained Accomplishment Award Lecture","authors":"E. Horvitz","doi":"10.1145/2818346.2835500","DOIUrl":"https://doi.org/10.1145/2818346.2835500","url":null,"abstract":"Our community has long pursued principles and methods for enabling fluid and effortless collaborations between people and computing systems. Forging deep connections between people and machines has come into focus over the last 25 years as a grand challenge at the intersection of artificial intelligence, human-computer interaction, and cognitive psychology. I will review experiences and directions with leveraging advances in perception, learning, and reasoning in pursuit of our shared dreams.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89114812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nowadays, intelligent agents are expected to be affect-sensitive, as agents are becoming essential entities that support computer-mediated tasks, especially in teaching and training. These agents use common natural modalities, such as facial expressions, gestures, and eye gaze, to recognize a user's affective state and respond accordingly. However, these nonverbal cues may not be universal, as emotion recognition and expression differ from culture to culture. It is important that intelligent interfaces are equipped with the abilities to meet the challenge of cultural diversity to facilitate human-machine interaction, particularly in Asia. Asians are known to be more passive and to possess certain traits such as indirectness and non-confrontationalism, which lead to emotions such as (culture-specific forms of) shyness and timidity. Therefore, a model based on another culture may not be applicable in an Asian setting, ruling out a one-size-fits-all approach. This study aims to identify the discriminative markers of culture-specific emotions based on multimodal interactions.
{"title":"A Computational Model of Culture-Specific Emotion Detection for Artificial Agents in the Learning Domain","authors":"Ganapreeta Renunathan Naidu","doi":"10.1145/2818346.2823307","DOIUrl":"https://doi.org/10.1145/2818346.2823307","url":null,"abstract":"Nowadays, intelligent agents are expected to be affect-sensitive as agents are becoming essential entities that supports computer-mediated tasks, especially in teaching and training. These agents use common natural modalities-such as facial expressions, gestures and eye gaze in order to recognize a user's affective state and respond accordingly. However, these nonverbal cues may not be universal as emotion recognition and expression differ from culture to culture. It is important that intelligent interfaces are equipped with the abilities to meet the challenge of cultural diversity to facilitate human-machine interaction particularly in Asia. Asians are known to be more passive and possess certain traits such as indirectness and non-confrontationalism, which lead to emotions such as (culture-specific form of) shyness and timidity. Therefore, a model based on other culture may not be applicable in an Asian setting, out-rulling a one-size-fits-all approach. This study is initiated to identify the discriminative markers of culture-specific emotions based on the multimodal interactions.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75694221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federico Domínguez, K. Chiluiza, Vanessa Echeverría, X. Ochoa
The traditional recording of student interaction in classrooms has raised privacy concerns among both students and academics. However, the same students are happy to share their daily lives through social media. Perception of data ownership is the key factor in this paradox. This article proposes the design of a personal Multimodal Recording Device (MRD) that could capture the actions of its owner during lectures. The MRD would be able to capture close-range video, audio, writing, and other environmental signals. Unlike traditional centralized recording systems, students would have control over their own recorded data. They could decide to share their information in exchange for access to the recordings of the instructor, notes from their classmates, and analysis of, for example, their attention performance. By sharing their data, students participate in the co-creation of enhanced and synchronized course notes that benefit all participating students. This work presents details about how such a device could be built from available components. It also discusses and evaluates the design of such a device, including its foreseeable cost, scalability, flexibility, intrusiveness, and recording quality.
{"title":"Multimodal Selfies: Designing a Multimodal Recording Device for Students in Traditional Classrooms","authors":"Federico Domínguez, K. Chiluiza, Vanessa Echeverría, X. Ochoa","doi":"10.1145/2818346.2830606","DOIUrl":"https://doi.org/10.1145/2818346.2830606","url":null,"abstract":"The traditional recording of student interaction in classrooms has raised privacy concerns in both students and academics. However, the same students are happy to share their daily lives through social media. Perception of data ownership is the key factor in this paradox. This article proposes the design of a personal Multimodal Recording Device (MRD) that could capture the actions of its owner during lectures. The MRD would be able to capture close-range video, audio, writing, and other environmental signals. Differently from traditional centralized recording systems, students would have control over their own recorded data. They could decide to share their information in exchange of access to the recordings of the instructor, notes form their classmates, and analysis of, for example, their attention performance. By sharing their data, students participate in the co-creation of enhanced and synchronized course notes that will benefit all the participating students. This work presents details about how such a device could be build from available components. This work also discusses and evaluates the design of such device, including its foreseeable costs, scalability, flexibility, intrusiveness and recording quality.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75843857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research in learning analytics and educational data mining has recently become prominent in the fields of computer science and education. Most scholars in the field emphasize student learning and student data analytics; however, it is also important to focus on teaching analytics and teacher preparation because of their key roles in student learning, especially in K-12 learning environments. Nonverbal communication strategies play an important role in teachers' successful interpersonal communication with their students. In order to assist novice or practicing teachers with exhibiting open and affirmative nonverbal cues in their classrooms, we have designed a multimodal teaching platform with provisions for online feedback. We used an interactive teaching rehearsal software, TeachLivE, as our basic research environment. TeachLivE employs a digital puppetry paradigm as its core technology. Individuals walk into this virtual environment and interact with virtual students displayed on a large screen. They can practice classroom management, pedagogy and content delivery skills with a teaching plan in the TeachLivE environment. We designed an experiment to evaluate the impact of an online nonverbal feedback application. In this experiment, different types of multimodal data were collected during two experimental settings. These data include talk time and nonverbal behaviors of the virtual students, captured in log files; talk time and full-body tracking data of the participant; and video recordings of the virtual classroom with the participant. Thirty-four student teachers participated in this 30-minute experiment. In each of the settings, the participants were provided with teaching plans from which they taught. All the participants took part in both of the experimental settings. In order to have a balanced experimental design, half of the participants received nonverbal online feedback in their first session and the other half received this feedback in the second session. A visual indication was used for feedback each time the participant exhibited a closed, defensive posture. Based on recorded full-body tracking data, we observed that only those who received feedback in their first session demonstrated a significant number of open postures in the session containing no feedback. However, the post-questionnaire responses indicated that all participants were more mindful of their body postures while teaching after they had participated in the study.
{"title":"Providing Real-time Feedback for Student Teachers in a Virtual Rehearsal Environment","authors":"R. Barmaki, C. Hughes","doi":"10.1145/2818346.2830604","DOIUrl":"https://doi.org/10.1145/2818346.2830604","url":null,"abstract":"Research in learning analytics and educational data mining has recently become prominent in the fields of computer science and education. Most scholars in the field emphasize student learning and student data analytics; however, it is also important to focus on teaching analytics and teacher preparation because of their key roles in student learning, especially in K-12 learning environments. Nonverbal communication strategies play an important role in successful interpersonal communication of teachers with their students. In order to assist novice or practicing teachers with exhibiting open and affirmative nonverbal cues in their classrooms, we have designed a multimodal teaching platform with provisions for online feedback. We used an interactive teaching rehearsal software, TeachLivE, as our basic research environment. TeachLivE employs a digital puppetry paradigm as its core technology. Individuals walk into this virtual environment and interact with virtual students displayed on a large screen. They can practice classroom management, pedagogy and content delivery skills with a teaching plan in the TeachLivE environment. We have designed an experiment to evaluate the impact of an online nonverbal feedback application. In this experiment, different types of multimodal data have been collected during two experimental settings. These data include talk-time and nonverbal behaviors of the virtual students, captured in log files; talk time and full body tracking data of the participant; and video recording of the virtual classroom with the participant. 34 student teachers participated in this 30-minute experiment. In each of the settings, the participants were provided with teaching plans from which they taught. All the participants took part in both of the experimental settings. In order to have a balanced experiment design, half of the participants received nonverbal online feedback in their first session and the other half received this feedback in the second session. A visual indication was used for feedback each time the participant exhibited a closed, defensive posture. Based on recorded full-body tracking data, we observed that only those who received feedback in their first session demonstrated a significant number of open postures in the session containing no feedback. However, the post-questionnaire information indicated that all participants were more mindful of their body postures while teaching after they had participated in the study.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76959158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prevailing social norms prohibit interrupting another person when they are speaking. In this research, simultaneous speech was investigated in groups of students as they jointly solved math problems and peer tutored one another. Analyses were based on the Math Data Corpus, which includes ground-truth performance coding and speech transcriptions. Simultaneous speech was elevated 120-143% during the most productive phase of problem solving, compared with matched intervals. It also was elevated 18-37% in students who were domain experts, compared with non-experts. Qualitative analyses revealed that experts differed from non-experts in the function of their interruptions. Analysis of these functional asymmetries produced nine key behaviors that were used to identify the dominant math expert in a group with 95-100% accuracy in three minutes. This research demonstrates that overlapped speech is a marker of group problem-solving progress and domain expertise. It provides valuable information for the emerging field of learning analytics.
{"title":"Spoken Interruptions Signal Productive Problem Solving and Domain Expertise in Mathematics","authors":"S. Oviatt, Kevin Hang, Jianlong Zhou, Fang Chen","doi":"10.1145/2818346.2820743","DOIUrl":"https://doi.org/10.1145/2818346.2820743","url":null,"abstract":"Prevailing social norms prohibit interrupting another person when they are speaking. In this research, simultaneous speech was investigated in groups of students as they jointly solved math problems and peer tutored one another. Analyses were based on the Math Data Corpus, which includes ground-truth performance coding and speech transcriptions. Simultaneous speech was elevated 120-143% during the most productive phase of problem solving, compared with matched intervals. It also was elevated 18-37% in students who were domain experts, compared with non-experts. Qualitative analyses revealed that experts differed from non-experts in the function of their interruptions. Analysis of these functional asymmetries produced nine key behaviors that were used to identify the dominant math expert in a group with 95-100% accuracy in three minutes. This research demonstrates that overlapped speech is a marker of group problem-solving progress and domain expertise. It provides valuable information for the emerging field of learning analytics.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"29 3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79862327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}