Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders
Takeshi Saga, Hiroki Tanaka, Satoshi Nakamura
Formal Thought Disorder (FTD), a group of cognitive symptoms that affects language and thought, can be observed through language. FTD is seen across developmental and psychiatric disorders such as Autism Spectrum Disorder (ASD) and Schizophrenia, as well as the related Schizotypal Personality Disorder (SPD). For more than 40 years, researchers have worked on computational analyses for the early detection of such symptoms and the development of better treatments. In this paper, we collected a Japanese audio-report dataset with score labels related to ASD and SPD from the general population through a crowd-sourcing service. We measured language characteristics with the 2nd edition of the Social Responsiveness Scale (SRS2) and the Schizotypal Personality Questionnaire (SPQ), including the SPQ odd speech subscale to quantify FTD symptoms. We investigated the following four research questions through machine-learning-based score prediction: (RQ1) How are schizotypal and autistic measures correlated? (RQ2) Which task is most suitable for eliciting FTD symptoms? (RQ3) Does the length of speech affect the elicitation of FTD symptoms? (RQ4) Which features are critical for capturing FTD symptoms? We confirmed that the FTD-related odd speech subscale was significantly correlated with both the total SPQ and total SRS scores, although the two total scores were not significantly correlated with each other. In terms of tasks, our results identified the most-negative-memory task as the most effective for eliciting FTD symptoms. Furthermore, we confirmed that longer speech elicited more FTD symptoms, as reflected in the increased score-prediction performance for the SPQ odd speech subscale. Our ablation study confirmed the importance of function words and of both the abstract and temporal features for estimating FTD-related odd speech. In contrast, embedding-based features were effective only for the SRS predictions and content words only for the SPQ predictions, a result that implies differences between SPD-like and ASD-like symptoms. Data and programs used in this paper can be found here: https://sites.google.com/view/sagatake/resource.
DOI: https://doi.org/10.1145/3577190.3614132
Paying Attention to Wildfire: Using U-Net with Attention Blocks on Multimodal Data for Next Day Prediction
Jack Fitzgerald, Ethan Seefried, James E Yost, Sangmi Pallickara, Nathaniel Blanchard
Predicting where wildfires will spread provides invaluable information to firefighters and scientists, which can save lives and homes. However, doing so requires a large amount of multimodal data, e.g., accurate weather predictions, real-time satellite data, and environmental descriptors. In this work, we utilize 12 distinct features from multiple modalities in order to predict where wildfires will spread over the next 24 hours. We created a custom U-Net architecture designed to train as efficiently as possible, while still maximizing accuracy, to facilitate quickly deploying the model when a wildfire is detected. Our custom architecture demonstrates state-of-the-art performance and trains an order of magnitude more quickly than prior work, while using fewer computational resources. We further evaluated our architecture with an ablation study to identify which features were key for prediction and which had negligible impact on performance. All of our source code is available on GitHub.
{"title":"Paying Attention to Wildfire: Using U-Net with Attention Blocks on Multimodal Data for Next Day Prediction","authors":"Jack Fitzgerald, Ethan Seefried, James E Yost, Sangmi Pallickara, Nathaniel Blanchard","doi":"10.1145/3577190.3614116","DOIUrl":"https://doi.org/10.1145/3577190.3614116","url":null,"abstract":"Predicting where wildfires will spread provides invaluable information to firefighters and scientists, which can save lives and homes. However, doing so requires a large amount of multimodal data e.g., accurate weather predictions, real-time satellite data, and environmental descriptors. In this work, we utilize 12 distinct features from multiple modalities in order to predict where wildfires will spread over the next 24 hours. We created a custom U-Net architecture designed to train as efficiently as possible, while still maximizing accuracy, to facilitate quickly deploying the model when a wildfire is detected. Our custom architecture demonstrates state-of-the-art performance and trains an order of magnitude more quickly than prior work, while using fewer computational resources. We further evaluated our architecture with an ablation study to identify which features were key for prediction and which provided negligible impact on performance. All of our source code is available on GitHub1.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The 5th Workshop on Modeling Socio-Emotional and Cognitive Processes from Multimodal Data in the Wild (MSECP-Wild)
Bernd Dudzik, Tiffany Matej Hrkalovic, Dennis Küster, David St-Onge, Felix Putze, Laurence Devillers
The ability to automatically infer relevant aspects of human users’ thoughts and feelings is crucial for technologies to intelligently adapt their behaviors in complex interactions. Research on multimodal analysis has demonstrated the potential of technology to provide such estimates for a broad range of internal states and processes. However, constructing robust approaches for deployment in real-world applications remains an open problem. The MSECP-Wild workshop series is a multidisciplinary forum to present and discuss research addressing this challenge. Submissions to this 5th iteration span efforts relevant to multimodal data collection, modeling, and applications. In addition, our workshop program builds on discussions emerging in previous iterations, highlighting ethical considerations when building and deploying technology modeling internal states in the wild. For this purpose, we host a range of relevant keynote speakers and interactive activities.
{"title":"The 5th Workshop on Modeling Socio-Emotional and Cognitive Processes from Multimodal Data in the Wild (MSECP-Wild)","authors":"Bernd Dudzik, Tiffany Matej Hrkalovic, Dennis Küster, David St-Onge, Felix Putze, Laurence Devillers","doi":"10.1145/3577190.3616883","DOIUrl":"https://doi.org/10.1145/3577190.3616883","url":null,"abstract":"The ability to automatically infer relevant aspects of human users’ thoughts and feelings is crucial for technologies to intelligently adapt their behaviors in complex interactions. Research on multimodal analysis has demonstrated the potential of technology to provide such estimates for a broad range of internal states and processes. However, constructing robust approaches for deployment in real-world applications remains an open problem. The MSECP-Wild workshop series is a multidisciplinary forum to present and discuss research addressing this challenge. Submissions to this 5th iteration span efforts relevant to multimodal data collection, modeling, and applications. In addition, our workshop program builds on discussions emerging in previous iterations, highlighting ethical considerations when building and deploying technology modeling internal states in the wild. For this purpose, we host a range of relevant keynote speakers and interactive activities.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"273 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Robot Just for You: Multimodal Personalized Human-Robot Interaction and the Future of Work and Care
Maja Mataric
As AI becomes ubiquitous, its physical embodiment, robots, will also gradually enter our lives. As they do, we will demand that they understand us, predict our needs and wants, and adapt to us as we change our moods and minds, learn, grow, and age. The nexus created by recent major advances in machine learning for machine perception, navigation, and natural language processing has enabled human-robot interaction in real-world contexts, just as the need for human services continues to grow, from elder care to nursing to education and training. This talk will discuss our research in socially assistive robotics (SAR), which uses embodied social interaction to support user goals in health, wellness, training, and education. SAR brings together machine learning for user modeling, multimodal behavioral signal processing, and affective computing to enable robots to understand, interact, and adapt to users' specific and ever-changing needs. The talk will cover methods and challenges of using multimodal interaction data and expressive robot behavior to monitor, coach, motivate, and support a wide variety of user populations and use cases. We will cover insights from work with users across the age span (infants, children, adults, elderly), ability span (typically developing, autism, stroke, Alzheimer's), contexts (schools, therapy centers, homes), and deployment durations (up to 6 months), as well as commercial implications.
DOI: https://doi.org/10.1145/3577190.3616524
The UEA Digital Humans entry to the GENEA Challenge 2023
Jonathan Windle, Iain Matthews, Ben Milner, Sarah Taylor
This paper describes our entry to the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. This year's challenge focuses on generating gestures in a dyadic setting: predicting a main-agent's motion from the speech of both the main-agent and an interlocutor. We adapt a Transformer-XL architecture for this task by adding a cross-attention module that integrates the interlocutor's speech with that of the main-agent. Our model is conditioned on speech audio (encoded using PASE+), text (encoded using FastText) and a speaker identity label, and is able to generate smooth and speech-appropriate gestures for a given identity. We consider the GENEA Challenge user study results and discuss our model's strengths and where improvements can be made.
{"title":"The UEA Digital Humans entry to the GENEA Challenge 2023","authors":"Jonathan Windle, Iain Matthews, Ben Milner, Sarah Taylor","doi":"10.1145/3577190.3616116","DOIUrl":"https://doi.org/10.1145/3577190.3616116","url":null,"abstract":"This paper describes our entry to the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. This year’s challenge focuses on generating gestures in a dyadic setting – predicting a main-agent’s motion from the speech of both the main-agent and an interlocutor. We adapt a Transformer-XL architecture for this task by adding a cross-attention module that integrates the interlocutor’s speech with that of the main-agent. Our model is conditioned on speech audio (encoded using PASE+), text (encoded using FastText) and a speaker identity label, and is able to generate smooth and speech appropriate gestures for a given identity. We consider the GENEA Challenge user study results and present a discussion of our model strengths and where improvements can be made.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135043298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of Alzheimer's Disease with Deep Learning on Eye-tracking Data
Harshinee Sriram, Cristina Conati, Thalia Field
Existing research has shown the potential of classifying Alzheimer's Disease (AD) from eye-tracking (ET) data with classifiers that rely on task-specific engineered features. In this paper, we investigate whether we can improve on existing results by using a Deep Learning classifier trained end-to-end on raw ET data. This classifier (VTNet) uses a GRU and a CNN in parallel to leverage both visual (V) and temporal (T) representations of ET data and was previously used to detect user confusion while processing visual displays. A main challenge in applying VTNet to our target AD classification task is that the available ET data sequences are much longer than those used in the previous confusion detection task, pushing the limits of what is manageable by LSTM-based models. We discuss how we address this challenge and show that VTNet outperforms the state-of-the-art approaches in AD classification, providing encouraging evidence on the generality of this model to make predictions from ET data.
DOI: https://doi.org/10.1145/3577190.3614149
Enhancing Surgical Team Collaboration and Situation Awareness through Multimodal Sensing
Arnaud Allemang--Trivalle
Surgery, typically seen as the surgeon's sole responsibility, requires a broader perspective acknowledging the vital roles of other operating room (OR) personnel. The interactions among team members are crucial for delivering quality care and depend on shared situation awareness. I propose a two-phase approach to design and evaluate a multimodal platform that monitors OR members, offering insights into surgical procedures. The first phase focuses on designing a data-collection platform, tailored to surgical constraints, to generate novel collaboration and situation-awareness metrics using synchronous recordings of the participants' voices, positions, orientations, electrocardiograms, and respiration signals. The second phase concerns the creation of intuitive dashboards and visualizations, aiding surgeons in reviewing recorded surgery, identifying adverse events and contributing to proactive measures. This work aims to demonstrate an innovative approach to data collection and analysis, augmenting the surgical team's capabilities. The multimodal platform has the potential to enhance collaboration, foster situation awareness, and ultimately mitigate surgical adverse events. This research sets the stage for a transformative shift in the OR, enabling a more holistic and inclusive perspective that recognizes that surgery is a team effort.
DOI: https://doi.org/10.1145/3577190.3614233
Analyzing and Recognizing Interlocutors' Gaze Functions from Multimodal Nonverbal Cues
Ayane Tashiro, Mai Imamura, Shiro Kumano, Kazuhiro Otsuka
A novel framework is presented for analyzing and recognizing the functions of gaze in group conversations. Considering the multiplicity and ambiguity of gaze functions, we first define 43 nonexclusive gaze functions that play essential roles in conversations, such as monitoring, regulation, and expressiveness. Based on the defined functions, we create a functional gaze corpus, and a corpus analysis reveals several frequent functions, such as addressing, thinking while speaking, and attending by listeners. Next, targeting the ten most frequent functions, we build convolutional neural networks (CNNs) to recognize the frame-based presence/absence of each gaze function from multimodal inputs, including head pose, utterance status, gaze/avert status, eyeball direction, and facial expression. Comparing different input sets, our experiments confirm that the proposed CNN using all modality inputs achieves the best performance and an F value of 0.839 for listening while looking.
{"title":"Analyzing and Recognizing Interlocutors' Gaze Functions from Multimodal Nonverbal Cues","authors":"Ayane Tashiro, Mai Imamura, Shiro Kumano, Kazuhiro Otsuka","doi":"10.1145/3577190.3614152","DOIUrl":"https://doi.org/10.1145/3577190.3614152","url":null,"abstract":"A novel framework is presented for analyzing and recognizing the functions of gaze in group conversations. Considering the multiplicity and ambiguity of the gaze functions, we first define 43 nonexclusive gaze functions that play essential roles in conversations, such as monitoring, regulation, and expressiveness. Based on the defined functions, in this study, a functional gaze corpus is created, and a corpus analysis reveals several frequent functions, such as addressing and thinking while speaking and attending by listeners. Next, targeting the ten most frequent functions, we build convolutional neural networks (CNNs) to recognize the frame-based presence/absence of each gaze function from multimodal inputs, including head pose, utterance status, gaze/avert status, eyeball direction, and facial expression. Comparing different input sets, our experiments confirm that the proposed CNN using all modality inputs achieves the best performance and an F value of 0.839 for listening while looking.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Breathing Phase Classification with a Social Robot for Mental Health
Kayla Matheus, Ellie Mamantov, Marynel Vázquez, Brian Scassellati
Social robots are in a unique position to aid mental health by supporting engagement with behavioral interventions. One such behavioral intervention is the practice of deep breathing, which has been shown to physiologically reduce symptoms of anxiety. Multiple robots have been recently developed that support deep breathing, but none yet implement a method to detect how accurately an individual is performing the practice. Detecting breathing phases (i.e., inhaling, breath holding, or exhaling) is a challenge with these robots since the robot is often being manipulated or moved by the user, or the robot itself is moving to generate haptic feedback. Accordingly, we first present OMMDB: a novel, multimodal, public dataset made up of individuals performing deep breathing with an Ommie robot in multiple conditions of robot ego-motion. The dataset includes RGB video, inertial sensor data, and motor encoder data, as well as ground truth breathing data from a respiration belt. Our second contribution features experimental results with a convolutional long short-term memory neural network trained using OMMDB. These results show the system's ability to be applied to the domain of deep breathing and generalize between individual users. We additionally show that our model is able to generalize across multiple types of robot ego-motion, reducing the need to train individual models for varying human-robot interaction conditions.
{"title":"Deep Breathing Phase Classification with a Social Robot for Mental Health","authors":"Kayla Matheus, Ellie Mamantov, Marynel Vázquez, Brian Scassellati","doi":"10.1145/3577190.3614173","DOIUrl":"https://doi.org/10.1145/3577190.3614173","url":null,"abstract":"Social robots are in a unique position to aid mental health by supporting engagement with behavioral interventions. One such behavioral intervention is the practice of deep breathing, which has been shown to physiologically reduce symptoms of anxiety. Multiple robots have been recently developed that support deep breathing, but none yet implement a method to detect how accurately an individual is performing the practice. Detecting breathing phases (i.e., inhaling, breath holding, or exhaling) is a challenge with these robots since often the robot is being manipulated or moved by the user, or the robot itself is moving to generate haptic feedback. Accordingly, we first present OMMDB: a novel, multimodal, public dataset made up of individuals performing deep breathing with an Ommie robot in multiple conditions of robot ego-motion. The dataset includes RGB video, inertial sensor data, and motor encoder data, as well as ground truth breathing data from a respiration belt. Our second contribution features experimental results with a convolutional long-short term memory neural network trained using OMMDB. These results show the system’s ability to be applied to the domain of deep breathing and generalize between individual users. We additionally show that our model is able to generalize across multiple types of robot ego-motion, reducing the need to train individual models for varying human-robot interaction conditions.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synerg-eye-zing: Decoding Nonlinear Gaze Dynamics Underlying Successful Collaborations in Co-located Teams
G. S. Rajshekar Reddy, Lucca Eloy, Rachel Dickler, Jason G. Reitman, Samuel L. Pugh, Peter W. Foltz, Jamie C. Gorman, Julie L. Harrison, Leanne Hirshfield
Joint Visual Attention (JVA) has long been considered a critical component of successful collaborations, enabling coordination and construction of a shared knowledge space. However, recent studies challenge the notion that JVA alone ensures effective collaboration. To gain deeper insights into JVA's influence, we examine nonlinear gaze coupling and gaze regularity in the collaborators' visual attention. Specifically, we analyze gaze data from 19 dyadic and triadic teams engaged in a co-located programming task using Recurrence Quantification Analysis (RQA). Our results emphasize the significance of team-level gaze regularity for improving task performance, highlighting the importance of maintaining stable or sustained episodes of joint or individual attention rather than disjointed patterns. Additionally, through regression analyses, we examine the predictive capacity of recurrence metrics for subjective traits such as social cohesion and social loafing, revealing unique interpersonal and team dynamics behind productive collaborations. We elaborate on our findings via qualitative anecdotes and discuss their implications in shaping real-time interventions for optimizing collaborative success.
{"title":"Synerg-eye-zing: Decoding Nonlinear Gaze Dynamics Underlying Successful Collaborations in Co-located Teams","authors":"G. S. Rajshekar Reddy, Lucca Eloy, Rachel Dickler, Jason G. Reitman, Samuel L. Pugh, Peter W. Foltz, Jamie C. Gorman, Julie L. Harrison, Leanne Hirshfield","doi":"10.1145/3577190.3614104","DOIUrl":"https://doi.org/10.1145/3577190.3614104","url":null,"abstract":"Joint Visual Attention (JVA) has long been considered a critical component of successful collaborations, enabling coordination and construction of a shared knowledge space. However, recent studies challenge the notion that JVA alone ensures effective collaboration. To gain deeper insights into JVA’s influence, we examine nonlinear gaze coupling and gaze regularity in the collaborators’ visual attention. Specifically, we analyze gaze data from 19 dyadic and triadic teams engaged in a co-located programming task using Recurrence Quantification Analysis (RQA). Our results emphasize the significance of team-level gaze regularity for improving task performance - highlighting the importance of maintaining stable or sustained episodes of joint or individual attention, than disjointed patterns. Additionally, through regression analyses, we examine the predictive capacity of recurrence metrics for subjective traits such as social cohesion and social loafing, revealing unique interpersonal and team dynamics behind productive collaborations. We elaborate on our findings via qualitative anecdotes and discuss their implications in shaping real-time interventions for optimizing collaborative success.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"265 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}