
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders
Takeshi Saga, Hiroki Tanaka, Satoshi Nakamura
Formal Thought Disorder (FTD), a group of cognitive symptoms that affects language and thought, can be observed through language. FTD is seen across developmental and psychiatric disorders such as Autism Spectrum Disorder (ASD), Schizophrenia, and the related Schizotypal Personality Disorder (SPD). Researchers have worked on computational analyses for the early detection of such symptoms and the development of better treatments for more than 40 years. In this paper, we collected a Japanese audio-report dataset with score labels related to ASD and SPD through a crowd-sourcing service from the general population. We measured language characteristics with the 2nd edition of the Social Responsiveness Scale (SRS2) and the Schizotypal Personality Questionnaire (SPQ), including the SPQ odd speech subscale to quantify FTD symptoms. We investigated the following four research questions through machine-learning-based score predictions: (RQ1) How are schizotypal and autistic measures correlated? (RQ2) What is the most suitable task to elicit FTD symptoms? (RQ3) Does the length of speech affect the elicitation of FTD symptoms? (RQ4) Which features are critical for capturing FTD symptoms? We confirmed that the FTD-related subscale, odd speech, was significantly correlated with both the total SPQ and SRS scores, although these two totals were not significantly correlated with each other. In terms of tasks, our results identified the effectiveness of eliciting FTD through the most negative memory. Furthermore, we confirmed that longer speech elicited more FTD symptoms, as indicated by the increased score-prediction performance for the FTD-related SPQ odd speech subscale. Our ablation study confirmed the importance of function words and of both abstract and temporal features for FTD-related odd speech estimation. In contrast, embedding-based features were effective only in the SRS predictions, and content words were effective only in the SPQ predictions, a result that implies differences between SPD-like and ASD-like symptoms. Data and programs used in this paper can be found here: https://sites.google.com/view/sagatake/resource.
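As an illustration of the kind of machine-learning score prediction described above (not the authors' released programs, which are available at the resource page linked in the abstract), the sketch below regresses a questionnaire subscale score from a few hand-crafted linguistic features. The feature set, the synthetic data, and the choice of ridge regression are assumptions made purely for illustration.

```python
# Illustrative sketch only: predicting a questionnaire subscale score (e.g., SPQ "odd speech")
# from hand-crafted linguistic features. The feature set and the ridge-regression model are
# assumptions for illustration; the paper's actual programs are on the authors' resource page.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical per-speaker features: function-word ratio, content-word ratio,
# mean sentence length, and a temporal feature such as speech duration (seconds).
n_speakers = 100
X = rng.normal(size=(n_speakers, 4))
# Synthetic target standing in for an SPQ odd-speech subscale score.
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=n_speakers)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# Cross-validated R^2 serves here as a stand-in evaluation of score-prediction performance.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.3f} +/- {scores.std():.3f}")
```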
DOI: https://doi.org/10.1145/3577190.3614132 | Published: 2023-10-09 | Citations: 0
Paying Attention to Wildfire: Using U-Net with Attention Blocks on Multimodal Data for Next Day Prediction
Jack Fitzgerald, Ethan Seefried, James E Yost, Sangmi Pallickara, Nathaniel Blanchard
Predicting where wildfires will spread provides invaluable information to firefighters and scientists, which can save lives and homes. However, doing so requires a large amount of multimodal data, e.g., accurate weather predictions, real-time satellite data, and environmental descriptors. In this work, we utilize 12 distinct features from multiple modalities in order to predict where wildfires will spread over the next 24 hours. We created a custom U-Net architecture designed to train as efficiently as possible, while still maximizing accuracy, to facilitate quickly deploying the model when a wildfire is detected. Our custom architecture demonstrates state-of-the-art performance and trains an order of magnitude more quickly than prior work, while using fewer computational resources. We further evaluated our architecture with an ablation study to identify which features were key for prediction and which provided negligible impact on performance. All of our source code is available on GitHub.
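A minimal sketch of the general idea, a U-Net-style encoder-decoder with an additive attention gate on the skip connection that maps 12 input feature channels to a per-pixel next-day fire prediction, is shown below. It is not the authors' architecture; the depth, channel sizes, and gating details are assumptions chosen to keep the example short.

```python
# Illustrative sketch only (not the authors' released architecture): a tiny U-Net-style
# encoder-decoder with an additive attention gate on the skip connection, taking 12 input
# channels (one per modality/feature) and predicting a 1-channel next-day fire mask.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class AttentionGate(nn.Module):
    """Additive attention gate: re-weights encoder skip features using the decoder signal."""
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, 1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, 1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())

    def forward(self, skip, gate):
        attn = self.psi(torch.relu(self.w_skip(skip) + self.w_gate(gate)))
        return skip * attn  # suppress irrelevant spatial locations in the skip connection

class TinyAttentionUNet(nn.Module):
    def __init__(self, in_ch=12, out_ch=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.att = AttentionGate(skip_ch=32, gate_ch=32, inter_ch=16)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, out_ch, 1)  # per-pixel logit: will this cell burn tomorrow?

    def forward(self, x):
        s1 = self.enc1(x)
        bottom = self.enc2(self.pool(s1))
        g = self.up(bottom)
        s1 = self.att(s1, g)
        return self.head(self.dec1(torch.cat([s1, g], dim=1)))

# Usage: a batch of 2 samples, 12 feature channels, 64x64 grid.
model = TinyAttentionUNet()
logits = model(torch.randn(2, 12, 64, 64))
print(logits.shape)  # torch.Size([2, 1, 64, 64])
```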
DOI: https://doi.org/10.1145/3577190.3614116 | Published: 2023-10-09 | Citations: 0
The 5th Workshop on Modeling Socio-Emotional and Cognitive Processes from Multimodal Data in the Wild (MSECP-Wild)
Bernd Dudzik, Tiffany Matej Hrkalovic, Dennis Küster, David St-Onge, Felix Putze, Laurence Devillers
The ability to automatically infer relevant aspects of human users’ thoughts and feelings is crucial for technologies to intelligently adapt their behaviors in complex interactions. Research on multimodal analysis has demonstrated the potential of technology to provide such estimates for a broad range of internal states and processes. However, constructing robust approaches for deployment in real-world applications remains an open problem. The MSECP-Wild workshop series is a multidisciplinary forum to present and discuss research addressing this challenge. Submissions to this 5th iteration span efforts relevant to multimodal data collection, modeling, and applications. In addition, our workshop program builds on discussions emerging in previous iterations, highlighting ethical considerations when building and deploying technology modeling internal states in the wild. For this purpose, we host a range of relevant keynote speakers and interactive activities.
DOI: https://doi.org/10.1145/3577190.3616883 | Published: 2023-10-09 | Citations: 0
A Robot Just for You: Multimodal Personalized Human-Robot Interaction and the Future of Work and Care
Maja Mataric
As AI becomes ubiquitous, its physical embodiment, robots, will also gradually enter our lives. As they do, we will demand that they understand us, predict our needs and wants, and adapt to us as we change our moods and minds, learn, grow, and age. The nexus created by recent major advances in machine learning for machine perception, navigation, and natural language processing has enabled human-robot interaction in real-world contexts, just as the need for human services continues to grow, from elder care to nursing to education and training. This talk will discuss our research in socially assistive robotics (SAR), which uses embodied social interaction to support user goals in health, wellness, training, and education. SAR brings together machine learning for user modeling, multimodal behavioral signal processing, and affective computing to enable robots to understand, interact, and adapt to users’ specific and ever-changing needs. The talk will cover methods and challenges of using multi-modal interaction data and expressive robot behavior to monitor, coach, motivate, and support a wide variety of user populations and use cases. We will cover insights from work with users across the age span (infants, children, adults, elderly), ability span (typically developing, autism, stroke, Alzheimer’s), contexts (schools, therapy centers, homes), and deployment durations (up to 6 months), as well as commercial implications.
DOI: https://doi.org/10.1145/3577190.3616524 | Published: 2023-10-09 | Citations: 0
The UEA Digital Humans entry to the GENEA Challenge 2023
Jonathan Windle, Iain Matthews, Ben Milner, Sarah Taylor
This paper describes our entry to the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. This year’s challenge focuses on generating gestures in a dyadic setting – predicting a main-agent’s motion from the speech of both the main-agent and an interlocutor. We adapt a Transformer-XL architecture for this task by adding a cross-attention module that integrates the interlocutor’s speech with that of the main-agent. Our model is conditioned on speech audio (encoded using PASE+), text (encoded using FastText) and a speaker identity label, and is able to generate smooth and speech appropriate gestures for a given identity. We consider the GENEA Challenge user study results and present a discussion of our model strengths and where improvements can be made.
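The sketch below illustrates the cross-attention idea described in the abstract, letting main-agent features attend to interlocutor speech features, in isolation. It is not the authors' Transformer-XL adaptation; the feature dimensions and the residual-plus-LayerNorm wrapping are assumptions for illustration.

```python
# Illustrative sketch only: a cross-attention block in which main-agent features attend to
# interlocutor speech features before motion is decoded. Dimensions are assumed.
import torch
import torch.nn as nn

class InterlocutorCrossAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, main_feats, interloc_feats):
        # Queries come from the main agent; keys/values come from the interlocutor's speech.
        attended, _ = self.attn(main_feats, interloc_feats, interloc_feats)
        return self.norm(main_feats + attended)  # residual connection plus normalization

# Usage: 2 sequences of 100 frames; 256-d features standing in for fused
# PASE+ audio, FastText text, and speaker-identity embeddings.
block = InterlocutorCrossAttention()
main = torch.randn(2, 100, 256)
other = torch.randn(2, 100, 256)
print(block(main, other).shape)  # torch.Size([2, 100, 256])
```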
DOI: https://doi.org/10.1145/3577190.3616116 | Published: 2023-10-09 | Citations: 1
Classification of Alzheimer's Disease with Deep Learning on Eye-tracking Data
Sriram, Harshinee, Conati, Cristina, Field, Thalia
Existing research has shown the potential of classifying Alzheimer's Disease (AD) from eye-tracking (ET) data with classifiers that rely on task-specific engineered features. In this paper, we investigate whether we can improve on existing results by using a Deep Learning classifier trained end-to-end on raw ET data. This classifier (VTNet) uses a GRU and a CNN in parallel to leverage both visual (V) and temporal (T) representations of ET data and was previously used to detect user confusion while processing visual displays. A main challenge in applying VTNet to our target AD classification task is that the available ET data sequences are much longer than those used in the previous confusion detection task, pushing the limits of what is manageable by LSTM-based models. We discuss how we address this challenge and show that VTNet outperforms the state-of-the-art approaches in AD classification, providing encouraging evidence on the generality of this model to make predictions from ET data.
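For readers unfamiliar with the parallel visual/temporal design, the sketch below shows one plausible reading of it: a GRU over the raw gaze-sample sequence and a small CNN over an image-like rendering of the same data, fused for a binary prediction. It is not the authors' VTNet implementation; the input channels, layer sizes, and concatenation-based fusion are assumptions.

```python
# Illustrative sketch only of a parallel visual+temporal classifier over eye-tracking data.
import torch
import torch.nn as nn

class VTNetSketch(nn.Module):
    def __init__(self, n_channels=4, hidden=64):
        super().__init__()
        # Temporal branch: GRU over raw ET samples (e.g., x, y, pupil size, validity).
        self.gru = nn.GRU(input_size=n_channels, hidden_size=hidden, batch_first=True)
        # Visual branch: small CNN over a 1-channel scanpath image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(hidden + 32, 2)  # AD vs. control logits

    def forward(self, seq, img):
        _, h = self.gru(seq)                # h: (1, batch, hidden), the final temporal summary
        fused = torch.cat([h[-1], self.cnn(img)], dim=1)
        return self.classifier(fused)

model = VTNetSketch()
seq = torch.randn(2, 1000, 4)    # long raw gaze sequences (downsampled/truncated in practice)
img = torch.randn(2, 1, 64, 64)  # scanpath rendering
print(model(seq, img).shape)     # torch.Size([2, 2])
```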
DOI: https://doi.org/10.1145/3577190.3614149 | Published: 2023-10-09 | Citations: 0
Enhancing Surgical Team Collaboration and Situation Awareness through Multimodal Sensing
Arnaud Allemang--Trivalle
Surgery, typically seen as the surgeon’s sole responsibility, requires a broader perspective acknowledging the vital roles of other operating room (OR) personnel. The interactions among team members are crucial for delivering quality care and depend on shared situation awareness. I propose a two-phase approach to design and evaluate a multimodal platform that monitors OR members, offering insights into surgical procedures. The first phase focuses on designing a data-collection platform, tailored to surgical constraints, to generate novel collaboration and situation-awareness metrics using synchronous recordings of the participants’ voices, positions, orientations, electrocardiograms, and respiration signals. The second phase concerns the creation of intuitive dashboards and visualizations, aiding surgeons in reviewing recorded surgery, identifying adverse events and contributing to proactive measures. This work aims to demonstrate an innovative approach to data collection and analysis, augmenting the surgical team’s capabilities. The multimodal platform has the potential to enhance collaboration, foster situation awareness, and ultimately mitigate surgical adverse events. This research sets the stage for a transformative shift in the OR, enabling a more holistic and inclusive perspective that recognizes that surgery is a team effort.
DOI: https://doi.org/10.1145/3577190.3614233 | Published: 2023-10-09 | Citations: 0
Analyzing and Recognizing Interlocutors' Gaze Functions from Multimodal Nonverbal Cues
Ayane Tashiro, Mai Imamura, Shiro Kumano, Kazuhiro Otsuka
A novel framework is presented for analyzing and recognizing the functions of gaze in group conversations. Considering the multiplicity and ambiguity of the gaze functions, we first define 43 nonexclusive gaze functions that play essential roles in conversations, such as monitoring, regulation, and expressiveness. Based on the defined functions, in this study, a functional gaze corpus is created, and a corpus analysis reveals several frequent functions, such as addressing and thinking while speaking and attending by listeners. Next, targeting the ten most frequent functions, we build convolutional neural networks (CNNs) to recognize the frame-based presence/absence of each gaze function from multimodal inputs, including head pose, utterance status, gaze/avert status, eyeball direction, and facial expression. Comparing different input sets, our experiments confirm that the proposed CNN using all modality inputs achieves the best performance and an F value of 0.839 for listening while looking.
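A minimal sketch of frame-window, multi-label recognition of this kind appears below: a 1D CNN over a window of concatenated multimodal features emitting one independent logit per gaze function. The feature dimensionality, window length, and layer sizes are assumptions and do not reflect the authors' exact model.

```python
# Illustrative sketch only: frame-wise multi-label prediction of gaze-function presence/absence
# from a window of multimodal features (head pose, utterance status, gaze/avert status,
# eyeball direction, facial expression), targeting the ten most frequent functions.
import torch
import torch.nn as nn

class GazeFunctionCNN(nn.Module):
    def __init__(self, n_features=32, n_functions=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, n_functions),  # one logit per (non-exclusive) gaze function
        )

    def forward(self, x):  # x: (batch, n_features, window_frames)
        return self.net(x)

model = GazeFunctionCNN()
window = torch.randn(8, 32, 30)                # 8 windows of 30 frames each
logits = model(window)
labels = torch.randint(0, 2, (8, 10)).float()
loss = nn.BCEWithLogitsLoss()(logits, labels)  # multi-label: each function is independent
print(logits.shape, loss.item())
```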
DOI: https://doi.org/10.1145/3577190.3614152 | Published: 2023-10-09 | Citations: 0
Deep Breathing Phase Classification with a Social Robot for Mental Health
Kayla Matheus, Ellie Mamantov, Marynel Vázquez, Brian Scassellati
Social robots are in a unique position to aid mental health by supporting engagement with behavioral interventions. One such behavioral intervention is the practice of deep breathing, which has been shown to physiologically reduce symptoms of anxiety. Multiple robots have been recently developed that support deep breathing, but none yet implement a method to detect how accurately an individual is performing the practice. Detecting breathing phases (i.e., inhaling, breath holding, or exhaling) is a challenge with these robots since often the robot is being manipulated or moved by the user, or the robot itself is moving to generate haptic feedback. Accordingly, we first present OMMDB: a novel, multimodal, public dataset made up of individuals performing deep breathing with an Ommie robot in multiple conditions of robot ego-motion. The dataset includes RGB video, inertial sensor data, and motor encoder data, as well as ground truth breathing data from a respiration belt. Our second contribution features experimental results with a convolutional long-short term memory neural network trained using OMMDB. These results show the system’s ability to be applied to the domain of deep breathing and generalize between individual users. We additionally show that our model is able to generalize across multiple types of robot ego-motion, reducing the need to train individual models for varying human-robot interaction conditions.
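The sketch below shows one simple reading of a convolutional long short-term memory classifier for breathing phases (inhale, hold, exhale): a small 1D CNN over windows of sensor channels followed by an LSTM over time. Channel counts, window length, and the fusion scheme are assumptions; this is not the OMMDB reference model.

```python
# Illustrative sketch only: CNN feature extraction per window of inertial and motor-encoder
# channels, followed by an LSTM across windows, emitting one breathing-phase prediction
# per window. All sizes are assumed.
import torch
import torch.nn as nn

class BreathingPhaseNet(nn.Module):
    def __init__(self, n_channels=9, n_phases=3):
        super().__init__()
        self.conv = nn.Sequential(       # per-window feature extractor over sensor channels
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_phases)

    def forward(self, x):                # x: (batch, windows, channels, samples_per_window)
        b, w, c, s = x.shape
        feats = self.conv(x.reshape(b * w, c, s)).reshape(b, w, -1)
        out, _ = self.lstm(feats)
        return self.head(out)            # one phase prediction per window

model = BreathingPhaseNet()
x = torch.randn(4, 20, 9, 50)            # 4 sequences, 20 windows, 9 channels, 50 samples each
print(model(x).shape)                    # torch.Size([4, 20, 3])
```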
DOI: https://doi.org/10.1145/3577190.3614173 | Published: 2023-10-09 | Citations: 0
Synerg-eye-zing: Decoding Nonlinear Gaze Dynamics Underlying Successful Collaborations in Co-located Teams
G. S. Rajshekar Reddy, Lucca Eloy, Rachel Dickler, Jason G. Reitman, Samuel L. Pugh, Peter W. Foltz, Jamie C. Gorman, Julie L. Harrison, Leanne Hirshfield
Joint Visual Attention (JVA) has long been considered a critical component of successful collaborations, enabling coordination and construction of a shared knowledge space. However, recent studies challenge the notion that JVA alone ensures effective collaboration. To gain deeper insights into JVA’s influence, we examine nonlinear gaze coupling and gaze regularity in the collaborators’ visual attention. Specifically, we analyze gaze data from 19 dyadic and triadic teams engaged in a co-located programming task using Recurrence Quantification Analysis (RQA). Our results emphasize the significance of team-level gaze regularity for improving task performance, highlighting the importance of maintaining stable or sustained episodes of joint or individual attention rather than disjointed patterns. Additionally, through regression analyses, we examine the predictive capacity of recurrence metrics for subjective traits such as social cohesion and social loafing, revealing unique interpersonal and team dynamics behind productive collaborations. We elaborate on our findings via qualitative anecdotes and discuss their implications in shaping real-time interventions for optimizing collaborative success.
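For concreteness, the sketch below computes two basic recurrence metrics, recurrence rate and determinism, from a categorical gaze stream such as a sequence of fixated areas of interest. Real analyses typically rely on a dedicated RQA package and cross- or joint-recurrence across teammates; the toy labels and the minimum diagonal line length of 2 are assumptions.

```python
# Illustrative sketch only: recurrence rate and determinism for a categorical gaze stream.
import numpy as np

def rqa_metrics(labels, l_min=2):
    labels = np.asarray(labels)
    n = len(labels)
    rec = labels[:, None] == labels[None, :]        # recurrence matrix
    np.fill_diagonal(rec, False)                    # ignore the trivial main diagonal
    total_recurrent = rec.sum()
    recurrence_rate = total_recurrent / (n * n - n)

    # Determinism: fraction of recurrent points on diagonal lines of length >= l_min.
    on_lines = 0
    for k in range(-(n - 1), n):
        if k == 0:
            continue
        diag = np.diagonal(rec, offset=k)
        run = 0
        for v in list(diag) + [False]:              # sentinel flushes the final run
            if v:
                run += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
    determinism = on_lines / total_recurrent if total_recurrent else 0.0
    return recurrence_rate, determinism

# Usage: hypothetical fixation targets during a pair-programming session.
fixations = ["code", "code", "chat", "code", "editor", "chat", "chat", "code"]
rr, det = rqa_metrics(fixations)
print(f"recurrence rate={rr:.2f}, determinism={det:.2f}")
```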
DOI: https://doi.org/10.1145/3577190.3614104 | Published: 2023-10-09 | Citations: 0