
Latest publications from the Companion Publication of the 2020 International Conference on Multimodal Interaction

WiFiTuned: Monitoring Engagement in Online Participation by Harmonizing WiFi and Audio
Vijay Kumar Singh, Pragma Kar, Ayush Madhan Sohini, Madhav Rangaiah, Sandip Chakraborty, Mukulika Maity
This paper proposes WiFiTuned, a multi-modal, non-intrusive, and privacy-preserving system for monitoring engagement in online participation, i.e., meetings, classes, and seminars. It uses two sensing modalities for this purpose: WiFi CSI and audio. WiFiTuned detects participants’ head movements during online participation through WiFi CSI and detects the speaker’s intent through audio, then correlates the two to detect engagement. We evaluate WiFiTuned with 22 participants and observe that it can detect the engagement level with an average accuracy of more than .
{"title":"WiFiTuned: Monitoring Engagement in Online Participation by Harmonizing WiFi and Audio","authors":"Vijay Kumar Singh, Pragma Kar, Ayush Madhan Sohini, Madhav Rangaiah, Sandip Chakraborty, Mukulika Maity","doi":"10.1145/3577190.3614108","DOIUrl":"https://doi.org/10.1145/3577190.3614108","url":null,"abstract":"This paper proposes a multi-modal, non-intrusive and privacy preserving system WiFiTuned for monitoring engagement in online participation i.e., meeting/classes/seminars. It uses two sensing modalities i.e., WiFi CSI and audio for the same. WiFiTuned detects the head movements of participants during online participation through WiFi CSI and detects the speaker’s intent through audio. Then it correlates the two to detect engagement. We evaluate WiFiTuned with 22 participants and observe that it can detect the engagement level with an average accuracy of more than .","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
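The abstract above describes correlating head movement sensed via WiFi CSI with speaker intent sensed via audio. The minimal Python sketch below illustrates one way such a correlation step could look; the signal names, windowing, and thresholding are assumptions for illustration, not the WiFiTuned implementation.

```python
import numpy as np

def engagement_score(head_movement, speaker_intent, window=10):
    """Toy engagement score: in each window where the speaker's audio suggests
    an expected reaction (speaker_intent == 1), check whether the participant's
    CSI-derived head-movement signal rises above its window mean.
    Hypothetical sketch, not the WiFiTuned pipeline."""
    head_movement = np.asarray(head_movement, dtype=float)
    speaker_intent = np.asarray(speaker_intent, dtype=float)
    scores = []
    for start in range(0, len(head_movement) - window + 1, window):
        h = head_movement[start:start + window]
        s = speaker_intent[start:start + window]
        if s.sum() == 0:                        # no reaction expected in this window
            continue
        reacted = (h > h.mean()).astype(float)  # above-average movement counts as a reaction
        scores.append(float((reacted * s).sum() / s.sum()))
    return float(np.mean(scores)) if scores else 0.0

rng = np.random.default_rng(0)
print(engagement_score(rng.random(120), (rng.random(120) > 0.7).astype(int)))
```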
Detecting When the Mind Wanders Off Task in Real-time: An Overview and Systematic Review
Vishal Kuvar, Julia W. Y. Kam, Stephen Hutt, Caitlin Mills
Research on the ubiquity and consequences of task-unrelated thought (TUT; often used to operationalize mind wandering) in several domains recently sparked a surge in efforts to create “stealth measurements” of TUT using machine learning. Although these attempts have been successful, they have used widely varied algorithms, modalities, and performance metrics — making them difficult to compare and inform future work on best practices. We aim to synthesize these findings through a systematic review of 42 studies identified following PRISMA guidelines to answer two research questions: 1) are there any modalities that are better indicators of TUT than the rest; and 2) do multimodal models provide better results than unimodal models? We found that models built on gaze typically outperform other modalities and that multimodal models do not present a clear edge over their unimodal counterparts. Our review highlights the typical steps involved in model creation and the choices available in each step to guide future research, while also discussing the limitations of the current “state of the art” — namely the barriers to generalizability.
{"title":"Detecting When the Mind Wanders Off Task in Real-time: An Overview and Systematic Review","authors":"Vishal Kuvar, Julia W. Y. Kam, Stephen Hutt, Caitlin Mills","doi":"10.1145/3577190.3614126","DOIUrl":"https://doi.org/10.1145/3577190.3614126","url":null,"abstract":"Research on the ubiquity and consequences of task-unrelated thought (TUT; often used to operationalize mind wandering) in several domains recently sparked a surge in efforts to create “stealth measurements” of TUT using machine learning. Although these attempts have been successful, they have used widely varied algorithms, modalities, and performance metrics — making them difficult to compare and inform future work on best practices. We aim to synthesize these findings through a systematic review of 42 studies identified following PRISMA guidelines to answer two research questions: 1) are there any modalities that are better indicators of TUT than the rest; and 2) do multimodal models provide better results than unimodal models? We found that models built on gaze typically outperform other modalities and that multimodal models do not present a clear edge over their unimodal counterparts. Our review highlights the typical steps involved in model creation and the choices available in each step to guide future research, while also discussing the limitations of the current “state of the art” — namely the barriers to generalizability.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency
In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality before asking them to explicitly reason about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy: the extent to which modalities individually and together give the same predictions, uniqueness: the extent to which one modality enables a prediction that the other does not, and synergy: the extent to which both modalities enable one to make a prediction that one would not otherwise make using individual modalities. Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.
{"title":"Multimodal Fusion Interactions: A Study of Human and Automatic Quantification","authors":"Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency","doi":"10.1145/3577190.3614151","DOIUrl":"https://doi.org/10.1145/3577190.3614151","url":null,"abstract":"In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality before asking them to explicitly reason about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy: the extent to which modalities individually and together give the same predictions, uniqueness: the extent to which one modality enables a prediction that the other does not, and synergy: the extent to which both modalities enable one to make a prediction that one would not otherwise make using individual modalities. Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
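As a concrete toy example of the partial-label annotations and the redundancy/uniqueness/synergy taxonomy described above, the sketch below maps one annotator's three labels (given modality 1 only, modality 2 only, and both) to an interaction category. This simplified mapping is our own reading for illustration, not the paper's annotation or conversion procedure.

```python
def interaction_type(pred_m1, pred_m2, pred_both):
    """Toy mapping from partial labels to an interaction category.
    pred_m1 / pred_m2: label given only modality 1 / 2; pred_both: label given both.
    Simplified illustration, not the paper's conversion method."""
    if pred_m1 == pred_m2 == pred_both:
        return "redundancy"   # either modality alone already yields the final answer
    if (pred_both == pred_m1) != (pred_both == pred_m2):
        return "uniqueness"   # exactly one modality drives the final prediction
    return "synergy"          # the combination yields something neither gives alone

print(interaction_type("happy", "happy", "happy"))          # redundancy
print(interaction_type("neutral", "happy", "happy"))        # uniqueness
print(interaction_type("neutral", "neutral", "sarcastic"))  # synergy
```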
ASMRcade: Interactive Audio Triggers for an Autonomous Sensory Meridian Response
Silvan Mertes, Marcel Strobl, Ruben Schlagowski, Elisabeth André
Autonomous Sensory Meridian Response (ASMR) is a sensory phenomenon involving pleasurable tingling sensations in response to stimuli such as whispering, tapping, and hair brushing. It is increasingly used to promote health and well-being, help with sleep, and reduce stress and anxiety. ASMR triggers are both highly individual and of great variety. Consequently, finding or identifying suitable ASMR content, e.g., by searching online platforms, can take time and effort. This work addresses this challenge by introducing a novel interactive approach for users to generate personalized ASMR sounds. The presented system utilizes a generative adversarial network (GAN) for sound generation and a graphical user interface (GUI) for user control. Our system allows users to create and manipulate audio samples by interacting with a visual representation of the GAN’s latent input vector. Further, we present the results of a first user study which indicates that our approach is suitable for triggering ASMR experiences.
{"title":"ASMRcade: Interactive Audio Triggers for an Autonomous Sensory Meridian Response","authors":"Silvan Mertes, Marcel Strobl, Ruben Schlagowski, Elisabeth André","doi":"10.1145/3577190.3614155","DOIUrl":"https://doi.org/10.1145/3577190.3614155","url":null,"abstract":"Autonomous Sensory Meridian Response (ASMR) is a sensory phenomenon involving pleasurable tingling sensations in response to stimuli such as whispering, tapping, and hair brushing. It is increasingly used to promote health and well-being, help with sleep, and reduce stress and anxiety. ASMR triggers are both highly individual and of great variety. Consequently, finding or identifying suitable ASMR content, e.g., by searching online platforms, can take time and effort. This work addresses this challenge by introducing a novel interactive approach for users to generate personalized ASMR sounds. The presented system utilizes a generative adversarial network (GAN) for sound generation and a graphical user interface (GUI) for user control. Our system allows users to create and manipulate audio samples by interacting with a visual representation of the GAN’s latent input vector. Further, we present the results of a first user study which indicates that our approach is suitable for triggering ASMR experiences.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
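The abstract describes steering a GAN audio generator by manipulating its latent input vector through a GUI. A minimal sketch of that idea follows; the latent dimensionality, the slider-to-direction mapping, and the generator call are all assumptions for illustration, not the ASMRcade implementation.

```python
import numpy as np

LATENT_DIM = 128  # hypothetical size; the abstract does not specify it

def slider_to_latent(slider_values, directions, base=None):
    """Map GUI slider positions (floats in [-1, 1]) to a GAN latent vector by
    moving along fixed unit directions in latent space. Illustration only."""
    z = np.zeros(LATENT_DIM) if base is None else np.array(base, dtype=float)
    for value, direction in zip(slider_values, directions):
        z += value * np.asarray(direction, dtype=float)
    return z

rng = np.random.default_rng(0)
directions = rng.standard_normal((3, LATENT_DIM))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
z = slider_to_latent([0.5, -0.2, 0.8], directions)
# audio = generator(z)  # a trained GAN generator (not shown) would map z to a waveform
```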
Towards Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems
Amr Gomaa, Michael Feld
Recent advances in deep learning and data-driven approaches have facilitated the perception of objects and their environments in a perceptual subsymbolic manner. Thus, these autonomous systems can now perform object detection, sensor data fusion, and language understanding tasks. However, there is an increasing demand to further enhance these systems to attain a more conceptual and symbolic understanding of objects to acquire the underlying reasoning behind the learned tasks. Achieving this level of powerful artificial intelligence necessitates considering both explicit teachings provided by humans (e.g., explaining how to act) and implicit teaching obtained through observing human behavior (e.g., through system sensors). Hence, it is imperative to incorporate symbolic and subsymbolic learning approaches to support implicit and explicit interaction models. This integration enables the system to achieve multimodal input and output capabilities. In this Blue Sky paper, we argue for considering these input types, along with human-in-the-loop and incremental learning techniques, to advance the field of artificial intelligence and enable autonomous systems to emulate human learning. We propose several hypotheses and design guidelines aimed at achieving this objective.
{"title":"Towards Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems","authors":"Amr Gomaa, Michael Feld","doi":"10.1145/3577190.3616121","DOIUrl":"https://doi.org/10.1145/3577190.3616121","url":null,"abstract":"Recent advances in deep learning and data-driven approaches have facilitated the perception of objects and their environments in a perceptual subsymbolic manner. Thus, these autonomous systems can now perform object detection, sensor data fusion, and language understanding tasks. However, there is an increasing demand to further enhance these systems to attain a more conceptual and symbolic understanding of objects to acquire the underlying reasoning behind the learned tasks. Achieving this level of powerful artificial intelligence necessitates considering both explicit teachings provided by humans (e.g., explaining how to act) and implicit teaching obtained through observing human behavior (e.g., through system sensors). Hence, it is imperative to incorporate symbolic and subsymbolic learning approaches to support implicit and explicit interaction models. This integration enables the system to achieve multimodal input and output capabilities. In this Blue Sky paper, we argue for considering these input types, along with human-in-the-loop and incremental learning techniques, to advance the field of artificial intelligence and enable autonomous systems to emulate human learning. We propose several hypotheses and design guidelines aimed at achieving this objective.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews
Trang Tran, Yufeng Yin, Leili Tavabi, Joannalyn Delacruz, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani
The quality and effectiveness of psychotherapy sessions are highly influenced by the therapists’ ability to meaningfully connect with clients. Automated assessment of therapist empathy provides cost-effective and systematic means of assessing the quality of therapy sessions. In this work, we propose to assess therapist empathy using multimodal behavioral data, i.e. spoken language (text) and audio in real-world motivational interviewing (MI) sessions for alcohol abuse intervention. We first study each modality (text vs. audio) individually and then evaluate a multimodal approach using different fusion strategies for automated recognition of empathy levels (high vs. low). Leveraging recent pre-trained models both for text (DistilRoBERTa) and speech (HuBERT) as strong unimodal baselines, we obtain consistent 2-3 point improvements in F1 scores with early and late fusion, and the highest absolute improvement of 6–12 points over unimodal baselines. Our models obtain F1 scores of 68% when only looking at an early segment of the sessions and up to 72% in a therapist-dependent setting. In addition, our results show that a relatively small portion of sessions, specifically the second quartile, is most important in empathy prediction, outperforming predictions on later segments and on the full sessions. Our analyses in late fusion results show that fusion models rely more on the audio modality in limited-data settings, such as in individual quartiles and when using only therapist turns. Further, we observe the highest misclassification rates for parts of the sessions with MI inconsistent utterances (20% misclassified by all models), likely due to the complex nature of these types of intents in relation to perceived empathy.
{"title":"Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews","authors":"Trang Tran, Yufeng Yin, Leili Tavabi, Joannalyn Delacruz, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani","doi":"10.1145/3577190.3614105","DOIUrl":"https://doi.org/10.1145/3577190.3614105","url":null,"abstract":"The quality and effectiveness of psychotherapy sessions are highly influenced by the therapists’ ability to meaningfully connect with clients. Automated assessment of therapist empathy provides cost-effective and systematic means of assessing the quality of therapy sessions. In this work, we propose to assess therapist empathy using multimodal behavioral data, i.e. spoken language (text) and audio in real-world motivational interviewing (MI) sessions for alcohol abuse intervention. We first study each modality (text vs. audio) individually and then evaluate a multimodal approach using different fusion strategies for automated recognition of empathy levels (high vs. low). Leveraging recent pre-trained models both for text (DistilRoBERTa) and speech (HuBERT) as strong unimodal baselines, we obtain consistent 2-3 point improvements in F1 scores with early and late fusion, and the highest absolute improvement of 6–12 points over unimodal baselines. Our models obtain F1 scores of 68% when only looking at an early segment of the sessions and up to 72% in a therapist-dependent setting. In addition, our results show that a relatively small portion of sessions, specifically the second quartile, is most important in empathy prediction, outperforming predictions on later segments and on the full sessions. Our analyses in late fusion results show that fusion models rely more on the audio modality in limited-data settings, such as in individual quartiles and when using only therapist turns. Further, we observe the highest misclassification rates for parts of the sessions with MI inconsistent utterances (20% misclassified by all models), likely due to the complex nature of these types of intents in relation to perceived empathy.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
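The abstract evaluates early and late fusion over DistilRoBERTa (text) and HuBERT (audio) features. As a generic illustration of late fusion (averaging per-modality class probabilities), here is a hedged sketch with synthetic features; the feature dimensions, classifier choice, and fusion weight are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((40, 768))    # e.g., pooled text embeddings (dimension assumed)
audio_feats = rng.standard_normal((40, 1024))  # e.g., pooled audio embeddings (dimension assumed)
labels = rng.integers(0, 2, 40)                # 0 = low empathy, 1 = high empathy

# one classifier per modality
text_clf = LogisticRegression(max_iter=1000).fit(text_feats, labels)
audio_clf = LogisticRegression(max_iter=1000).fit(audio_feats, labels)

# late fusion: average the two classifiers' class probabilities, then decide
probs = 0.5 * text_clf.predict_proba(text_feats) + 0.5 * audio_clf.predict_proba(audio_feats)
fused_pred = probs.argmax(axis=1)
```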
Smart Garments for Immersive Home Rehabilitation Using VR
Luz Alejandra Magre, Shirley Coyle
Adherence to a rehabilitation programme is vital to recovering from injury; failing to adhere can keep a promising athlete off the field permanently. Although physicians broadly explain to patients the importance of following their home exercise programme (HEP), few patients actually complete it correctly. In my PhD research, I focus on factors that could help increase engagement in home exercise programmes for patients recovering from knee injuries, using VR and wearable sensors. This will be done through the gamification of the rehabilitation process, designing the system with a user-centered design approach to test different interactions that could affect user engagement.
{"title":"Smart Garments for Immersive Home Rehabilitation Using VR","authors":"Luz Alejandra Magre, Shirley Coyle","doi":"10.1145/3577190.3614229","DOIUrl":"https://doi.org/10.1145/3577190.3614229","url":null,"abstract":"Adherence to a rehabilitation programme is vital to recover from injury, failing to do so can keep a promising athlete off the field permanently. Although the importance to follow their home exercise programme (HEP) is broadly explained to patients by their physicians, few of them actually complete it correctly. In my PhD research, I focus on factors that could help increase engagement in home exercise programmes for patients recovering from knee injuries using VR and wearable sensors. This will be done through the gamification of the rehabilitation process, designing the system with a user-centered design approach to test different interactions that could affect the engagement of the users.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Influence of hand representation on a grasping task in augmented reality
Louis Lafuma, Guillaume Bouyer, Olivier Goguel, Jean-Yves Pascal Didier
Research has shown that modifying the aspect of the virtual hand in immersive virtual reality can convey objects properties to users. Whether we can achieve the same results in augmented reality is still to be determined since the user’s real hand is visible through the headset. Although displaying a virtual hand in augmented reality is usually not recommended, it could positively impact the user effectiveness or appreciation of the application.
{"title":"Influence of hand representation on a grasping task in augmented reality","authors":"Louis Lafuma, Guillaume Bouyer, Olivier Goguel, Jean-Yves Pascal Didier","doi":"10.1145/3577190.3614128","DOIUrl":"https://doi.org/10.1145/3577190.3614128","url":null,"abstract":"Research has shown that modifying the aspect of the virtual hand in immersive virtual reality can convey objects properties to users. Whether we can achieve the same results in augmented reality is still to be determined since the user’s real hand is visible through the headset. Although displaying a virtual hand in augmented reality is usually not recommended, it could positively impact the user effectiveness or appreciation of the application.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Annotations from speech and heart rate: impact on multimodal emotion recognition
Kaushal Sharma, Guillaume Chanel
The focus of multimodal emotion recognition has often been on the analysis of several fusion strategies. However, little attention has been paid to the effect of emotional cues, such as physiological and audio cues, on the external annotations used to generate the Ground Truths (GTs). In our study, we analyze this effect by collecting six continuous arousal annotations for three groups of emotional cues: speech only, heartbeat sound only, and their combination. Our results indicate significant differences between the three groups of annotations, thus giving three distinct cue-specific GTs. The relevance of these GTs is estimated by training multimodal machine learning models to regress speech, heart rate, and their multimodal fusion on arousal. Our analysis shows that a cue-specific GT is better predicted by the corresponding modality (or modalities). In addition, fusing several emotional cues for the definition of GTs allows both unimodal models and multimodal fusion to reach a similar performance. In conclusion, our results indicate that heart rate is an efficient cue for the generation of a physiological GT, and that combining several emotional cues for GT generation is as important as performing input multimodal fusion for emotion prediction.
{"title":"Annotations from speech and heart rate: impact on multimodal emotion recognition","authors":"Kaushal Sharma, Guillaume Chanel","doi":"10.1145/3577190.3614165","DOIUrl":"https://doi.org/10.1145/3577190.3614165","url":null,"abstract":"The focus of multimodal emotion recognition has often been on the analysis of several fusion strategies. However, little attention has been paid to the effect of emotional cues, such as physiological and audio cues, on external annotations used to generate the Ground Truths (GTs). In our study, we analyze this effect by collecting six continuous arousal annotations for three groups of emotional cues: speech only, heartbeat sound only and their combination. Our results indicate significant differences between the three groups of annotations, thus giving three distinct cue-specific GTs. The relevance of these GTs is estimated by training multimodal machine learning models to regress speech, heart rate and their multimodal fusion on arousal. Our analysis shows that a cue(s)-specific GT is better predicted by the corresponding modality(s). In addition, the fusion of several emotional cues for the definition of GTs allows to reach a similar performance for both unimodal models and multimodal fusion. In conclusion, our results indicates that heart rate is an efficient cue for the generation of a physiological GT; and that combining several emotional cues for GTs generation is as important as performing input multimodal fusion for emotion prediction.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
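To make the notion of a cue-specific Ground Truth concrete, the sketch below averages several annotators' continuous arousal traces collected under one cue condition (speech only, heartbeat only, or combined). A plain mean is only one possible aggregation and may differ from the paper's exact procedure.

```python
import numpy as np

def cue_specific_gt(annotations):
    """Average annotators' continuous arousal traces into one ground-truth trace.
    annotations: shape (n_annotators, n_timesteps) for a single cue condition.
    Aggregation by plain mean is an assumption for illustration."""
    return np.asarray(annotations, dtype=float).mean(axis=0)

rng = np.random.default_rng(0)
gt_speech = cue_specific_gt(rng.random((6, 300)))     # six annotators, 300 time steps
gt_heartbeat = cue_specific_gt(rng.random((6, 300)))
gt_combined = cue_specific_gt(rng.random((6, 300)))
```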
AIUnet: Asymptotic inference with U2-Net for referring image segmentation
Jiangquan Li, Shimin Shan, Yu Liu, Kaiping Xu, Xiwen Hu, Mingcheng Xue
Referring image segmentation aims to segment a target object from an image by providing a natural language expression. While recent methods have made remarkable advancements, few have designed effective deep fusion processes for cross-model features or focused on the fine details of vision. In this paper, we propose AIUnet, an asymptotic inference method that uses U2-Net. The core of AIUnet is a Cross-model U2-Net (CMU) module, which integrates a Text guide vision (TGV) module into U2-Net, achieving efficient interaction of cross-model information at different scales. CMU focuses more on location information in high-level features and learns finer detail information in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module to improve the recognition of fine details and decode cross-model features to binary masks. The FED module leverages a simple CNN-based approach to enhance multi-modal features. Our experiments show that AIUnet achieved competitive results on three standard datasets. Code is available at https://github.com/LJQbiu/AIUnet.
{"title":"AIUnet: Asymptotic inference with U2-Net for referring image segmentation","authors":"Jiangquan Li, Shimin Shan, Yu Liu, Kaiping Xu, Xiwen Hu, Mingcheng Xue","doi":"10.1145/3577190.3614176","DOIUrl":"https://doi.org/10.1145/3577190.3614176","url":null,"abstract":"Referring image segmentation aims to segment a target object from an image by providing a natural language expression. While recent methods have made remarkable advancements, few have designed effective deep fusion processes for cross-model features or focused on the fine details of vision. In this paper, we propose AIUnet, an asymptotic inference method that uses U2-Net. The core of AIUnet is a Cross-model U2-Net (CMU) module, which integrates a Text guide vision (TGV) module into U2-Net, achieving efficient interaction of cross-model information at different scales. CMU focuses more on location information in high-level features and learns finer detail information in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module to improve the recognition of fine details and decode cross-model features to binary masks. The FED module leverages a simple CNN-based approach to enhance multi-modal features. Our experiments show that AIUnet achieved competitive results on three standard datasets.Code is available at https://github.com/LJQbiu/AIUnet.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
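The TGV module is described only at a high level above; as a hedged illustration of injecting a sentence embedding into a U-Net-style visual feature map, the sketch below uses a simple per-channel gate. The module name, dimensions, and gating scheme are assumptions, not AIUnet's actual design.

```python
import torch
import torch.nn as nn

class TextGuidedGate(nn.Module):
    """Minimal text-guided modulation of a visual feature map: project a sentence
    embedding to per-channel gates and scale the visual features with them.
    Illustration only; AIUnet's TGV/CMU modules are more elaborate."""
    def __init__(self, text_dim, vis_channels):
        super().__init__()
        self.proj = nn.Linear(text_dim, vis_channels)

    def forward(self, vis_feat, text_emb):
        # vis_feat: (B, C, H, W); text_emb: (B, text_dim)
        gate = torch.sigmoid(self.proj(text_emb))           # (B, C)
        return vis_feat * gate.unsqueeze(-1).unsqueeze(-1)  # broadcast over H and W

gate = TextGuidedGate(text_dim=768, vis_channels=64)
out = gate(torch.randn(2, 64, 32, 32), torch.randn(2, 768))  # shape (2, 64, 32, 32)
```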