
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Towards Autonomous Physiological Signal Extraction From Thermal Videos Using Deep Learning
Kapotaksha Das, Mohamed Abouelenien, Mihai G. Burzo, John Elson, Kwaku Prakah-Asante, Clay Maranville
Using the thermal modality to extract physiological signals as a noncontact means of remote monitoring is gaining traction in applications such as healthcare monitoring. However, existing methods rely heavily on traditional tracking and mostly unsupervised signal processing methods, which can be affected significantly by noise and subjects’ movements. Using a novel deep learning architecture based on convolutional long short-term memory networks on a diverse dataset of 36 subjects, we present a personalized approach to extract multimodal signals, including heart rate, respiration rate, and body temperature, from thermal videos. We perform multimodal signal extraction for subjects in states of both active speaking and silence, requiring no parameter tuning in an end-to-end deep learning approach with automatic feature extraction. We experiment with different data sampling methods for training our deep learning models, as well as different network designs. Our results indicate the effectiveness and improved efficiency of the proposed models, which reach more than 90% accuracy when proper training data are available for each subject.
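Below is a minimal sketch, in PyTorch, of the kind of convolutional LSTM regressor the abstract describes: a window of thermal frames is rolled through a ConvLSTM cell and pooled into a three-value prediction (heart rate, respiration rate, body temperature). It is not the authors' implementation; the layer sizes, window length, and frame resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single convolutional LSTM cell: all gates computed with one 2-D convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = gates.chunk(4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class ThermalSignalRegressor(nn.Module):
    """Maps a window of thermal frames to [heart rate, respiration rate, temperature]."""
    def __init__(self, hid_ch=16):
        super().__init__()
        self.cell = ConvLSTMCell(1, hid_ch)
        self.head = nn.Linear(hid_ch, 3)

    def forward(self, frames):               # frames: (batch, time, 1, H, W)
        b, t, _, hgt, wid = frames.shape
        h = frames.new_zeros(b, self.cell.hid_ch, hgt, wid)
        c = torch.zeros_like(h)
        for step in range(t):                # unroll over the thermal frame window
            h, c = self.cell(frames[:, step], (h, c))
        pooled = h.mean(dim=(2, 3))          # global average pool over space
        return self.head(pooled)             # (batch, 3)

model = ThermalSignalRegressor()
out = model(torch.randn(2, 16, 1, 64, 80))   # toy batch of thermal windows
print(out.shape)                             # torch.Size([2, 3])
```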
Citations: 0
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency
To perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first, the second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality before being asked to explicitly reason about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy (the extent to which modalities individually and together give the same predictions), uniqueness (the extent to which one modality enables a prediction that the other does not), and synergy (the extent to which both modalities together enable a prediction that neither would enable alone). Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.
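As a concrete illustration of the third taxonomy, the sketch below converts partial labels (the label an annotator assigns from the first modality alone, the second alone, and both together) into coarse redundancy, uniqueness, and synergy proportions. This simple agreement heuristic is an assumption for illustration, not the automatic conversion method proposed in the paper.

```python
from collections import Counter

def decompose(partial_labels):
    """partial_labels: list of (y1, y2, y12) categorical labels per example."""
    counts = Counter()
    for y1, y2, y12 in partial_labels:
        if y1 == y12 and y2 == y12:
            counts["redundancy"] += 1        # either modality alone recovers the label
        elif y1 == y12:
            counts["uniqueness_m1"] += 1     # only modality 1 recovers the label
        elif y2 == y12:
            counts["uniqueness_m2"] += 1     # only modality 2 recovers the label
        else:
            counts["synergy"] += 1           # label emerges only from both modalities
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}

example = [("pos", "pos", "pos"), ("pos", "neg", "pos"),
           ("neg", "neg", "pos"), ("neu", "pos", "pos")]
print(decompose(example))
```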
Citations: 0
Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews
Trang Tran, Yufeng Yin, Leili Tavabi, Joannalyn Delacruz, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani
The quality and effectiveness of psychotherapy sessions are highly influenced by the therapists’ ability to meaningfully connect with clients. Automated assessment of therapist empathy provides a cost-effective and systematic means of assessing the quality of therapy sessions. In this work, we propose to assess therapist empathy using multimodal behavioral data, i.e., spoken language (text) and audio, in real-world motivational interviewing (MI) sessions for alcohol abuse intervention. We first study each modality (text vs. audio) individually and then evaluate a multimodal approach using different fusion strategies for automated recognition of empathy levels (high vs. low). Leveraging recent pre-trained models for both text (DistilRoBERTa) and speech (HuBERT) as strong unimodal baselines, we obtain consistent 2-3 point improvements in F1 scores with early and late fusion, and the highest absolute improvement of 6–12 points over unimodal baselines. Our models obtain F1 scores of 68% when only looking at an early segment of the sessions and up to 72% in a therapist-dependent setting. In addition, our results show that a relatively small portion of the sessions, specifically the second quartile, is most important for empathy prediction, outperforming predictions on later segments and on the full sessions. Our analyses of late fusion results show that fusion models rely more on the audio modality in limited-data settings, such as in individual quartiles and when using only therapist turns. Further, we observe the highest misclassification rates for parts of the sessions with MI-inconsistent utterances (20% misclassified by all models), likely due to the complex nature of these types of intents in relation to perceived empathy.
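The sketch below illustrates the two fusion strategies compared in the abstract, assuming precomputed utterance-level embeddings (e.g., 768-dimensional DistilRoBERTa and HuBERT vectors): early fusion concatenates the embeddings before a shared classifier, while late fusion averages the logits of two unimodal classifiers. The dimensions and classifier heads are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

TEXT_DIM, SPEECH_DIM, N_CLASSES = 768, 768, 2   # assumed embedding sizes; high vs. low empathy

class EarlyFusion(nn.Module):
    """Concatenate text and speech embeddings, then classify jointly."""
    def __init__(self):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(TEXT_DIM + SPEECH_DIM, 256),
                                 nn.ReLU(), nn.Linear(256, N_CLASSES))
    def forward(self, text_emb, speech_emb):
        return self.clf(torch.cat([text_emb, speech_emb], dim=-1))

class LateFusion(nn.Module):
    """Average the logits of two unimodal classifiers."""
    def __init__(self):
        super().__init__()
        self.text_clf = nn.Linear(TEXT_DIM, N_CLASSES)
        self.speech_clf = nn.Linear(SPEECH_DIM, N_CLASSES)
    def forward(self, text_emb, speech_emb):
        return 0.5 * (self.text_clf(text_emb) + self.speech_clf(speech_emb))

text_emb, speech_emb = torch.randn(4, TEXT_DIM), torch.randn(4, SPEECH_DIM)
print(EarlyFusion()(text_emb, speech_emb).shape,   # torch.Size([4, 2])
      LateFusion()(text_emb, speech_emb).shape)    # torch.Size([4, 2])
```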
Citations: 0
From Natural to Non-Natural Interaction: Embracing Interaction Design Beyond the Accepted Convention of Natural
Radu-Daniel Vatavu
Natural interactions feel intuitive, familiar, and well matched to the task, the user’s abilities, and the context. Consequently, a wealth of scientific research has been conducted on natural interaction with computer systems. Contrary to the conventional mainstream, we advocate for “non-natural interaction design” as a transformative, creative process that results in highly usable and effective interactions by deliberately deviating from users’ expectations and experience of engaging with the physical world. The non-natural approach to interaction design provokes a departure from the established notion of the “natural,” all the while prioritizing usability, albeit amidst the backdrop of the unconventional, the unexpected, and the intriguing.
Citations: 0
Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks
Cristina Luna-Jiménez, Manuel Gil-Martín, Ricardo Kleinlein, Rubén San-Segundo, Fernando Fernández-Martínez
Sign Language Recognition (SLR) is a challenging task that aims to bridge the communication gap between the deaf and hearing communities. In recent years, deep learning-based approaches have shown promising results in SLR. However, the lack of interpretability remains a significant challenge. In this paper, we seek to understand which hand and pose MediaPipe Landmarks are deemed the most important for prediction, as estimated by a Transformer model. We propose to embed into the model a learnable array of parameters that performs an element-wise multiplication of the inputs. This learned array highlights the most informative input features that contributed to solving the recognition task, resulting in a human-interpretable vector that lets us interpret the model predictions. We evaluate our approach on the public datasets WLASL100 (SLR) and IPNHand (gesture recognition). We believe that the insights gained in this way could be exploited for the development of more efficient SLR pipelines.
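The core mechanism described above can be sketched as a learnable gating vector that element-wise multiplies the landmark features before a Transformer encoder; after training, the magnitudes of the vector serve as per-feature importance scores. The landmark layout, encoder size, and sequence length below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LandmarkGate(nn.Module):
    """Learnable array that element-wise multiplies the input features."""
    def __init__(self, num_features):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_features))
    def forward(self, x):                  # x: (batch, time, num_features)
        return x * self.weights

num_features = 2 * 21 * 2 + 33 * 2         # two hands (21 landmarks each) + 33 pose landmarks, x/y
gate = LandmarkGate(num_features)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=num_features, nhead=2, batch_first=True),
    num_layers=2)

frames = torch.randn(4, 30, num_features)  # toy sequence of landmark frames
out = encoder(gate(frames))
importance = gate.weights.detach().abs()   # human-interpretable per-feature importance
print(out.shape, importance.shape)         # torch.Size([4, 30, 150]) torch.Size([150])
```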
Citations: 0
Evaluating the Potential of Caption Activation to Mitigate Confusion Inferred from Facial Gestures in Virtual Meetings
Melanie Heck, Jinhee Jeong, Christian Becker
Following the COVID-19 pandemic, virtual meetings have not only become an integral part of collaboration, but are now also a popular tool for disseminating information to a large audience through webinars, online lectures, and the like. Ideally, meeting participants should understand the discussed topics as smoothly as in physical encounters. However, many experience confusion, but are hesitant to express their doubts. In this paper, we present the results of a user study with 45 Google Meet users that investigates how auto-generated captions can be used to improve comprehension. The results show that captions can help overcome confusion caused by language barriers, but not when it results from distorted words. To mitigate negative side effects such as occlusion of important visual information when captions are not strictly needed, we propose to activate them dynamically only when a user actually experiences confusion. To determine instances that require captioning, we test whether subliminal cues from facial gestures can be used to detect confusion. We confirm that confusion activates six facial action units (AU4, AU6, AU7, AU10, AU17, and AU23).
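A possible realization of the proposed dynamic activation, sketched below under assumed thresholds: per-frame intensities of the six confusion-related action units (obtained from an external detector such as OpenFace) are aggregated over a short window, and captions are switched on only when confusion persists. The window length, intensity threshold, and activation ratio are assumptions, not values from the study.

```python
from collections import deque

CONFUSION_AUS = ["AU4", "AU6", "AU7", "AU10", "AU17", "AU23"]

class CaptionController:
    def __init__(self, window=30, min_active_aus=3, min_ratio=0.5):
        self.history = deque(maxlen=window)   # ~1 second of frames at 30 fps
        self.min_active_aus = min_active_aus
        self.min_ratio = min_ratio

    def update(self, au_scores):
        """au_scores: dict mapping AU name -> intensity in [0, 1] for one frame.
        Returns True when captions should be shown."""
        active = sum(au_scores.get(au, 0.0) > 0.5 for au in CONFUSION_AUS)
        self.history.append(active >= self.min_active_aus)
        if len(self.history) < self.history.maxlen:
            return False                       # wait for a full observation window
        return sum(self.history) / len(self.history) >= self.min_ratio

ctrl = CaptionController()
frame = {"AU4": 0.8, "AU6": 0.7, "AU7": 0.9, "AU10": 0.2, "AU17": 0.1, "AU23": 0.0}
print(ctrl.update(frame))   # False: captions stay off until confusion persists over a full window
```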
Citations: 0
ASMRcade: Interactive Audio Triggers for an Autonomous Sensory Meridian Response
Silvan Mertes, Marcel Strobl, Ruben Schlagowski, Elisabeth André
Autonomous Sensory Meridian Response (ASMR) is a sensory phenomenon involving pleasurable tingling sensations in response to stimuli such as whispering, tapping, and hair brushing. It is increasingly used to promote health and well-being, help with sleep, and reduce stress and anxiety. ASMR triggers are both highly individual and of great variety. Consequently, finding or identifying suitable ASMR content, e.g., by searching online platforms, can take time and effort. This work addresses this challenge by introducing a novel interactive approach for users to generate personalized ASMR sounds. The presented system utilizes a generative adversarial network (GAN) for sound generation and a graphical user interface (GUI) for user control. Our system allows users to create and manipulate audio samples by interacting with a visual representation of the GAN’s latent input vector. Further, we present the results of a first user study which indicates that our approach is suitable for triggering ASMR experiences.
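The interaction loop can be sketched as follows: the GUI exposes the entries of the generator's latent input vector, and each user adjustment re-runs the generator to produce a new audio sample. The generator below is a stand-in placeholder (the paper uses a trained audio GAN); its latent size and output length are assumptions.

```python
import torch
import torch.nn as nn

LATENT_DIM, AUDIO_LEN = 128, 16384            # assumed latent size and waveform length

generator = nn.Sequential(                    # placeholder for a pretrained audio GAN generator
    nn.Linear(LATENT_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, AUDIO_LEN), nn.Tanh())

z = torch.randn(1, LATENT_DIM)                # current latent vector shown in the GUI

def nudge(z, index, delta):
    """Move one latent coordinate, as a slider or click in the GUI would."""
    z = z.clone()
    z[0, index] += delta
    return z

audio = generator(z)                          # baseline ASMR sample
audio_variant = generator(nudge(z, 7, 0.5))   # user tweaks latent dimension 7
print(audio.shape, audio_variant.shape)       # torch.Size([1, 16384]) each
```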
Citations: 0
Influence of hand representation on a grasping task in augmented reality
Louis Lafuma, Guillaume Bouyer, Olivier Goguel, Jean-Yves Pascal Didier
Research has shown that modifying the appearance of the virtual hand in immersive virtual reality can convey object properties to users. Whether we can achieve the same results in augmented reality is still to be determined, since the user’s real hand is visible through the headset. Although displaying a virtual hand in augmented reality is usually not recommended, it could positively impact the user’s effectiveness or appreciation of the application.
Citations: 0
Annotations from speech and heart rate: impact on multimodal emotion recognition
Kaushal Sharma, Guillaume Chanel
The focus of multimodal emotion recognition has often been on the analysis of several fusion strategies. However, little attention has been paid to the effect of emotional cues, such as physiological and audio cues, on the external annotations used to generate the Ground Truths (GTs). In our study, we analyze this effect by collecting six continuous arousal annotations for three groups of emotional cues: speech only, heartbeat sound only, and their combination. Our results indicate significant differences between the three groups of annotations, thus giving three distinct cue-specific GTs. The relevance of these GTs is estimated by training multimodal machine learning models to regress speech, heart rate, and their multimodal fusion on arousal. Our analysis shows that a cue-specific GT is better predicted by the corresponding modality or modalities. In addition, fusing several emotional cues for the definition of GTs allows both unimodal models and multimodal fusion to reach similar performance. In conclusion, our results indicate that heart rate is an efficient cue for the generation of a physiological GT, and that combining several emotional cues for GT generation is as important as performing input multimodal fusion for emotion prediction.
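A small illustrative pipeline for the evaluation described above: each group of six continuous annotations is averaged into a cue-specific GT, and a regressor is fitted per input feature set (speech, heart rate, and their fusion) against each GT to compare predictive fit. The random data, feature dimensions, and ridge regressors are stand-ins, not the authors' models or features.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
T = 500                                             # number of annotated time windows (assumed)
annotations = {                                     # 6 raters x T values per cue condition
    "speech_only": rng.normal(size=(6, T)),
    "heart_only": rng.normal(size=(6, T)),
    "combined": rng.normal(size=(6, T)),
}
gts = {cue: a.mean(axis=0) for cue, a in annotations.items()}   # cue-specific ground truths

features = {"speech": rng.normal(size=(T, 20)), "heart_rate": rng.normal(size=(T, 5))}
features["fusion"] = np.hstack([features["speech"], features["heart_rate"]])

for cue, gt in gts.items():
    for name, X in features.items():
        score = cross_val_score(Ridge(), X, gt, cv=5, scoring="r2").mean()
        print(f"GT={cue:12s} features={name:10s} R2={score:+.3f}")
```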
Citations: 0
AIUnet: Asymptotic inference with U2-Net for referring image segmentation
Jiangquan Li, Shimin Shan, Yu Liu, Kaiping Xu, Xiwen Hu, Mingcheng Xue
Referring image segmentation aims to segment a target object from an image given a natural language expression. While recent methods have made remarkable advancements, few have designed effective deep fusion processes for cross-model features or focused on the fine details of vision. In this paper, we propose AIUnet, an asymptotic inference method that uses U2-Net. The core of AIUnet is a Cross-model U2-Net (CMU) module, which integrates a Text guide vision (TGV) module into U2-Net, achieving efficient interaction of cross-model information at different scales. CMU focuses more on location information in high-level features and learns finer detail information in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module to improve the recognition of fine details and decode cross-model features into binary masks. The FED module leverages a simple CNN-based approach to enhance multi-modal features. Our experiments show that AIUnet achieves competitive results on three standard datasets. Code is available at https://github.com/LJQbiu/AIUnet.
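The text-guides-vision fusion pattern can be sketched as a sentence embedding projected to the channel dimension of a visual feature map and applied as an element-wise gate at a given scale. This is an illustration of the fusion idea, not the AIUnet implementation; the embedding size and feature-map shape are assumptions.

```python
import torch
import torch.nn as nn

class TextGuidedFusion(nn.Module):
    """Modulate a visual feature map, channel-wise, with a projected text embedding."""
    def __init__(self, text_dim, vis_channels):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(text_dim, vis_channels), nn.Sigmoid())

    def forward(self, vis_feat, text_emb):
        # vis_feat: (B, C, H, W); text_emb: (B, text_dim)
        gate = self.proj(text_emb)[:, :, None, None]   # (B, C, 1, 1)
        return vis_feat * gate                         # element-wise modulation per channel

fusion = TextGuidedFusion(text_dim=768, vis_channels=64)
vis_feat = torch.randn(2, 64, 80, 80)      # one scale of encoder features (assumed shape)
text_emb = torch.randn(2, 768)             # sentence embedding of the referring expression
print(fusion(vis_feat, text_emb).shape)    # torch.Size([2, 64, 80, 80])
```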
Citations: 0