
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Crucial Clues: Investigating Psychophysiological Behaviors for Measuring Trust in Human-Robot Interaction
Muneeb Ahmad, Abdullah Alzahrani
Existing work on the measurement of trust during Human-Robot Interaction (HRI) indicates that psychophysiological behaviours (PBs) have the potential to measure trust. However, there has been limited work on combining multiple PBs to calibrate humans’ trust in robots in real time during HRI. Therefore, this study aims to estimate human trust in robots by examining the differences in PBs between trust and distrust states. It further investigates the changes in PBs across repeated HRI and explores the potential of machine learning classifiers for predicting trust levels during HRI. We collected participants’ electrodermal activity (EDA), blood volume pulse (BVP), heart rate (HR), skin temperature (SKT), blinking rate (BR), and blinking duration (BD) during repeated HRI. The results showed significant differences in HR and SKT between the trust and distrust groups, and no significant interaction effect of session and decision for any PB. A Random Forest classifier achieved the best accuracy of 68.6% in classifying trust, with SKT, HR, BR, and BD being the most important features. These findings highlight the value of PBs for measuring trust in real time during HRI and encourage further investigation of PB-based trust measures in various HRI settings.
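As a minimal sketch of the kind of classification setup described above, the snippet below trains a scikit-learn Random Forest on a feature table of the listed physiological signals (EDA, BVP, HR, SKT, BR, BD) to predict a binary trust/distrust label and then inspects feature importances. The synthetic data, column names, and train/test split are illustrative assumptions, not the authors' released pipeline.

```python
# Minimal sketch: binary trust/distrust classification from psychophysiological
# features with a Random Forest. Feature names and synthetic data are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 200  # hypothetical number of interaction segments
features = ["EDA", "BVP", "HR", "SKT", "BR", "BD"]
X = pd.DataFrame(rng.normal(size=(n, len(features))), columns=features)
y = rng.integers(0, 2, size=n)  # 0 = distrust, 1 = trust (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
# Feature importances indicate which signals (e.g., SKT, HR) drive the prediction.
for name, imp in sorted(zip(features, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```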
{"title":"Crucial Clues: Investigating Psychophysiological Behaviors for Measuring Trust in Human-Robot Interaction","authors":"Muneeb Ahmad, Abdullah Alzahrani","doi":"10.1145/3577190.3614148","DOIUrl":"https://doi.org/10.1145/3577190.3614148","url":null,"abstract":"Existing work on the measurements of trust during Human-Robot Interaction (HRI) indicates that psychophysiological behaviours (PBs) have the potential to measure trust. However, we see limited work on the use of multiple PBs in combination to calibrate human’s trust in robots in real-time during HRI. Therefore, this study aims to estimate human trust in robots by examining the differences in PBs between trust and distrust states. It further investigates the changes in PBs across repeated HRI and also explores the potential of machine learning classifiers in predicting trust levels during HRI. We collected participants’ electrodermal activity (EDA), blood volume pulse (BVP), heart rate (HR), skin temperature (SKT), blinking rate (BR), and blinking duration (BD) during repeated HRI. The results showed significant differences in HR and SKT between trust and distrust groups and no significant interaction effect of session and decision for all PBs. Random Forest classifier achieved the best accuracy of 68.6% to classify trust, while SKT, HR, BR, and BD were the important features. These findings highlight the value of PBs in measuring trust in real-time during HRI and encourage further investigation of trust measures with PBs in various HRI settings.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Conversational Agents for People with Neurodevelopmental Disorders
Fabio Catania, Tanya Talkar, Franca Garzotto, Benjamin R. Cowan, Thomas F. Quatieri, Satrajit Ghosh
Neurodevelopmental Disorders (NDD) involve developmental deficits in cognition, social interaction, and communication. Despite growing interest in conversational agents for this population, gaps persist in understanding their usability, effectiveness, and how they are perceived. We organize a workshop focusing on the use of conversational agents with multimodal capabilities for therapeutic interventions in NDD. The workshop brings together researchers and practitioners to discuss design, evaluation, and ethical considerations. Anticipated outcomes include identifying challenges, sharing advancements, fostering collaboration, and charting future research directions.
{"title":"Multimodal Conversational Agents for People with Neurodevelopmental Disorders","authors":"Fabio Catania, Tanya Talkar, Franca Garzotto, Benjamin R. Cowan, Thomas F. Quatieri, Satrajit Ghosh","doi":"10.1145/3577190.3617133","DOIUrl":"https://doi.org/10.1145/3577190.3617133","url":null,"abstract":"Neurodevelopmental Disorders (NDD) involve developmental deficits in cognition, social interaction, and communication. Despite growing interest, gaps persist in understanding usability, effectiveness, and perceptions of such agents. We organize a workshop focusing on the use of conversational agents with multi-modal capabilities for therapeutic interventions in NDD. The workshop brings together researchers and practitioners to discuss design, evaluation, and ethical considerations. Anticipated outcomes include identifying challenges, sharing advancements, fostering collaboration, and charting future research directions.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Performance Exploration of RNN Variants for Recognizing Daily Life Stress Levels by Using Multimodal Physiological Signals
Yekta Said Can, Elisabeth André
Enduring stress can have negative impacts on human health and behavior. Widely used wearable devices are promising for assessing, monitoring, and potentially alleviating high stress in daily life. Although numerous automatic stress recognition studies have achieved high accuracy in laboratory environments, the performance reported for daily-life studies still falls far short of what the literature reports for the laboratory. Since the physiological signals obtained from these devices are time-series data, Recurrent Neural Network (RNN)-based classifiers promise better results than other machine learning methods. However, the performance of RNN-based classifiers has not yet been extensively evaluated (i.e., with several variants and different application techniques) for detecting daily-life stress. They can be combined with CNN architectures and applied to raw data or handcrafted features. In this study, we created different RNN architecture variants and explored their performance for recognizing daily-life stress to guide researchers in the field.
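To make the idea of comparing RNN variants concrete, the sketch below defines a small configurable classifier in PyTorch that can be instantiated with either an LSTM or a GRU backbone over windows of physiological signals. The window length, channel count, hidden size, and number of stress classes are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch of an RNN-variant stress classifier over physiological windows.
# Hyperparameters (channels, window length, hidden size) are illustrative only.
import torch
import torch.nn as nn

class StressRNN(nn.Module):
    def __init__(self, rnn_type="lstm", in_channels=4, hidden=64, n_classes=3):
        super().__init__()
        rnn_cls = {"lstm": nn.LSTM, "gru": nn.GRU}[rnn_type]
        self.rnn = rnn_cls(input_size=in_channels, hidden_size=hidden,
                           num_layers=2, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):             # x: (batch, time, channels)
        out, _ = self.rnn(x)          # out: (batch, time, hidden)
        return self.head(out[:, -1])  # classify from the last time step

# Example: a batch of 8 one-minute windows sampled at 4 Hz with 4 signal channels.
x = torch.randn(8, 240, 4)
for variant in ("lstm", "gru"):
    model = StressRNN(rnn_type=variant)
    logits = model(x)
    print(variant, logits.shape)      # torch.Size([8, 3])
```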
{"title":"Performance Exploration of RNN Variants for Recognizing Daily Life Stress Levels by Using Multimodal Physiological Signals","authors":"Yekta Said Can, Elisabeth André","doi":"10.1145/3577190.3614159","DOIUrl":"https://doi.org/10.1145/3577190.3614159","url":null,"abstract":"Enduring stress can have negative impacts on human health and behavior. Widely used wearable devices are promising for assessing, monitoring and potentially alleviating high stress in daily life. Although numerous automatic stress recognition studies have been carried out in the laboratory environment with high accuracy, the performance of daily life studies is still far away from what the literature has in laboratory environments. Since the physiological signals obtained from these devices are time-series data, Recursive Neural Network (RNN) based classifiers promise better results than other machine learning methods. However, the performance of RNN-based classifiers has not been extensively evaluated (i.e., with several variants and different application techniques) for detecting daily life stress yet. They could be combined with CNN architectures, applied to raw data or handcrafted features. In this study, we created different RNN architecture variants and explored their performance for recognizing daily life stress to guide researchers in the field.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
User Feedback-based Online Learning for Intent Classification
Kaan Gönç, Baturay Sağlam, Onat Dalmaz, Tolga Çukur, Serdar Kozat, Hamdi Dibeklioglu
Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user’s query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit or face distributional shifts when new intents, utterances, or data distributions emerge over time, requiring frequent retraining. Online learning methods based on user feedback can overcome this limitation, as they do not need access to ground-truth intents while collecting data and continuously adapting the model. In this paper, we propose a novel multi-armed contextual bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and jointly learn multimodal representations of encoded text features and intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This novel two-phase approach enables our method to efficiently adapt to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicates that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving performance competitive with a supervised benchmark trained on a labeled dataset an order of magnitude larger.
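The sketch below illustrates the general shape of the online fine-tuning stage described above: a policy over intents, conditioned on frozen text-encoder features, is updated with a REINFORCE-style gradient computed from scalar user feedback rather than ground-truth labels. The encoder dimension, intent count, and binary-feedback reward are assumptions for illustration and do not reproduce the authors' exact framework.

```python
# Minimal sketch: REINFORCE-style online update of an intent policy from user feedback.
# The feature dimension, number of intents, and reward definition are assumptions.
import torch
import torch.nn as nn

feat_dim, n_intents = 384, 10              # hypothetical encoder size and intent count
policy = nn.Linear(feat_dim, n_intents)    # logits over intents given utterance features
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def online_step(utterance_feat: torch.Tensor, get_user_feedback) -> int:
    """Pick an intent, collect scalar feedback, and apply a REINFORCE update."""
    logits = policy(utterance_feat)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                     # predicted intent (an "arm")
    reward = get_user_feedback(action.item())  # e.g., 1.0 if the user accepts, else 0.0
    loss = -dist.log_prob(action) * reward     # REINFORCE objective (no true label needed)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return action.item()

# Example with a simulated user who always prefers intent 3.
feedback = lambda intent: 1.0 if intent == 3 else 0.0
for _ in range(5):
    online_step(torch.randn(feat_dim), feedback)
```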
{"title":"User Feedback-based Online Learning for Intent Classification","authors":"Kaan Gönç, Baturay Sağlam, Onat Dalmaz, Tolga Çukur, Serdar Kozat, Hamdi Dibeklioglu","doi":"10.1145/3577190.3614137","DOIUrl":"https://doi.org/10.1145/3577190.3614137","url":null,"abstract":"Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user’s query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit or face distributional shifts when new intents, utterances, or data distributions emerge over time, requiring frequent retraining. Online learning methods based on user feedback can overcome this limitation, as they do not need access to intents while collecting data and adapting the model continuously. In this paper, we propose a novel multi-armed contextual bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and jointly learn multimodal representations of encoded text features and intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This novel two-phase approach enables our method to efficiently adapt to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicate that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving competitive performance to a supervised benchmark trained on an order of magnitude larger labeled dataset.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
GENEA Workshop 2023: The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents
Youngwoo Yoon, Taras Kucherenko, Jieyeon Woo, Pieter Wolfert, Rajmund Nagy, Gustav Eje Henter
Non-verbal behavior is advantageous for embodied agents when interacting with humans. Despite many years of research on the generation of non-verbal behavior, there is no established benchmarking practice in the field. Most researchers do not compare their results to prior work, and if they do, they often do so in a manner that is not compatible with other approaches. The GENEA Workshop 2023 seeks to bring the community together to discuss the major challenges and solutions, and to identify the best ways to progress the field.
{"title":"GENEA Workshop 2023: The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents","authors":"Youngwoo Yoon, Taras Kucherenko, Jieyeon Woo, Pieter Wolfert, Rajmund Nagy, Gustav Eje Henter","doi":"10.1145/3577190.3616856","DOIUrl":"https://doi.org/10.1145/3577190.3616856","url":null,"abstract":"Non-verbal behavior is advantageous for embodied agents when interacting with humans. Despite many years of research on the generation of non-verbal behavior, there is no established benchmarking practice in the field. Most researchers do not compare their results to prior work, and if they do, they often do so in a manner that is not compatible with other approaches. The GENEA Workshop 2023 seeks to bring the community together to discuss the major challenges and solutions, and to identify the best ways to progress the field.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Embracing Contact: Detecting Parent-Infant Interactions
Metehan Doyran, Ronald Poppe, Albert Ali Salah
We focus on a largely overlooked but crucial modality for parent-child interaction analysis: physical contact. In this paper, we provide a feasibility study to automatically detect contact between a parent and child from videos. Our multimodal CNN model uses a combination of 2D pose heatmaps, body part heatmaps, and cropped images. Two datasets (FlickrCI3D and YOUth PCI) are used to explore the generalization capabilities across different contact scenarios. Our experiments demonstrate that using 2D pose heatmaps and body part heatmaps yields the best performance in contact classification when trained from scratch on parent-infant interactions. We further investigate the influence of proximity on our classification performance. Our results indicate that there are unique challenges in parent-infant contact classification. Finally, we show that contact rates from aggregating frame-level predictions provide decent approximations of the true contact rates, suggesting that they can serve as an automated proxy for measuring the quality of parent-child interactions. By releasing the annotations for the YOUth PCI dataset and our code, we encourage further research to deepen our understanding of parent-infant interactions and their implications for attachment and development.
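As a rough sketch of the fusion and aggregation ideas described above, the code below combines two input streams (2D pose heatmaps and body-part heatmaps) with small convolutional branches and a shared classification head, then averages frame-level contact predictions into a contact rate. Channel counts, resolutions, and the late-fusion strategy are assumptions for illustration, not the authors' released model.

```python
# Minimal sketch: two-branch CNN over heatmap inputs with late fusion for
# frame-level contact classification; contact rate = fraction of contact frames.
# Input shapes and channel counts are illustrative assumptions.
import torch
import torch.nn as nn

def branch(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class ContactNet(nn.Module):
    def __init__(self, pose_ch=17, part_ch=6):
        super().__init__()
        self.pose = branch(pose_ch)    # 2D pose keypoint heatmaps
        self.parts = branch(part_ch)   # body-part heatmaps
        self.head = nn.Linear(64 * 2, 1)

    def forward(self, pose_hm, part_hm):
        z = torch.cat([self.pose(pose_hm), self.parts(part_hm)], dim=1)
        return torch.sigmoid(self.head(z)).squeeze(1)   # P(contact) per frame

model = ContactNet()
pose_hm = torch.randn(30, 17, 64, 64)   # 30 frames of pose heatmaps
part_hm = torch.randn(30, 6, 64, 64)    # 30 frames of body-part heatmaps
p_contact = model(pose_hm, part_hm)
contact_rate = (p_contact > 0.5).float().mean()  # automated proxy discussed above
print(contact_rate.item())
```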
{"title":"Embracing Contact: Detecting Parent-Infant Interactions","authors":"Metehan Doyran, Ronald Poppe, Albert Ali Salah","doi":"10.1145/3577190.3614147","DOIUrl":"https://doi.org/10.1145/3577190.3614147","url":null,"abstract":"We focus on a largely overlooked but crucial modality for parent-child interaction analysis: physical contact. In this paper, we provide a feasibility study to automatically detect contact between a parent and child from videos. Our multimodal CNN model uses a combination of 2D pose heatmaps, body part heatmaps, and cropped images. Two datasets (FlickrCI3D and YOUth PCI) are used to explore the generalization capabilities across different contact scenarios. Our experiments demonstrate that using 2D pose heatmaps and body part heatmaps yields the best performance in contact classification when trained from scratch on parent-infant interactions. We further investigate the influence of proximity on our classification performance. Our results indicate that there are unique challenges in parent-infant contact classification. Finally, we show that contact rates from aggregating frame-level predictions provide decent approximations of the true contact rates, suggesting that they can serve as an automated proxy for measuring the quality of parent-child interactions. By releasing the annotations for the YOUth PCI dataset and our code1, we encourage further research to deepen our understanding of parent-infant interactions and their implications for attachment and development.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing Resilience to Missing Data in Audio-Text Emotion Recognition with Multi-Scale Chunk Regularization
Wei-Cheng Lin, Lucas Goncalves, Carlos Busso
Most existing audio-text emotion recognition studies have focused on the computational modeling aspects, including strategies for fusing the modalities. An area that has received less attention is understanding the role of proper temporal synchronization between the modalities in the model performance. This study presents a transformer-based model designed with a word-chunk concept, which offers an ideal framework to explore different strategies to align text and speech. The approach creates chunks using alternative alignment strategies with different levels of dependency on the underlying lexical boundaries. A key contribution of this study is the multi-scale chunk alignment strategy, which generates random alignments to create the chunks without considering lexical boundaries. For every epoch, the approach generates a different alignment for each sentence, serving as an effective regularization method for temporal dependency. Our experimental results based on the MSP-Podcast corpus indicate that providing precise temporal alignment information to create the audio-text chunks does not improve the performance of the system. The attention mechanisms in the transformer-based approach are able to compensate for imperfect synchronization between the modalities. However, using exact lexical boundaries makes the system highly vulnerable to missing modalities. In contrast, the model trained with the proposed multi-scale chunk regularization strategy using random alignment can significantly increase its robustness against missing data and remain effective, even under a single audio-only emotion recognition task. The code is available at: https://github.com/winston-lin-wei-cheng/MultiScale-Chunk-Regularization
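A hedged sketch of the core idea, random chunking that ignores lexical boundaries, is shown below: for each sentence (and anew each epoch) the utterance duration is split at randomly drawn boundaries, and each word is assigned to the chunk covering its midpoint. The chunk-count range and the midpoint assignment rule are illustrative assumptions; the authors' exact procedure is in their released code linked above.

```python
# Minimal sketch: generate a random chunk alignment over an utterance, ignoring
# word boundaries, and assign word-level timestamps to chunks by midpoint.
# The chunk-count range and assignment rule are illustrative assumptions.
import random

def random_chunks(duration_s: float, min_chunks: int = 2, max_chunks: int = 6):
    """Split [0, duration_s] at random points into contiguous chunks."""
    k = random.randint(min_chunks, max_chunks)
    cuts = sorted(random.uniform(0.0, duration_s) for _ in range(k - 1))
    edges = [0.0, *cuts, duration_s]
    return list(zip(edges[:-1], edges[1:]))

def assign_words(word_times, chunks):
    """Assign each (word, start, end) to the chunk containing its midpoint."""
    assignment = {i: [] for i in range(len(chunks))}
    for word, start, end in word_times:
        mid = (start + end) / 2
        for i, (lo, hi) in enumerate(chunks):
            if lo <= mid <= hi:
                assignment[i].append(word)
                break
    return assignment

# Each epoch draws a fresh alignment for the same sentence (the regularization effect).
words = [("I", 0.0, 0.2), ("feel", 0.2, 0.6), ("great", 0.6, 1.1), ("today", 1.1, 1.5)]
for epoch in range(2):
    chunks = random_chunks(duration_s=1.5)
    print(f"epoch {epoch}:", assign_words(words, chunks))
```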
{"title":"Enhancing Resilience to Missing Data in Audio-Text Emotion Recognition with Multi-Scale Chunk Regularization","authors":"Wei-Cheng Lin, Lucas Goncalves, Carlos Busso","doi":"10.1145/3577190.3614110","DOIUrl":"https://doi.org/10.1145/3577190.3614110","url":null,"abstract":"Most existing audio-text emotion recognition studies have focused on the computational modeling aspects, including strategies for fusing the modalities. An area that has received less attention is understanding the role of proper temporal synchronization between the modalities in the model performance. This study presents a transformer-based model designed with a word-chunk concept, which offers an ideal framework to explore different strategies to align text and speech. The approach creates chunks with alternative alignment strategies with different levels of dependency on the underlying lexical boundaries. A key contribution of this study is the multi-scale chunk alignment strategy, which generates random alignments to create the chunks without considering lexical boundaries. For every epoch, the approach generates a different alignment for each sentence, serving as an effective regularization method for temporal dependency. Our experimental results based on the MSP-Podcast corpus indicate that providing precise temporal alignment information to create the audio-text chunks does not improve the performance of the system. The attention mechanisms in the transformer-based approach are able to compensate for imperfect synchronization between the modalities. However, using exact lexical boundaries makes the system highly vulnerable to missing modalities. In contrast, the model trained with the proposed multi-scale chunk regularization strategy using random alignment can significantly increase its robustness against missing data and remain effective, even under a single audio-only emotion recognition task. The code is available at: https://github.com/winston-lin-wei-cheng/MultiScale-Chunk-Regularization","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
MMASD: A Multimodal Dataset for Autism Intervention Analysis
Jicheng Li, Vuthea Chheang, Pinar Kullu, Eli Brignac, Zhang Guo, Anjana Bhat, Kenneth E. Barner, Roghayeh Leila Barmaki
Autism spectrum disorder (ASD) is a developmental disorder characterized by significant impairments in social communication and difficulties perceiving and presenting communication signals. Machine learning techniques have been widely used to facilitate autism studies and assessments. However, computational models primarily concentrate on very specific analyses and are validated on private, non-public datasets in the autism community, which limits comparisons across models due to privacy-preserving data-sharing complications. This work presents MMASD, a novel open-source, privacy-preserving MultiModal ASD benchmark dataset collected from play therapy interventions for children with autism. MMASD includes data from 32 children with ASD and 1,315 data samples segmented from more than 100 hours of intervention recordings. To protect children's privacy while offering public access, each sample consists of four privacy-preserving modalities, some of which are derived from the original videos: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of the children. MMASD aims to assist researchers and therapists in understanding children’s cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It also inspires downstream social tasks such as action quality assessment and interpersonal synchrony estimation. The dataset is publicly accessible via the MMASD project website.
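To make the per-sample structure concrete, the sketch below models one MMASD-style record as a small dataclass holding the four privacy-preserving modalities listed above. The field names, array shapes, and construction are hypothetical illustrations only; the actual file layout and schema are documented on the project website.

```python
# Hypothetical sketch of a per-sample record for the four modalities described above.
# Field names and shapes are illustrative assumptions, not the dataset's real schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class MMASDSample:
    optical_flow: np.ndarray   # e.g., (frames, H, W, 2) dense flow fields
    skeleton_2d: np.ndarray    # e.g., (frames, joints, 2) 2D keypoints
    skeleton_3d: np.ndarray    # e.g., (frames, joints, 3) 3D keypoints
    asd_score: float           # clinician-provided ASD evaluation score

    def duration_frames(self) -> int:
        return self.skeleton_2d.shape[0]

# Illustrative construction with placeholder arrays.
sample = MMASDSample(
    optical_flow=np.zeros((120, 224, 224, 2), dtype=np.float32),
    skeleton_2d=np.zeros((120, 25, 2), dtype=np.float32),
    skeleton_3d=np.zeros((120, 25, 3), dtype=np.float32),
    asd_score=2.5,
)
print(sample.duration_frames())
```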
{"title":"MMASD: A Multimodal Dataset for Autism Intervention Analysis","authors":"Jicheng Li, Vuthea Chheang, Pinar Kullu, Eli Brignac, Zhang Guo, Anjana Bhat, Kenneth E. Barner, Roghayeh Leila Barmaki","doi":"10.1145/3577190.3614117","DOIUrl":"https://doi.org/10.1145/3577190.3614117","url":null,"abstract":"Autism spectrum disorder (ASD) is a developmental disorder characterized by significant impairments in social communication and difficulties perceiving and presenting communication signals. Machine learning techniques have been widely used to facilitate autism studies and assessments. However, computational models are primarily concentrated on very specific analysis and validated on private, non-public datasets in the autism community, which limits comparisons across models due to privacy-preserving data-sharing complications. This work presents a novel open source privacy-preserving dataset, MMASD as a MultiModal ASD benchmark dataset, collected from play therapy interventions for children with autism. The MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from more than 100 hours of intervention recordings. To promote the privacy of children while offering public access, each sample consists of four privacy-preserving modalities, some of which are derived from original videos: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children. MMASD aims to assist researchers and therapists in understanding children’s cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It also inspires downstream social tasks such as action quality assessment and interpersonal synchrony estimation. The dataset is publicly accessible via the MMASD project website.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"273 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Come Fl.. Run with Me: Understanding the Utilization of Drones to Support Recreational Runners' Well Being
Aswin Balasubramaniam
The utilization of drones to assist runners in real time and post-run remains a promising yet unexplored field within human-drone interaction (HDI). Hence, in my doctoral research, I aim to delve into the concepts and relationships surrounding drones in the context of running, rather than focusing solely on one specific application. I plan on accomplishing this through a three-stage research plan: 1) investigate the feasibility of drones to support outdoor running research, 2) empathize with runners to assess their preferences and experiences running with a drone, and 3) implement and test an interactive running-with-drone scenario. Each stage has specific objectives and research questions aimed at providing valuable insights into the utilization of drones to support runners. This paper outlines the work conducted during my Ph.D. research along with future plans, with the goal of advancing knowledge in the field of runner-drone interaction.
{"title":"Come Fl.. Run with Me: Understanding the Utilization of Drones to Support Recreational Runners' Well Being","authors":"Aswin Balasubramaniam","doi":"10.1145/3577190.3614228","DOIUrl":"https://doi.org/10.1145/3577190.3614228","url":null,"abstract":"The utilization of drones to assist runners in real-time and post-run remains a promising yet unexplored field within human-drone interaction (HDI). Hence, in my doctoral research, I aim to delve into the concepts and relationships surrounding drones in the context of running, than focusing solely on one specific application. I plan on accomplishing this through a three-stage research plan: 1) investigate the feasibility of drones to support outdoor running research, 2) empathize with runners to assess their preferences and experiences running with drone, and 3) implement and test an interactive running with drone scenario. Each stage has specific objectives and research questions aimed at providing valuable insights into the utilization of drones to support runners. This paper outlines the work conducted during my Ph.D. research along with future plans, with the goal of advancing the knowledge in the field of runner drone interaction.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal information processing in communication: the nature of faces and voices
Sophie Scott
In this talk I will take a neurobiological perspective on human communication, and explore the ways in which visual and auditory channels express common and distinct patterns of information. I will extend this to the ways in which facial and vocal information is processed neurally and how the two interact in communication.
{"title":"Multimodal information processing in communication: thenature of faces and voices","authors":"Sophie Scott","doi":"10.1145/3577190.3616523","DOIUrl":"https://doi.org/10.1145/3577190.3616523","url":null,"abstract":"In this talk I will take a neurobiological perspective on human communication, and explore the ways in which visual and auditory channels express common and distinct patterns of information. I will extend this to that ways in which facial and vocal information is processed neurally and how they interact in communication.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0