
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Expanding the Role of Affective Phenomena in Multimodal Interaction Research
Leena Mathur, Maja Mataric, Louis-Philippe Morency
In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing trends and identifying understudied areas. We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing: ACM International Conference on Multimodal Interaction, AAAC International Conference on Affective Computing and Intelligent Interaction, Annual Meeting of the Association for Computational Linguistics, and Conference on Empirical Methods in Natural Language Processing. We identified 910 affect-related papers and present our analysis of the role of affective phenomena in these papers. We find that this body of research has primarily focused on enabling machines to recognize or express affect and emotion; there has been limited research on how affect and emotion predictions might, in turn, be used by AI systems to enhance machine understanding of human social behaviors and cognitive states. Based on our analysis, we discuss directions to expand the role of affective phenomena in multimodal interaction research.
Citations: 0
ACE: how Artificial Character Embodiment shapes user behaviour in multi-modal interaction
Eleonora Ceccaldi, Beatrice Biancardi, Sara Falcone, Silvia Ferrando, Geoffrey Gorisse, Thomas Janssoone, Anna Martin Coesel, Pierre Raimbaud
The ACE - how Artificial Character Embodiment shapes user behavior in multi-modal interactions - workshop aims to bring together researchers, practitioners and experts on the topic of embodiment, to analyze and foster discussion on its effects on user behavior in multi-modal interaction. ACE is aimed at stimulating multidisciplinary discussions on the topic, sharing recent progress, and providing participants with a forum to debate current and future challenges. The workshop includes contributions from computational, neuroscientific and psychological perspectives, as well as technical applications.
Citations: 0
Large language models in textual analysis for gesture selection
Laura Birka Hensel, Nutchanon Yongsatianchot, Parisa Torshizi, Elena Minucci, Stacy Marsella
Gestures perform a variety of communicative functions that powerfully influence human face-to-face interaction. How this communicative function is achieved varies greatly between individuals and depends on the role of the speaker and the context of the interaction. Approaches to automatic gesture generation vary not only in the degree to which they rely on data-driven techniques but also in the degree to which they can produce context- and speaker-specific gestures. However, these approaches face two major challenges: the first is obtaining sufficient training data appropriate for the context and the goal of the application; the second relates to designer control in realizing their specific intent for the application. Here, we approach these challenges by using large language models (LLMs) to show that these powerful models trained on large amounts of data can be adapted for gesture analysis and generation. Specifically, we used ChatGPT as a tool for suggesting context-specific gestures that can realize designer intent based on minimal prompts. We also find that ChatGPT can suggest novel yet appropriate gestures not present in the minimal training data. The use of LLMs is a promising avenue for gesture generation that reduces the need for laborious annotations and has the potential to flexibly and quickly adapt to different designer intents.
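As a rough illustration of the prompting approach described in this abstract, the sketch below builds a minimal prompt that asks a text-in/text-out LLM to propose gestures for an utterance given the speaker role and context, then filters the reply against a fixed gesture inventory. The prompt wording, the gesture inventory, and the `complete` callable are illustrative assumptions, not the prompts or tooling used in the paper.

    # Illustrative sketch only: prompt wording, gesture inventory, and the
    # `complete` callable (any text-in/text-out LLM client) are assumptions.
    from typing import Callable, List

    GESTURES = ["beat", "point", "shrug", "open_palm", "head_nod"]  # hypothetical inventory

    def suggest_gestures(utterance: str, speaker_role: str, context: str,
                         complete: Callable[[str], str]) -> List[str]:
        """Ask an LLM which gestures fit an utterance, given role and context."""
        prompt = (
            "You select co-speech gestures for a virtual character.\n"
            f"Speaker role: {speaker_role}\nContext: {context}\n"
            f'Utterance: "{utterance}"\n'
            f"Choose up to two gestures from {GESTURES} and answer as a comma-separated list."
        )
        reply = complete(prompt)
        # Keep only suggestions that are part of the inventory.
        return [g.strip() for g in reply.split(",") if g.strip() in GESTURES]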
Citations: 0
Using Explainability for Bias Mitigation: A Case Study for Fair Recruitment Assessment
Gizem Sogancioglu, Heysem Kaya, Albert Ali Salah
In this study, we propose a bias-mitigation algorithm, dubbed ProxyMute, that uses an explainability method to detect proxy features of a given sensitive attribute (e.g., gender) and reduces their effect on decisions by disabling them at prediction time. We evaluate our method on a job-recruitment use case with two different multimodal datasets, FairCVdb and ChaLearn LAP-FI. An exhaustive set of experiments shows that the information about proxy features provided by explainability methods is beneficial and can be used successfully for bias mitigation. Furthermore, when combined with a target-label normalization method, the proposed approach performs well, yielding among the fairest results on both experimental datasets without significantly degrading predictive performance compared to previous work. The scripts to reproduce the results are available at: https://github.com/gizemsogancioglu/expl-bias-mitigation.
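The abstract describes ProxyMute only at a high level; a minimal sketch of that general recipe (score features by how strongly they act as proxies for the sensitive attribute, then disable the strongest ones at prediction time) might look as follows. The linear probe used as the explainability method, the fill-with-training-means disabling strategy, and the number of muted features are assumptions for illustration, not the authors' implementation.

    # Minimal sketch of explainability-guided proxy muting (not the authors' code).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def proxy_scores(X: np.ndarray, sensitive: np.ndarray) -> np.ndarray:
        """Score each feature by how strongly it predicts the sensitive attribute,
        here via absolute coefficients of a linear probe (a stand-in explainability method)."""
        probe = LogisticRegression(max_iter=1000).fit(X, sensitive)
        return np.abs(probe.coef_).ravel()

    def mute_proxies(X: np.ndarray, scores: np.ndarray, k: int, fill: np.ndarray) -> np.ndarray:
        """Disable the k highest-scoring proxy features by replacing them with
        neutral fill values (e.g., training means) before prediction."""
        X_masked = X.copy()
        proxy_idx = np.argsort(scores)[-k:]
        X_masked[:, proxy_idx] = fill[proxy_idx]
        return X_masked

    # Usage (hypothetical names): score proxies on training data, mute them at test time.
    # scores = proxy_scores(X_train, gender_train)
    # y_pred = recruitment_model.predict(mute_proxies(X_test, scores, k=5, fill=X_train.mean(axis=0)))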
Citations: 0
Automated Assessment of Pain (AAP)
Zakia Hammal, Steffen Walter, Nadia Berthouze
Pain communication varies, with some patients being highly expressive about their pain and others exhibiting stoic forbearance and giving minimal verbal account of their discomfort. Considerable progress has been made in defining behavioral indices of pain [1-3]. An abundant literature shows that a limited subset of facial movements, in several non-human species, encodes pain intensity across the lifespan [2]. To advance reliable pain monitoring, automated assessment of pain is emerging as a powerful means of realizing that goal. Though progress has been made, this field remains in its infancy. The workshop aims to promote current research and support the growth of interdisciplinary collaborations to advance this groundbreaking research.
Citations: 0
Breathing New Life into COPD Assessment: Multisensory Home-monitoring for Predicting Severity
Zixuan Xiao, Michal Muszynski, Ričards Marcinkevičs, Lukas Zimmerli, Adam Daniel Ivankay, Dario Kohlbrenner, Manuel Kuhn, Yves Nordmann, Ulrich Muehlner, Christian Clarenbach, Julia E. Vogt, Thomas Brunschwiler
Chronic obstructive pulmonary disease (COPD) is a significant public health issue, affecting more than 100 million people worldwide. Remote patient monitoring has shown great promise for the efficient management of patients with chronic diseases. This work presents an analysis of data from a monitoring system developed to track COPD symptoms alongside patients' self-reports. In particular, we investigate the assessment of COPD severity using multisensory home-monitoring device data acquired from 30 patients over a period of three months. We describe a comprehensive data pre-processing and feature engineering pipeline for multimodal data from the remote home-monitoring of COPD patients. We develop and validate predictive models forecasting i) the absolute and ii) the differenced COPD Assessment Test (CAT) scores based on the multisensory data. The best models achieve Pearson correlation coefficients of 0.93 and 0.37 for the absolute and differenced CAT scores, respectively. In addition, we investigate the importance of individual sensor modalities for predicting CAT scores using group sparse regularization techniques. Our results suggest that feature groups indicative of the patient's general condition, such as static medical and physiological information, date, spirometer, and air quality, are crucial for predicting the absolute CAT score. For predicting changes in CAT scores, sleep and physical activity features are most important, alongside the previous CAT score value. Our analysis demonstrates the potential of remote patient monitoring for COPD management and investigates which sensor modalities are most indicative of COPD severity as assessed by the CAT score. Our findings contribute to the development of effective, data-driven COPD management strategies.
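As a rough illustration of the group sparse regularization mentioned above, the sketch below fits a least-squares model with a group-lasso penalty by proximal gradient descent and evaluates it with Pearson correlation, the metric reported in the abstract. The feature groups, penalty strength, and optimizer settings are illustrative assumptions; this is not the study's pipeline.

    # Sketch of group-lasso (group sparse) regression for CAT-score prediction.
    # Groups, lambda, and iteration count are illustrative assumptions.
    import numpy as np
    from scipy.stats import pearsonr

    def group_lasso_fit(X, y, groups, lam=0.1, n_iter=500):
        """Proximal gradient descent for 0.5/n * ||Xw - y||^2 + lam * sum_g ||w_g||_2.
        `groups` maps a group name to a list of column indices."""
        n, d = X.shape
        w = np.zeros(d)
        step = n / (np.linalg.norm(X, 2) ** 2)       # 1 / Lipschitz constant of the smooth part
        for _ in range(n_iter):
            w = w - step * (X.T @ (X @ w - y) / n)   # gradient step on the squared error
            for idx in groups.values():              # group soft-thresholding (proximal step)
                norm = np.linalg.norm(w[idx])
                w[idx] = 0.0 if norm == 0 else max(0.0, 1.0 - step * lam / norm) * w[idx]
        return w

    # Hypothetical sensor groups (column indices are illustrative):
    # groups = {"spirometer": [0, 1], "air_quality": [2, 3, 4], "sleep": [5, 6]}
    # w = group_lasso_fit(X_train, cat_train, groups)
    # r, _ = pearsonr(X_test @ w, cat_test)          # Pearson correlation, as in the paper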
Citations: 0
Bridging Multimedia Modalities: Enhanced Multimodal AI Understanding and Intelligent Agents
Sushant Gautam
With the increasing availability of multimodal data, especially in the sports and medical domains, there is growing interest in developing Artificial Intelligence (AI) models capable of comprehending the world in a more holistic manner. Nevertheless, various challenges exist in multimodal understanding, including the integration of multiple modalities and the resolution of semantic gaps between them. The proposed research aims to leverage multiple input modalities for the multimodal understanding of AI models, enhancing their reasoning, generation, and intelligent behavior. The research objectives focus on developing novel methods for multimodal AI, integrating them into conversational agents with optimizations for domain-specific requirements. The research methodology encompasses literature review, data curation, model development and implementation, evaluation and performance analysis, domain-specific applications, and documentation and reporting. Ethical considerations will be thoroughly addressed, and a comprehensive research plan is outlined to provide guidance. The research contributes to the field of multimodal AI understanding and the advancement of sophisticated AI systems by experimenting with multimodal data to enhance the performance of state-of-the-art neural networks.
Citations: 0
Conversational Grounding in Multimodal Dialog Systems
Biswesh Mohapatra
The process of “conversational grounding” is an interactive process, studied extensively in cognitive science, whereby participants in a conversation check to make sure their interlocutors understand what is being referred to. This interactive process uses multiple modes of communication to establish shared information between the participants, which can include information conveyed through eye gaze, head movements, and intonation, along with the content of the speech. While the process is essential to successful communication between humans and between humans and machines, work remains to be done on testing and building the capabilities of current dialogue systems in managing conversational grounding, especially in multimodal communication. Recent work, such as Benotti and Blackburn [3], has shown the importance of conversational grounding in dialog systems and how current systems fail at it, a capability that is essential for the advancement of Embodied Conversational Agents and Social Robots. Thus, my Ph.D. project aims to test, understand, and improve the functioning of current dialog models with respect to conversational grounding.
Citations: 0
Neural Mixed Effects for Nonlinear Personalized Predictions
Torsten Wörtwein, Nicholas B. Allen, Lisa B. Sheeber, Randy P. Auerbach, Jeffrey F. Cohn, Louis-Philippe Morency
Personalized prediction is a machine learning approach that predicts a person’s future observations based on their past labeled observations and is typically used for sequential tasks, e.g., predicting daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) trends unique to each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models for studying both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning through integration with neural networks, these integrations are currently limited to linear person-specific parameters, ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset for predicting daily mood and a mother-adolescent dataset for predicting affective state sequences, where half the mothers experience symptoms of depression. Furthermore, we evaluate NME for two model architectures, including neural conditional random fields (CRF) for predicting affective state sequences, where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother’s depression symptoms.
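A minimal sketch of the general idea behind NME (shared, person-generic weights combined with learned person-specific parameters inside a nonlinear network) follows. The layer sizes, the embedding-based parameterization, and the choice to place the person-specific offset before the nonlinearity are simplifying assumptions; the NME formulation in the paper is more general.

    # Sketch: person-generic MLP with a nonlinear person-specific offset (not the paper's NME code).
    import torch
    import torch.nn as nn

    class PersonalizedRegressor(nn.Module):
        def __init__(self, n_persons: int, in_dim: int, hidden: int = 32):
            super().__init__()
            self.encoder = nn.Linear(in_dim, hidden)              # shared, person-generic weights
            self.person_shift = nn.Embedding(n_persons, hidden)   # person-specific parameters
            self.head = nn.Linear(hidden, 1)                      # shared output head

        def forward(self, x: torch.Tensor, person_id: torch.Tensor) -> torch.Tensor:
            # The offset enters before the nonlinearity, so each person's trend is nonlinear in x.
            h = torch.tanh(self.encoder(x) + self.person_shift(person_id))
            return self.head(h).squeeze(-1)

    # Usage with hypothetical shapes: 8 samples, 16 features, 30 people.
    # model = PersonalizedRegressor(n_persons=30, in_dim=16)
    # y_hat = model(torch.randn(8, 16), torch.randint(0, 30, (8,)))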
Citations: 0
Component attention network for multimodal dance improvisation recognition
Jia Fu, Jiarui Tan, Wenjie Yin, Sepideh Pashami, Mårten Björkman
Dance improvisation is an active research topic in the arts. Motion analysis of improvised dance can be challenging due to its unique dynamics. Data-driven dance motion analysis, including recognition and generation, is often limited to skeletal data. However, data of other modalities, such as audio, can be recorded and benefit downstream tasks. This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation. We propose an attention-based model, component attention network (CANet), for multimodal fusion on three levels: 1) feature fusion with CANet, 2) model fusion with CANet and graph convolutional network (GCN), and 3) late fusion with a voting strategy. We conduct thorough experiments to analyze the impact of each modality in different fusion methods and distinguish critical temporal or component features. We show that our proposed model outperforms the two baseline methods, demonstrating its potential for analyzing improvisation in dance.
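As a rough illustration of the attention-based feature fusion described in the abstract above, the sketch below weights per-modality embeddings with softmax attention scores before classification. The encoder outputs, dimensions, and two-modality setup (e.g., skeleton and audio embeddings stacked along one axis) are assumptions and do not reproduce the CANet architecture.

    # Sketch of attention-weighted fusion of modality embeddings (not the CANet model).
    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        def __init__(self, dim: int, n_classes: int):
            super().__init__()
            self.score = nn.Linear(dim, 1)             # one attention score per modality embedding
            self.classifier = nn.Linear(dim, n_classes)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (batch, n_modalities, dim), e.g. stacked skeleton and audio embeddings
            attn = torch.softmax(self.score(feats).squeeze(-1), dim=-1)  # (batch, n_modalities)
            fused = (attn.unsqueeze(-1) * feats).sum(dim=1)              # weighted sum over modalities
            return self.classifier(fused)

    # Usage (hypothetical sizes): logits = AttentionFusion(dim=64, n_classes=10)(torch.randn(4, 2, 64))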
Citations: 0