Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence: Latest Articles

Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health.
Pub Date : 2024-03-25 Epub Date: 2024-03-24 DOI: 10.1609/aaai.v38i21.30328
Harsh Kumar, Tong Li, Jiakai Shi, Ilya Musabirov, Rachel Kornfield, Jonah Meyerhoff, Ananya Bhattacharjee, Chris Karr, Theresa Nguyen, David Mohr, Anna Rafferty, Sofia Villar, Nina Deliu, Joseph Jay Williams

Digital mental health (DMH) interventions, such as text-message-based lessons and activities, offer immense potential for accessible mental health support. While these interventions can be effective, real-world experimental testing can further enhance their design and impact. Adaptive experimentation, utilizing algorithms like Thompson Sampling for (contextual) multi-armed bandit (MAB) problems, can lead to continuous improvement and personalization. However, it remains unclear when these algorithms can simultaneously increase user experience rewards and facilitate appropriate data collection for social-behavioral scientists to analyze with sufficient statistical confidence. Although a growing body of research addresses the practical and statistical aspects of MAB and other adaptive algorithms, further exploration is needed to assess their impact across diverse real-world contexts. This paper presents a software system developed over two years that allows text-messaging intervention components to be adapted using bandit and other algorithms while collecting data for side-by-side comparison with traditional uniform random non-adaptive experiments. We evaluate the system by deploying a text-message-based DMH intervention to 1100 users, recruited through a large mental health non-profit organization, and share the path forward for deploying this system at scale. This system not only enables applications in mental health but could also serve as a model testbed for adaptive experimentation algorithms in other domains.

Volume: 38 Issue: 21 Pages: 22906-22912 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11044947/pdf/
Citations: 0
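The abstract above contrasts Thompson Sampling with uniform random assignment. The following is a minimal sketch of that contrast for binary rewards, assuming a Beta-Bernoulli model with a Beta(1, 1) prior; the arm response rates, the function name `run_experiment`, and the reward definition are hypothetical and are not taken from the paper's system.

```python
import random


def run_experiment(true_rates, n_users, adaptive, seed=0):
    """Assign each user to one message variant (arm) and record a 0/1 reward.

    true_rates: hypothetical per-arm response probabilities (illustration only).
    adaptive:   True for Beta-Bernoulli Thompson Sampling, False for uniform random.
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_users):
        if adaptive:
            # Draw one sample from each arm's Beta posterior and pick the largest.
            draws = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
            arm = max(range(k), key=draws.__getitem__)
        else:
            # Traditional non-adaptive experiment: every arm is equally likely.
            arm = rng.randrange(k)
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, sum(successes)


if __name__ == "__main__":
    rates = [0.10, 0.15, 0.30]  # assumed values, not study results
    for label, adaptive in [("uniform random", False), ("Thompson Sampling", True)]:
        pulls, total = run_experiment(rates, n_users=1100, adaptive=adaptive)
        print(f"{label}: pulls per arm = {pulls}, total reward = {total}")
```

Comparing the pulls per arm shows the tension the abstract describes: an adaptive policy tends to concentrate users on the better-looking arm, which leaves fewer observations on the other arms for later statistical comparison.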
Pharmacokinetics-Informed Neural Network for Predicting Opioid Administration Moments with Wearable Sensors.
Pub Date : 2024-02-01 Epub Date: 2024-03-24 DOI: 10.1609/aaai.v38i21.30326
Bhanu Teja Gullapalli, Stephanie Carreiro, Brittany P Chapman, Eric L Garland, Tauhidur Rahman

Long-term and high-dose prescription opioid use places individuals at risk for opioid misuse, opioid use disorder (OUD), and overdose. Existing methods for monitoring opioid use and detecting misuse rely on self-reports, which are prone to reporting bias, and toxicology testing, which may be infeasible in outpatient settings. Although wearable technologies for monitoring day-to-day health metrics have gained significant traction in recent years due to their ease of use, flexibility, and advancements in sensor technology, their application within the opioid use space remains underexplored. In the current work, we demonstrate that oral opioid administrations can be detected using physiological signals collected from a wrist sensor. More importantly, we show that models informed by opioid pharmacokinetics increase reliability in predicting the timing of opioid administrations. Forty-two individuals who were prescribed opioids as a part of their medical treatment in-hospital and after discharge were enrolled. Participants wore a wrist sensor throughout the study, while opioid administrations were tracked using electronic medical records and self-reports. We collected 1,983 hours of sensor data containing 187 opioid administrations from the inpatient setting and 927 hours of sensor data containing 40 opioid administrations from the outpatient setting. We demonstrate that a self-supervised pre-trained model, capable of learning the canonical time series of plasma concentration of the drug derived from opioid pharmacokinetics, can reliably detect opioid administration in both settings. Our work suggests the potential of pharmacokinetic-informed, data-driven models to objectively detect opioid use in daily life.

Volume: 38 Issue: 21 Pages: 22892-22898 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11027727/pdf/
Citations: 0
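The abstract above refers to a canonical plasma-concentration time series derived from opioid pharmacokinetics. As general background, a textbook one-compartment model with first-order absorption and elimination gives such a curve; the sketch below assumes that model, and every parameter value in it is hypothetical. It is not the model or the parameters used by the authors.

```python
import math


def plasma_concentration(t_hours, dose_mg, ka, ke, volume_l, bioavailability=1.0):
    """One-compartment oral-dose model (Bateman function).

    C(t) = F * D * ka / (V * (ka - ke)) * (exp(-ke * t) - exp(-ka * t))
    ka and ke are the absorption and elimination rate constants in 1/hour.
    """
    if t_hours < 0:
        return 0.0  # nothing in plasma before the dose is taken
    if math.isclose(ka, ke):
        raise ValueError("this closed form requires ka != ke")
    scale = bioavailability * dose_mg * ka / (volume_l * (ka - ke))
    return scale * (math.exp(-ke * t_hours) - math.exp(-ka * t_hours))


def time_of_peak(ka, ke):
    """Time after dosing at which the concentration peaks: ln(ka / ke) / (ka - ke)."""
    return math.log(ka / ke) / (ka - ke)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to draw a plausible-looking curve.
    ka, ke, dose, volume = 1.0, 0.2, 10.0, 50.0
    print(f"peak at {time_of_peak(ka, ke):.2f} h after dosing")
    for t in range(0, 13, 2):
        print(t, round(plasma_concentration(t, dose, ka, ke, volume), 4))
```

A curve of this shape rises after the administration moment and then decays, which is why its timing can serve as a training target for locating administrations in sensor data.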
IGAMT: Privacy-Preserving Electronic Health Record Synthesization with Heterogeneity and Irregularity.
Pub Date : 2024-01-01 Epub Date: 2024-03-24 DOI: 10.1609/aaai.v38i14.29491
Wenjie Wang, Pengfei Tang, Jian Lou, Yuanming Shao, Lance Waller, Yi-An Ko, Li Xiong

Utilizing electronic health records (EHR) for machine learning-driven clinical research has great potential to enhance outcome predictions and treatment personalization. Nonetheless, due to privacy and security concerns, the secondary use of EHR data is regulated, constraining researchers' access to EHR data. Generating synthetic EHR data with deep learning methods is a viable and promising approach to mitigate privacy concerns, offering not only a supplementary resource for downstream applications but also sidestepping the privacy risks associated with real patient data. While prior efforts have concentrated on EHR data synthesis, significant challenges persist: addressing the heterogeneity of features including temporal and non-temporal features, structurally missing values, and irregularity of the temporal measures, and ensuring rigorous privacy of the real data used for model training. Existing works in this domain have focused on solving only one or two of the aforementioned challenges. In this work, we propose IGAMT, an innovative framework to generate privacy-preserved synthetic EHR data that not only maintains high quality with heterogeneous features, missing values, and irregular measures but also achieves differential privacy with enhanced privacy-utility trade-off. Extensive experiments prove that IGAMT significantly outperforms baseline and state-of-the-art models in terms of resemblance to real data and performance of downstream applications. Ablation studies also prove the effectiveness of the techniques applied in IGAMT.

Volume: 38 Issue: 14 Pages: 15634-15643 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606572/pdf/
Citations: 0
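The abstract above mentions differential privacy and a privacy-utility trade-off without naming a mechanism. As general background only, the sketch below shows the classic Laplace mechanism on a counting query, where a smaller epsilon means stronger privacy and more noise. The records, the function names, and the choice of mechanism are assumptions for illustration; this is not how IGAMT achieves its guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1 / epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy records invented for this example; they are not real patient data.
    records = [{"age": age} for age in (34, 51, 67, 72, 45, 80, 29)]
    for epsilon in (0.1, 1.0, 10.0):
        noisy = private_count(records, lambda r: r["age"] >= 65, epsilon, rng)
        print(f"epsilon = {epsilon}: noisy count = {noisy:.2f} (true count is 3)")
```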
Erratum to: 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, Hang Xu
The Original Article was published on 26 June 2023.  
Pub Date: 2023-09-06 DOI: 10.1609/aaai.v37i13.27320
Citations: 0
Multimodal Deep Generative Models for Remote Medical Applications
Catherine Ordun
Visible-to-Thermal (VT) face translation is an under-studied problem of image-to-image translation that offers an AI-enabled alternative to traditional thermal sensors. Over three phases, my Doctoral Proposal explores developing multimodal deep generative solutions that can be applied to telemedicine. These include the contribution of a novel Thermal Face Contrastive GAN (TFC-GAN), exploration of hybridized diffusion-GAN models, application on real clinical thermal data at the National Institutes of Health, and exploration of strategies for Federated Learning (FL) in heterogeneous data settings.
Pub Date: 2023-06-26 DOI: 10.1609/aaai.v37i13.26924 Pages: 16127-16128
Citations: 0
McOmet: Multimodal Fusion Transformer for Physical Audiovisual Commonsense Reasoning
Daoming Zong, Shiliang Sun
Physical commonsense reasoning is essential for building reliable and interpretable AI systems, which involves a general understanding of the physical properties and affordances of everyday objects, how these objects can be manipulated, and how they interact with others. It is fundamentally a multi-modal task, as physical properties are manifested through multiple modalities, including vision and acoustics. In this work, we present a unified framework, named Multimodal Commonsense Transformer (MCOMET), for physical audiovisual commonsense reasoning. MCOMET has two intriguing properties: i) it fully mines higher-order temporal relationships across modalities (e.g., pairs, triplets, and quadruplets); and ii) it restricts the cross-modal flow through the feature collection and propagation mechanism along with tight fusion bottlenecks, forcing the model to attend to the most relevant parts in each modality and suppressing the dissemination of noisy information. We evaluate our model on a very recent public benchmark, PACS. Results show that MCOMET significantly outperforms a variety of strong baselines, revealing powerful multi-modal commonsense reasoning capabilities.
Pub Date: 2023-06-26 DOI: 10.1609/aaai.v37i5.25813 Pages: 6621-6629
Citations: 1
Revisiting Unsupervised Local Descriptor Learning
Wu‐ru Wang, Lei Zhang, Hua Huang
Constructing accurate training tuples is crucial for unsupervised local descriptor learning, yet challenging due to the absence of patch labels. The state-of-the-art approach constructs tuples with heuristic rules, which struggle to precisely depict real-world patch transformations, in spite of enabling fast model convergence. A possible solution to alleviate the problem is the clustering-based approach, which can capture realistic patch variations and learn more accurate class decision boundaries, but suffers from slow model convergence. This paper presents HybridDesc, an unsupervised approach that learns powerful local descriptor models with fast convergence speed by combining the rule-based and clustering-based approaches to construct training tuples. In addition, HybridDesc contributes two concrete enhancing mechanisms: (1) a Differentiable Hyperparameter Search (DHS) strategy to find the optimal hyperparameter setting of the rule-based approach so as to provide an accurate prior for the clustering-based approach, and (2) an On-Demand Clustering (ODC) method to reduce the clustering overhead of the clustering-based approach without eroding its advantage. Extensive experimental results show that HybridDesc can efficiently learn local descriptors that surpass existing unsupervised local descriptors and even rival competitive supervised ones.
Pub Date: 2023-06-26 DOI: 10.1609/aaai.v37i3.25367 Pages: 2680-2688
Citations: 0
FC-TrackNet: Fast Convergence Net for 6D Pose Tracking in Synthetic Domains
Di Jia, Qianqian Wang, Jun Cao, Peng Cai, Zhiyang Jin
In this work, we propose a fast convergence track net, or FC-TrackNet, based on a synthetic data-driven approach to maintaining long-term 6D pose tracking. Comparison experiments are performed on two different datasets. The results demonstrate that our approach can achieve a consistent tracking frequency of 90.9 Hz as well as higher accuracy than the state-of-the-art approaches.
Pub Date: 2023-06-26 DOI: 10.1609/aaai.v37i13.27077 Pages: 16455-16457
Citations: 0
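The tracking frequency of 90.9 Hz reported in the abstract above can be read as a per-frame time budget. A short worked conversion follows; the helper name `frame_budget_ms` is hypothetical, and the only input taken from the abstract is the 90.9 Hz figure.

```python
def frame_budget_ms(frequency_hz):
    """Convert a tracking frequency in Hz to the time available per frame in milliseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


if __name__ == "__main__":
    # 1000 ms / 90.9 Hz is roughly 11 ms per pose update.
    print(f"{frame_budget_ms(90.9):.2f} ms per frame")
```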
MoMusic: A Motion-Driven Human-AI Collaborative Music Composition and Performing System
Weizhen Bian, Yijin Song, Nianzhen Gu, Tin Yan Chan, Tsz To Lo, Tsun Sun Li, King Chak Wong, Wei Xue, R. Trillo
The significant development of artificial neural network architectures has facilitated the increasing adoption of automated music composition models over the past few years. However, most existing systems feature algorithmic generative structures based on hard code and predefined rules, generally excluding interactive or improvised behaviors. We propose a motion-based music system, MoMusic, as an AI real-time music generation system. MoMusic features a partially randomized harmonic sequencing model based on a probabilistic analysis of tonal chord progressions, mathematically abstracted through musical set theory. This model is presented against a dual dimension grid that produces resulting sounds through a posture recognition mechanism. A camera captures the users' fingers' movement and trajectories, creating coherent, partially improvised harmonic progressions. MoMusic integrates several timbral registers, from traditional classical instruments such as the piano to a new "human voice instrument" created using a voice conversion technique. Our research demonstrates MoMusic's interactiveness, ability to inspire musicians, and ability to generate coherent musical material with various timbral registers. MoMusic's capabilities could be easily expanded to incorporate different forms of posture-controlled timbral transformation, rhythmic transformation, dynamic transformation, or even digital sound processing techniques.
在过去的几年中,人工神经网络架构的重大发展促进了自动化音乐作曲模型的日益普及。然而,大多数现有系统的特点是基于硬代码和预定义规则的算法生成结构,通常排除交互或临时行为。我们提出了一个基于动作的音乐系统MoMusic,作为一个人工智能实时音乐生成系统。MoMusic的特点是基于音调和弦进行的概率分析的部分随机谐波排序模型,通过音乐集理论进行数学抽象。该模型是针对通过姿势识别机制产生声音的二维网格提出的。摄像机捕捉使用者手指的运动和轨迹,创造出连贯的、部分即兴的和声。MoMusic集成了几个音质音域,从传统的古典乐器如钢琴到使用声音转换技术创建的新“人声乐器”。我们的研究证明了mommusic的互动性、激励音乐家的能力,以及用各种音域产生连贯音乐材料的能力。mommusic的功能可以很容易地扩展到包含不同形式的姿势控制的音色转换、节奏转换、动态转换,甚至是数字声音处理技术。
Pub Date : 2023-06-26 DOI: 10.1609/aaai.v37i13.26907 Pages: 16057-16062
Citations: 1
Music-to-Facial Expressions: Emotion-Based Music Visualization for the Hearing Impaired
Yubo Wang, Fengzhou Pan, Danni Liu, Jiaxiong Hu
While music is made to convey messages and emotions, auditory music is not equally accessible to everyone. Music visualization is a common approach to augmenting the listening experience of hearing users and to providing music experiences for the hearing-impaired. In this paper, we present a music visualization system that turns a piece of music into a series of facial expressions representing the continuously changing sentiments in the music. The resulting facial expressions, recorded as action units, can then animate a static virtual avatar so that it emotes in sync with the music.
Pub Date : 2023-06-26 DOI: 10.1609/aaai.v37i13.26912 Pages: 16096-16102
Citations: 0