KDD : proceedings. International Conference on Knowledge Discovery & Data Mining: latest articles

A Deep Subgrouping Framework for Precision Drug Repurposing via Emulating Clinical Trials on Real-world Patient Data.

Pub Date : 2025-08-01 Epub Date: 2025-07-20 DOI: 10.1145/3690624.3709418

Seungyeon Lee, Ruoqi Liu, Feixiong Cheng, Ping Zhang

Drug repurposing identifies new therapeutic uses for existing drugs, reducing time and cost compared to traditional de novo drug discovery. Most existing drug repurposing studies using real-world patient data treat the entire population as homogeneous, ignoring the heterogeneity of treatment responses across patient subgroups. This approach may overlook promising drugs that benefit specific subgroups but lack notable treatment effects across the entire population, potentially limiting the number of repurposable candidates identified. To address this, we introduce STEDR, a novel drug repurposing framework that integrates subgroup analysis with treatment effect estimation. Our approach first identifies repurposing candidates by emulating multiple clinical trials on real-world patient data and then characterizes patient subgroups by learning subgroup-specific treatment effects. We apply STEDR to Alzheimer's disease (AD), a condition with few approved drugs and known heterogeneity in treatment responses. We emulate trials for over one thousand medications on a large-scale real-world database covering over 8 million patients, identifying 14 drug candidates with beneficial effects on AD in characterized subgroups. Experiments demonstrate STEDR's superior capability in identifying repurposing candidates compared to existing approaches. Additionally, our method can characterize clinically relevant patient subgroups associated with important AD-related risk factors, paving the way for precision drug repurposing.
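The motivation stated in this abstract, that a drug can show no effect across the whole population while still helping one subgroup, can be illustrated with a toy calculation. The sketch below is not STEDR: it uses a naive, unadjusted difference in mean outcomes on made-up records, whereas the paper learns subgroup-specific effects from emulated trials. The record layout, the subgroup labels "A" and "B", and the function names are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def effect(records):
    """Naive effect estimate: mean outcome of treated minus mean outcome of controls.

    Each record is (subgroup, treated, outcome); outcome 1 means an adverse
    event, so a negative effect is beneficial. No confounding adjustment is
    done here, which a real trial emulation would require.
    """
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)


def subgroup_effects(records):
    """Apply the same naive estimate separately inside each subgroup."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return {name: effect(recs) for name, recs in groups.items()}


# Toy data (hypothetical): the drug helps subgroup A and harms subgroup B.
records = [
    ("A", 1, 0), ("A", 1, 0), ("A", 0, 1), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]

overall = effect(records)
by_group = subgroup_effects(records)
print("overall:", overall)
print("by subgroup:", by_group)
```

On this toy data the population-level estimate cancels out while the per-subgroup estimates have opposite signs, which is the situation a population-only screen would miss.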

Volume 2025 v1, pages 2347-2358. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12001032/pdf/

Citations: 0

SepsisCalc: Integrating Clinical Calculators into Early Sepsis Prediction via Dynamic Temporal Graph Construction.

Pub Date : 2025-08-01 Epub Date: 2025-07-20 DOI: 10.1145/3690624.3709402

Changchang Yin, Shihan Fu, Bingsheng Yao, Thai-Hoang Pham, Weidan Cao, Dakuo Wang, Jeffrey Caterino, Ping Zhang

Sepsis is an organ dysfunction caused by a dysregulated immune response to an infection. Early sepsis prediction and identification allow for timely intervention, leading to improved clinical outcomes. Clinical calculators (e.g., the six-organ dysfunction assessment of SOFA in Figure 1) play a vital role in sepsis identification within clinicians' workflow, providing evidence-based risk assessments essential for sepsis diagnosis. However, artificial intelligence (AI) sepsis prediction models typically generate a single sepsis risk score without incorporating clinical calculators for assessing organ dysfunctions, making the models less convincing and transparent to clinicians. To bridge the gap, we propose to mimic clinicians' workflow with a novel framework, SepsisCalc, that integrates clinical calculators into the predictive model, yielding a clinically transparent and precise model for use in clinical settings. In practice, clinical calculators usually combine information from multiple component variables in Electronic Health Records (EHRs) and might not be applicable when the variables are (partially) missing. We mitigate this issue by representing EHRs as temporal graphs and integrating a learning module that dynamically adds the accurately estimated calculators to the graphs. Experimental results on real-world datasets show that the proposed model outperforms state-of-the-art methods on sepsis prediction tasks. Moreover, we developed a system to identify organ dysfunctions and potential sepsis risks, providing a human-AI interaction tool for deployment. It can help clinicians understand the prediction outputs and prepare timely interventions for the corresponding dysfunctions, paving the way for actionable clinical decision-making support for early intervention.

Volume 2025 v1, pages 2779-2790. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11998859/pdf/

Citations: 0

Identifying Combinatorial Regulatory Genes for Cell Fate Decision via Reparameterizable Subset Explanations.

Pub Date : 2025-08-01 Epub Date: 2025-08-03 DOI: 10.1145/3711896.3737000

Junhao Liu, Pengpeng Zhang, Martin Renqiang Min, Jing Zhang

Cell fate decisions are highly coordinated processes governed by complex interactions among numerous regulatory genes, while disruptions in these mechanisms can lead to developmental abnormalities and disease. Traditional methods often fail to capture such combinatorial interactions, limiting their ability to fully model cell fate dynamics. Here, we introduce MetaVelo, a global feature explanation framework for identifying key regulatory gene sets influencing cell fate transitions. MetaVelo models these transitions as a black-box function and employs a differentiable neural ordinary differential equation (ODE) surrogate to enable efficient optimization. By reparameterizing the problem as a controllable data generation process, MetaVelo overcomes the challenges posed by the non-differentiable nature of cell fate dynamics. Benchmarking across diverse stand-alone and longitudinal single-cell RNA-seq datasets and three black-box cell fate models demonstrates its superiority over 12 baseline methods in predicting developmental trajectories and identifying combinatorial regulatory gene sets. MetaVelo further distinguishes independent from synergistic regulatory genes, offering novel insights into the gene interactions governing cell fate. With the growing availability of high-resolution single-cell data, MetaVelo provides a scalable and effective framework for advancing developmental biology and therapeutic applications.

Volume 2025 v2, pages 1823-1832. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12718083/pdf/

Citations: 0

MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance.

Pub Date : 2025-08-01 Epub Date: 2025-08-03 DOI: 10.1145/3711896.3737393

Jia Xu, Tianyi Wei, Bojian Hou, Patryk Orzechowski, Shu Yang, Ruochen Jin, Rachael Paulbeck, Joost Wagenaar, George Demiris, Li Shen

We introduce MentalChat16K, an English benchmark dataset combining a synthetic mental health counseling dataset and a dataset of anonymized transcripts from interventions between Behavioral Health Coaches and Caregivers of patients in palliative or hospice care. Covering a diverse range of conditions like depression, anxiety, and grief, this curated dataset is designed to facilitate the development and evaluation of large language models for conversational mental health assistance. By providing a high-quality resource tailored to this critical domain, MentalChat16K aims to advance research on empathetic, personalized AI solutions to improve access to mental health support services. The dataset prioritizes patient privacy, ethical considerations, and responsible data usage. MentalChat16K presents a valuable opportunity for the research community to innovate AI technologies that can positively impact mental well-being. The dataset is available at https://huggingface.co/datasets/ShenLab/MentalChat16K and the code and documentation are hosted on GitHub at https://github.com/PennShenLab/MentalChat16K.

Volume 2025, pages 5367-5378. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12520247/pdf/

Citations: 0

SatHealth: A Multimodal Public Health Dataset with Satellite-based Environmental Factors.

Pub Date : 2025-08-01 Epub Date: 2025-08-03 DOI: 10.1145/3711896.3737440

Yuanlong Wang, Pengqi Wang, Changchang Yin, Ping Zhang

Living environments play a vital role in the prevalence and progression of diseases, and understanding their impact on patients' health status is increasingly crucial for developing AI models. However, due to the lack of long-term and fine-grained spatial and temporal data in public and population health studies, most existing studies fail to incorporate environmental data, limiting the models' performance and real-world application. To address this shortage, we developed SatHealth, a novel dataset combining multimodal spatiotemporal data, including environmental data, satellite images, all-disease prevalences estimated from medical claims, and social determinants of health (SDoH) indicators. We conducted experiments under two use cases with SatHealth: regional public health modeling and personal disease risk prediction. Experimental results show that living environmental information can significantly improve AI models' performance and temporal-spatial generalizability on various tasks. Finally, we deploy a web-based application to provide an exploration tool for SatHealth and one-click access to both our data and regional environmental embeddings to facilitate plug-and-play utilization. SatHealth is currently published with data for Ohio, and we will keep updating it to cover other parts of the US. With the web application and published code pipeline, our work provides valuable angles and resources for including environmental data in healthcare research and establishes a foundational framework for future research in environmental health informatics.

Volume 2025, pages 5819-5830. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12340727/pdf/

Citations: 0

Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks.

Pub Date : 2025-08-01 Epub Date: 2025-08-03 DOI: 10.1145/3711896.3736559

Zewen Liu, Xiaoda Wang, Bohan Wang, Zijie Huang, Carl Yang, Wei Jin

Graph Neural Networks (GNNs) and differential equations (DEs) are two rapidly advancing areas of research that have shown remarkable synergy in recent years. GNNs have emerged as powerful tools for learning on graph-structured data, while differential equations provide a principled framework for modeling continuous dynamics across time and space. The intersection of these fields has led to innovative approaches that leverage the strengths of both, enabling applications in physics-informed learning, spatiotemporal modeling, and scientific computing. This survey aims to provide a comprehensive overview of the burgeoning research at the intersection of GNNs and DEs. We categorize existing methods, discuss their underlying principles, and highlight their applications across domains such as molecular modeling, traffic prediction, and epidemic spreading. Furthermore, we identify open challenges and outline future research directions to advance this interdisciplinary field. A comprehensive paper list is provided at https://github.com/Emory-Melody/Awesome-Graph-NDEs.
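As a concrete picture of what "continuous dynamics" on a graph means, the sketch below integrates the simplest graph differential equation, heat diffusion dh/dt = -L h, where L is the graph Laplacian. It is an illustrative assumption rather than a method from the survey: a neural graph ODE would replace the fixed right-hand side with a learned GNN and would normally use an adaptive solver instead of the explicit Euler steps used here. The three-node path graph, step size, and function names are all hypothetical.

```python
def laplacian_times(adj, h):
    """Return (L h)_i = sum_j A_ij * (h_i - h_j) for scalar node features h."""
    n = len(h)
    return [sum(adj[i][j] * (h[i] - h[j]) for j in range(n)) for i in range(n)]


def integrate(adj, h0, t_end, dt=0.01):
    """Explicit Euler integration of dh/dt = -L h from t = 0 to t = t_end."""
    h = list(h0)
    for _ in range(int(round(t_end / dt))):
        lh = laplacian_times(adj, h)
        h = [hi - dt * li for hi, li in zip(h, lh)]
    return h


# Path graph 0 - 1 - 2 with all "heat" initially on node 0.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
h0 = [1.0, 0.0, 0.0]
h_t = integrate(adj, h0, t_end=10.0)
print(h_t)
```

Because the adjacency matrix is symmetric, the total of the node features is conserved, and on a connected graph the features converge toward their average. That smoothing behaviour is the continuous-time analogue of stacking many message-passing layers.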

Volume 2025, pages 6118-6128. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12363673/pdf/

Citations: 0

Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models.

Pub Date : 2024-08-01 Epub Date: 2024-08-24 DOI: 10.1145/3637528.3671836

Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma

Synthesizing electronic health records (EHR) data has become a preferred strategy to address data scarcity, improve data quality, and model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques like generative adversarial networks, variational autoencoders, and language models. These methods typically replicate input visits, resulting in inadequate modeling of temporal dependencies between visits and overlooking the generation of time information, a crucial element in EHR data. Moreover, their ability to learn visit representations is limited due to simple linear mapping functions, thus compromising generation quality. To address these limitations, we propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. To enhance generation quality and diversity, we introduce a novel time-aware visit embedding module and a pioneering predictive denoising diffusion probabilistic model (P-DDPM). Additionally, we devise a predictive U-Net (PU-Net) to optimize P-DDPM. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives. The experimental results demonstrate the efficacy and utility of the proposed EHRPD in addressing the aforementioned limitations and advancing EHR data generation.
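For readers unfamiliar with denoising diffusion probabilistic models, the sketch below shows the standard DDPM forward (noising) process that such models build on. It is not the paper's P-DDPM or PU-Net, and it omits the time-interval estimation the abstract describes. The linear noise schedule, its endpoint values, the toy "visit embedding" vector, and the function names are assumptions chosen for illustration.

```python
import math
import random


def alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form."""
    a = abar[t]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


abar = alpha_bars()
rng = random.Random(0)
visit_embedding = [0.5, -1.2, 0.3, 2.0]  # hypothetical embedding of one visit
slightly_noisy = q_sample(visit_embedding, 10, abar, rng)
almost_pure_noise = q_sample(visit_embedding, 999, abar, rng)
print(slightly_noisy)
print(almost_pure_noise)
```

A denoising network is then trained to reverse this corruption step by step. In a predictive variant such as the one the abstract describes, the reverse process would be conditioned on the current visit so that the sample it produces is the next visit.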

Volume 2024, pages 4607-4618. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12009115/pdf/

Citations: 0

SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing.

KDD : proceedings. International Conference on Knowledge Discovery & Data Mining

Pub Date : 2024-08-01 Epub Date: 2024-08-24 DOI: 10.1145/3637528.3671586

Changchang Yin, Pin-Yu Chen, Bingsheng Yao, Dakuo Wang, Jeffrey Caterino, Ping Zhang

Sepsis is the leading cause of in-hospital mortality in the USA. Early prediction and diagnosis of sepsis onset could significantly improve the survival of sepsis patients. Existing predictive models are usually trained on high-quality data with little missing information, whereas missing values are widespread in real-world clinical scenarios (especially in the first hours after hospital admission), which significantly decreases accuracy and increases uncertainty for the predictive models. The common method for handling missing values is imputation, which replaces the unavailable variables with estimates from the observed data. The uncertainty of the imputation results can propagate to the sepsis prediction outputs, an effect that has not been studied in existing work on either sepsis prediction or uncertainty quantification. In this study, we first define such propagated uncertainty as the variance of the prediction output and then introduce uncertainty propagation methods to quantify it. Moreover, for potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm that increases confidence by recommending that clinicians observe the most informative variables. We validate the proposed models on both publicly available data (i.e., MIMIC-III and AmsterdamUMCdb) and proprietary data from The Ohio State University Wexner Medical Center (OSUWMC). The experimental results show that the propagated uncertainty is dominant at the beginning of hospital admission and that the proposed algorithm outperforms state-of-the-art active sensing methods. Finally, we implement a SepsisLab system for early sepsis prediction and active sensing based on our pre-trained models. Clinicians and potential sepsis patients can benefit from the system in early prediction and diagnosis of sepsis.
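The definition of propagated uncertainty as the variance of the prediction output can be illustrated with a small Monte Carlo sketch. This is not the authors' method: the logistic risk model, the Gaussian imputation distributions, and the greedy variable selection (which approximates "observing" a variable by fixing it at its imputed mean) are all assumptions made for illustration, as are the function names.

```python
import math
import random
import statistics


def predict_risk(features, weights, bias):
    """Toy logistic risk model; weights and bias are illustrative, not from the paper."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def propagated_uncertainty(observed, imputation, weights, bias, n_draws=500, seed=0):
    """Estimate the mean and variance of the prediction over imputation draws.

    observed:   list with a float for each observed variable, None for each missing one
    imputation: dict mapping a missing index to an assumed Gaussian (mean, std)
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_draws):
        # Fill each missing variable with one draw from its imputation distribution.
        sample = [
            x if x is not None else rng.gauss(*imputation[i])
            for i, x in enumerate(observed)
        ]
        preds.append(predict_risk(sample, weights, bias))
    return statistics.fmean(preds), statistics.pvariance(preds)


def most_informative_variable(observed, imputation, weights, bias):
    """Greedy proxy for active sensing: pick the missing variable whose observation
    (approximated by fixing it at its imputed mean) leaves the least output variance."""
    best_index, best_var = None, float("inf")
    for i in imputation:
        trial = list(observed)
        trial[i] = imputation[i][0]
        remaining = {j: p for j, p in imputation.items() if j != i}
        _, var = propagated_uncertainty(trial, remaining, weights, bias)
        if var < best_var:
            best_index, best_var = i, var
    return best_index
```

With one observed variable and two missing ones, a call such as `most_informative_variable([1.0, None, None], {1: (0.0, 2.0), 2: (0.0, 0.1)}, [0.5, 1.0, 1.0], 0.0)` would recommend the variable with the wider imputation distribution, since observing it removes most of the output variance.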


Volume : 2024 Pages : 6158-6168 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11470769/pdf/

Citations: 0

TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data.

KDD : proceedings. International Conference on Knowledge Discovery & Data Mining

Pub Date : 2024-08-01 Epub Date: 2024-08-24 DOI: 10.1145/3637528.3671594

Ziyang Zhang, Hejie Cui, Ran Xu, Yuzhang Xie, Joyce C Ho, Carl Yang

The growing availability of well-organized Electronic Health Records (EHR) data has enabled the development of various machine learning models for disease risk prediction. However, existing risk prediction methods overlook the heterogeneity of complex diseases and fail to model potential disease subtypes in terms of their corresponding patient visits and clinical concept subgroups. In this work, we introduce TACCO, a novel framework that jointly discovers clusters of clinical concepts and patient visits based on hypergraph modeling of EHR data. Specifically, we develop a novel self-supervised co-clustering framework that can be guided by the risk prediction task for specific diseases. Furthermore, we enhance the hypergraph model of EHR data with textual embeddings and enforce alignment between the clusters of clinical concepts and patient visits through a contrastive objective. Comprehensive experiments on the public MIMIC-III dataset and the Emory internal CRADLE dataset, covering the downstream clinical tasks of phenotype classification and cardiovascular risk prediction, demonstrate an average 31.25% performance improvement over traditional ML baselines and a 5.26% improvement over the vanilla hypergraph model without our co-clustering mechanism. In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utility and insightful interpretations delivered by TACCO. Code is available at https://github.com/PericlesHat/TACCO.
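To make the hypergraph view of EHR data concrete, the sketch below builds a concept-by-visit incidence matrix. It assumes clinical concepts are nodes and each visit is a hyperedge, which the abstract does not spell out. The function name and the sample codes are hypothetical, and this is not taken from the TACCO repository.

```python
def build_incidence(visits):
    """Build a hypergraph incidence matrix from visit records.

    visits: dict mapping a visit id to the set of concept codes recorded in it.
    Returns (concepts, visit_ids, matrix) where matrix[i][j] is 1 when
    concept i occurs in visit j and 0 otherwise.
    """
    concepts = sorted({code for codes in visits.values() for code in codes})
    visit_ids = sorted(visits)
    matrix = [
        [1 if concept in visits[visit] else 0 for visit in visit_ids]
        for concept in concepts
    ]
    return concepts, visit_ids, matrix


# Illustrative records only; the codes are placeholders, not data from the paper.
example_visits = {"v1": {"I10", "E11"}, "v2": {"E11"}}
concepts, visit_ids, incidence = build_incidence(example_visits)
```

Rows of such a matrix describe concepts and columns describe visits, so clustering rows and columns together is one simple way to picture co-clustering of the two.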


Volume : 2024 Pages : 6324-6334 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11868038/pdf/

Citations: 0

Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization.

KDD : proceedings. International Conference on Knowledge Discovery & Data Mining

Pub Date : 2024-01-01 Epub Date: 2024-08-24 DOI: 10.1145/3637528.3671590

Bao Hoang, Yijiang Pang, Siqi Liang, Liang Zhan, Paul M Thompson, Jiayu Zhou

Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy for ensuring sufficient clinical diversity, a practice driven by the decentralized nature of medical data. However, data from different sites are easily biased by the local environment or facilities, thereby violating the i.i.d. assumption. A common remedy is to harmonize the site bias while retaining important biological information. COMBAT is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when new sites join training or when data from unknown/unseen sites must be evaluated, COMBAT lacks compatibility and requires retraining with data from all the sites. The retraining leads to significant computational and logistical overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly advances the usability of COMBAT harmonization. We use extensive simulation and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our code is available at https://github.com/illidanlab/distributed-cluster-harmonization.
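The idea of removing site bias can be illustrated with a deliberately simplified location-scale adjustment for a single feature. This is a sketch under strong assumptions, not ComBat itself and not the authors' Cluster ComBat: it omits biological covariates and empirical Bayes shrinkage. The nearest-mean matching of an unseen site is only a stand-in for reusing previously fitted parameters without retraining. All function names are hypothetical.

```python
import statistics


def fit_site_params(values_by_site):
    """Per-site mean and (population) standard deviation of one feature."""
    return {
        site: (statistics.fmean(values), statistics.pstdev(values))
        for site, values in values_by_site.items()
    }


def harmonize(values_by_site):
    """Shift and rescale each site to the pooled mean and standard deviation."""
    pooled = [x for values in values_by_site.values() for x in values]
    target_mean = statistics.fmean(pooled)
    target_std = statistics.pstdev(pooled)
    params = fit_site_params(values_by_site)
    harmonized = {}
    for site, values in values_by_site.items():
        mean, std = params[site]
        # Guard against constant features, which have zero spread.
        scale = target_std / std if std > 0 else 1.0
        harmonized[site] = [target_mean + (x - mean) * scale for x in values]
    return harmonized


def nearest_fitted_site(new_values, params):
    """Match an unseen site to the fitted site with the closest mean, so its
    parameters could be reused instead of refitting on all sites."""
    new_mean = statistics.fmean(new_values)
    return min(params, key=lambda site: abs(params[site][0] - new_mean))
```

After `harmonize`, every site shares the pooled mean and spread for that feature. A real harmonization pipeline would additionally preserve covariate effects such as age or diagnosis, which this sketch would wrongly remove if they differed between sites.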


Volume : 2024 Pages : 5105-5115 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529347/pdf/

Citations: 0