
Journal of Biomedical Informatics: Latest Articles

Multi-channel causal variational autoencoder for multimodal biomedical causal disentanglement.
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-05 | DOI: 10.1016/j.jbi.2026.104995
Safaa Al-Ali, Irene Balelli

The multimodal nature of clinical assessment and decision-making, and the high rate of healthcare data generation, motivate the development of approaches specifically tailored to the analysis of these complex and potentially high-dimensional multimodal datasets. This poses both technical and conceptual challenges: how can such heterogeneous data be analyzed jointly? How can modality-specific information be distinguished from shared information? Variational autoencoders (VAEs) offer a robust framework for learning latent representations of complex data distributions, are flexible enough to adapt to different data types and structures, and have already been successfully applied to latent disentanglement of multimodal (multi-channel) data. We aim to tackle multi-channel disentanglement from a causal perspective, and seek to identify causal relationships between channels, beyond simple statistical associations. To do so, we propose Multi-Channel Causal VAE (MC²VAE), a novel causal disentanglement approach for multi-channel data, whose objective is to jointly learn modality-specific latent representations from a multi-channel dataset and identify a causal structure between the latent channels. Each channel is projected into its own latent space, where a causal discovery step is integrated to learn the hidden causal graph. Finally, the decoder takes the discovered graph into account to predict the data. Covariates of interest can be integrated as well when available, and accounted for in the causal graph structure. Extensive experiments on synthetically generated multi-channel datasets demonstrate the ability of MC²VAE to effectively uncover the underlying latent causal structures across multiple channels, making it a strong candidate for real-world multi-channel causal disentanglement. Application to multi-channel data on neurodegeneration extracted from the Alzheimer's Disease Neuroimaging Initiative highlights the existence of a biologically meaningful latent causal structure, whose pertinence is supported by multiple previous experimental and modelling works, and provides actionable insight into disease progression.
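
As a rough illustration of what a causal structure between latent channels means, the sketch below propagates channel-level noise through a small directed acyclic graph using a linear structural equation model. The linear form, the channel names c1 to c3, the edge weights, and the function name are all assumptions made for illustration; the abstract does not specify how the graph is parameterized or learned.

```python
from typing import Dict, Sequence, Tuple


def propagate_latents(
    noise: Dict[str, float],
    weights: Dict[Tuple[str, str], float],
    order: Sequence[str],
) -> Dict[str, float]:
    """Compute latent values under a linear structural equation model.

    noise[c]            exogenous term of channel c (hypothetical values)
    weights[(p, c)]     assumed causal strength of the edge p -> c
    order               topological order of the graph (parents first)
    """
    z: Dict[str, float] = {}
    for child in order:
        # Each channel is its own noise plus the weighted sum of its parents.
        parent_effect = sum(
            w * z[parent]
            for (parent, target), w in weights.items()
            if target == child
        )
        z[child] = parent_effect + noise[child]
    return z


# Hypothetical three-channel chain: c1 -> c2 -> c3.
latents = propagate_latents(
    noise={"c1": 1.0, "c2": 0.5, "c3": 0.0},
    weights={("c1", "c2"): 2.0, ("c2", "c3"): -1.0},
    order=["c1", "c2", "c3"],
)
print(latents)
```

In this toy chain, changing the noise of c1 alters c2 and c3, while changing the noise of c3 affects nothing upstream, which is the asymmetry that separates a causal graph from a plain correlation structure.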

Citations: 0
KATMed: a knowledge-augmented transformer model for contraindication-aware medication recommendation in comorbidities.
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-30 | DOI: 10.1016/j.jbi.2026.104991
Ziqian Qiao, Shaofu Lin, Jiatong Fan, Jianhui Chen, Zhiyi Tang, Zitong Zhang

Drug-disease contraindications in comorbidities (DDCC) pose a significant challenge and priority in clinical treatment. These contraindications exhibit a prototypical long-tail distribution, characterized by low frequency, high diversity, and substantial individual variability. These properties impose significant limitations on electronic health record-based medication recommendation modeling, ultimately elevating safety risks in clinical practice. To address this challenge, this study proposes KATMed, a knowledge-augmented transformer model for contraindication-aware medication recommendation in comorbidities. The model employs Transformer-based encoding of patient records and leverages two self-supervised tasks to capture rich temporal and semantic dependencies. On this foundation, a hybrid knowledge-augmented framework is developed to integrate bidirectional medication-related clinical associations. Positive disease-procedure associations are modeled using a dynamic semantic relevance matrix to expand the input information, thereby enhancing the model's feature learning capability on sparse yet diverse comorbidity records. Negative DDCC rules are incorporated as differentiable logical constraints in the loss function to suppress unsafe medications. Experiments on the MIMIC-III and MIMIC-IV datasets show that KATMed significantly improves performance, achieving a 5.2% increase in accuracy and a 2.04% reduction in safety violations.
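
One common way to turn a rule such as "drug k is contraindicated for disease d" into a differentiable loss term is to penalize the predicted probability of every drug that is forbidden for the patient's diagnoses. The sketch below shows that idea in plain Python. It is a minimal illustration under assumptions: the penalty form, the weight `lam`, the disease codes D1 and D2, and all function names are hypothetical, and the abstract does not state the exact constraint KATMed uses.

```python
import math
from typing import Sequence, Set, Tuple


def bce(probs: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over the drug vocabulary."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, t in zip(probs, targets):
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(probs)


def contraindication_penalty(
    probs: Sequence[float],
    patient_diseases: Set[str],
    rules: Set[Tuple[str, int]],
) -> float:
    """Sum of predicted probabilities of drugs forbidden for this patient.

    rules holds (disease_code, drug_index) pairs; the term is linear in the
    probabilities, so it is differentiable with respect to them.
    """
    return sum(probs[drug] for disease, drug in rules if disease in patient_diseases)


def total_loss(probs, targets, patient_diseases, rules, lam: float = 1.0) -> float:
    """Recommendation loss plus a weighted safety penalty (lam is assumed)."""
    return bce(probs, targets) + lam * contraindication_penalty(
        probs, patient_diseases, rules
    )


# Hypothetical patient with disease D1; drug 0 is contraindicated for D1.
probs = [0.9, 0.2, 0.6]
targets = [0, 0, 1]
rules = {("D1", 0), ("D2", 2)}
penalty = contraindication_penalty(probs, {"D1"}, rules)
print(penalty, total_loss(probs, targets, {"D1"}, rules))
```

Because the penalty grows with the probability assigned to a forbidden drug, gradient descent on the total loss pushes that probability down without needing a hard filter at inference time.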

Citations: 0
Ontology-grounded knowledge graphs for mitigating hallucinations in large language models for clinical question answering
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Vol. 175, Article 104993 | Pub Date: 2026-01-28 | DOI: 10.1016/j.jbi.2026.104993
Mohamed Ali, Zaki Taha, Mohamed Mabrouk Morsey

Objective:

Large Language Models (LLMs) show strong potential in biomedical informatics but frequently generate hallucinated or factually incorrect responses, limiting their clinical utility. This study aims to develop and evaluate a GraphRAG framework using an ontology-grounded knowledge graph that mitigates hallucinations in clinical question answering.

Methods:

We designed a domain-specific Resource Description Framework (RDF)/Web Ontology Language (OWL) ontology and knowledge graph using clinical and hospital data from multiple Egyptian institutions. The ontology was integrated with LLMs to enforce structured semantic grounding during question answering. Clinical questions were evaluated under three conditions: (i) baseline ChatGPT-4, (ii) DeepSeek-R1, and (iii) our ontology-grounded framework. Accuracy was evaluated against clinically reported reference answers derived from five peer-reviewed Egyptian hospital studies.

Results:

Our GraphRAG framework significantly outperformed baseline models. While ChatGPT-4 achieved 37% accuracy and DeepSeek-R1 achieved 52%, the ontology-grounded approach achieved 98% accuracy (59/60 questions). The hallucination rate was reduced from approximately 63% in ChatGPT-4 and 48% in DeepSeek-R1 to just 1.7% in our framework, a reduction of more than 61% relative to state-of-the-art LLMs. The framework further ensured consistent, reproducible answers aligned with clinical knowledge, demonstrating its robustness for healthcare applications.
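
The reported percentages can be reproduced with a few lines of arithmetic. The sketch below assumes, as the figures suggest, that the hallucination rate is simply one minus accuracy; the abstract does not say how the 61% reduction was computed, so both the percentage-point difference and the proportional reduction are shown. Function names are illustrative.

```python
def rates(correct: int, total: int) -> tuple:
    """Return (accuracy, hallucination rate) as percentages.

    Assumes every incorrect answer counts as a hallucination.
    """
    accuracy = 100.0 * correct / total
    return accuracy, 100.0 - accuracy


def point_drop(before: float, after: float) -> float:
    """Absolute reduction in percentage points."""
    return before - after


def proportional_drop(before: float, after: float) -> float:
    """Reduction expressed as a percentage of the starting rate."""
    return 100.0 * (before - after) / before


accuracy, hallucination = rates(59, 60)
print(round(accuracy, 1), round(hallucination, 1))     # 98.3 1.7

# Against the ChatGPT-4 baseline (63% hallucination rate).
print(round(point_drop(63.0, 1.7), 1))                 # 61.3
print(round(proportional_drop(63.0, 1.7), 1))          # 97.3

# Against the DeepSeek-R1 baseline (48% hallucination rate).
print(round(point_drop(48.0, 1.7), 1))                 # 46.3
print(round(proportional_drop(48.0, 1.7), 1))          # 96.5
```

Read this way, the "more than 61%" figure matches the percentage-point gap to ChatGPT-4, while the proportional reduction against either baseline is above 96%.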

Conclusion:

Ontology-grounded knowledge graphs provide a reliable and verifiable method for mitigating hallucinations in LLM-based clinical question answering. By embedding structured clinical semantics into the reasoning process, the framework enhances factual accuracy, reproducibility, and safety in biomedical informatics. This work highlights the critical role of semantic knowledge representation in building trustworthy Artificial Intelligence (AI) systems for healthcare decision support.
Citations: 0
On the transfer learning behavior of domain-specific vision–language models in screening mammography
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Vol. 175, Article 104986 | Pub Date: 2026-01-22 | DOI: 10.1016/j.jbi.2026.104986
Aisha Urooj Khan, Gokul Ramasamy, Muhammad Danish Khan, John Garrett, Tyler Bradshaw, Lonie Salkowski, Imon Banerjee
Vision-language models (VLMs) have shown remarkable performance on natural images and text. Given the homology of the anatomy, the high gray-scale image dimension, and the unbalanced datasets, traditional VLMs do not adapt well to radiological applications. In this work, we empirically adapt image encoders trained within domain-specific VLMs to two downstream tasks in 2D mammogram image analysis: tissue density estimation and BI-RADS prediction. We study the transfer learning behavior using linear probing, fine-tuning, and online self-distillation. We find that knowledge-driven domain-specific VLM backbones with frozen weights perform better than the MammoClip VLM model as well as supervised baselines such as ViT and CNNs, even with only 5% of the training data. The generalization capabilities of these models are further studied on two external datasets.
Citations: 0
A multidimensional hierarchical framework for sources of bias in real-world healthcare evidence: a scoping review
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Vol. 174, Article 104989 | Pub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104989
Haeun Lee, Christelle Xiong, Derek Baughman, Chen Dun, Jiayi Tong, Benjamin Martin, Harold Lehmann, Paul Nagy

Objective

This study identifies and categorizes bias sources throughout the real-world evidence (RWE) generation process from electronic health records (EHRs), and develops a multi-dimensional conceptual framework to characterize how bias arises in large-scale multinational federated network studies.

Methods

A three-phase bias framework spanning healthcare delivery, data management, and research was developed through the synthesis of existing frameworks, a structured literature review, and iterative assessment by multidisciplinary expert panels. A scoping review was conducted following PRISMA-ScR guidelines, analyzing studies between 2016 and 2025 in PubMed and Web of Science and focusing on bias in observational studies using real-world data. Bias sources were classified using directed content analysis based on their occurrence stage in the RWE generation process.

Results

Analysis of 220 papers within this framework identified 209 distinct bias sources categorized into seven specific levels: access to medical care (n = 40), provision of care (n = 29), data acquisition and measurement (n = 39), clinical documentation and coding practices (n = 32), data extraction (n = 22), data modeling (n = 11), and data analytics (n = 36). Healthcare-phase biases were most prevalent (n = 108), followed by data management (n = 54) and research levels (n = 47).
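
The level counts and the phase totals can be cross-checked with a short tally. The grouping of the seven levels into the three phases below is an assumption inferred from the numbers (it is the only contiguous grouping that reproduces 108, 54, and 47); the abstract does not list the mapping explicitly.

```python
# Bias-source counts per level, as reported in the Results.
LEVEL_COUNTS = {
    "access to medical care": 40,
    "provision of care": 29,
    "data acquisition and measurement": 39,
    "clinical documentation and coding practices": 32,
    "data extraction": 22,
    "data modeling": 11,
    "data analytics": 36,
}

# Assumed assignment of levels to phases (inferred, not stated in the abstract).
PHASES = {
    "healthcare": [
        "access to medical care",
        "provision of care",
        "data acquisition and measurement",
    ],
    "data management": [
        "clinical documentation and coding practices",
        "data extraction",
    ],
    "research": ["data modeling", "data analytics"],
}

phase_totals = {
    phase: sum(LEVEL_COUNTS[level] for level in levels)
    for phase, levels in PHASES.items()
}
grand_total = sum(LEVEL_COUNTS.values())

print(phase_totals)
print(grand_total)
```

Under this grouping the phase totals and the overall count of 209 agree with the figures in the Results.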

Conclusion

This multi-dimensional framework reveals that bias sources in RWE generation are interconnected across patient, provider, administrative, information technology, informatics, and analytical domains, and provides a structural foundation for understanding where and how bias may arise across the RWE process in large-scale observational research.
Citations: 0
PEER: Towards reliable and efficient inference via Patience-Based Early Exiting with Rejection
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Vol. 175, Article 104988 | Pub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104988
Zaifu Zhan, Shuang Zhou, Rui Zhang

Objective:

In biomedical applications, models must balance inference efficiency with reliable predictions. Patience-based early exiting (PABEE) accelerates inference but often fails under uncertainty.

Methods:

We propose PEER (Patience-based Early Exiting with Rejection), a unified framework that integrates a rejection mechanism into PABEE to enable both efficiency and reliability. With PEER, a model either issues a prediction for the input or rejects it, using a patience counter to track prediction consistency across layers. This design avoids unreliable final-layer predictions and unifies early exiting with selective abstention without retraining. We evaluated PEER on 11 biomedical datasets, including clinical text and medical images. Experiments involved multiple Transformer-based backbones, including a vision transformer, and were measured by accuracy, macro-F1, and speed-up ratio.
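
The patience mechanism can be sketched as a loop over the predictions of successive internal classifiers. In the version below, the counter grows while consecutive layers agree and resets when they disagree; the model exits once the counter reaches the patience threshold and abstains if the last layer is reached first. The rejection rule is an assumption based on the description above, and the function name and inputs are hypothetical rather than taken from the paper's implementation.

```python
from typing import Optional, Sequence, Tuple


def patience_exit_with_rejection(
    layer_preds: Sequence[int], patience: int
) -> Tuple[Optional[int], int]:
    """Return (label or None, number of layers evaluated).

    layer_preds[i] is the class predicted by the classifier attached to
    layer i. None means the input is rejected (deferred to a human).
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")

    streak = 0   # consecutive agreements with the previous layer
    prev = None  # prediction of the previous layer
    for depth, pred in enumerate(layer_preds, start=1):
        streak = streak + 1 if pred == prev else 0
        if streak >= patience:
            return pred, depth  # stable prediction: exit early
        prev = pred
    # Predictions never stabilized: abstain instead of trusting the last layer.
    return None, len(layer_preds)


stable = patience_exit_with_rejection([1, 2, 2, 2, 0], patience=2)
unstable = patience_exit_with_rejection([0, 1, 0, 1], patience=2)
print(stable)    # (2, 4)
print(unstable)  # (None, 4)
```

In the first call the layers settle on class 2 and the loop stops after four of five layers; in the second the prediction keeps flipping, so the input is rejected rather than classified.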

Results:

Experiments demonstrate that PEER consistently improves reliability while preserving the efficiency gains of early exiting. For instance, on the MIMIC-III dataset, PEER achieves an accuracy of 90.73% (surpassing the baseline of 89.48%) by rejecting only 2.79% of uncertain samples. Alternatively, in high-efficiency settings, it achieves an 80% speed-up ratio while maintaining comparable performance. Across diverse datasets, PEER successfully abstains from uncertain cases that baseline methods misclassify, leading to more trustworthy predictions. It generalizes effectively across different model architectures, scales, and modalities, showing robustness in both language and vision tasks. Case studies further confirm that PEER aligns with clinical workflows by deferring ambiguous cases for human review.

Conclusion:

PEER offers a simple, architecture-agnostic framework that jointly ensures fast and trustworthy inference. Its generalizability across language and vision models highlights strong potential for deployment in clinical decision support.
Citations: 0
Lattice-based privacy-preserving multimodal retrieval for healthcare
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Vol. 175, Article 104990 | Pub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104990
Yingying Hou, Wenbin Yao, Xikang Zhu, Zeyu Li
Multimodal data plays a vital role in advancing personalized diagnosis and precision medicine. However, during cross-institutional sharing and collaborative analysis, the protection of patient privacy becomes increasingly critical, particularly in terms of the secure storage and fine-grained retrieval of sensitive medical data. Existing privacy-preserving technologies fail to meet the demands of secure and efficient retrieval over multimodal medical data. To address this challenge, we propose a generic multi-user multimodal searchable encryption framework for healthcare applications, which supports cross-modal retrieval based on trapdoors generated from ciphertexts corresponding to arbitrary modalities. We further design a distributed-decryption searchable encryption scheme, which is the first to combine AudioCLIP and multi-key fully homomorphic encryption for efficient retrieval of encrypted multimodal data. Additionally, we construct an attribute-based multimodal searchable encryption scheme as a complementary solution for implementing fine-grained access control. This enables flexible and controllable management of retrieval permissions over multimodal ciphertexts. Experimental results on MedMNIST and AudioSet demonstrate that our schemes achieve high retrieval efficiency and quantum-resistant security, meeting the requirements of real-world medical applications.
Journal of Biomedical Informatics, vol. 175, Article 104990 (published 2026-01-20).
Citations: 0
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-01-17 DOI: 10.1016/j.jbi.2026.104983
Brian Ondov, William Xia, Kush Attal, Ishita Unde, Jerry He, Dina Demner-Fushman

Objective:

Recent advances in language models have shown potential to adapt professional-facing biomedical literature to plain language, making it accessible to patients and caregivers. However, their unpredictability and high potential for harm in this domain mean that rigorous evaluation is necessary. Our goals with this track were to stimulate research and to provide high-quality evaluation of the most promising systems.

Methods:

We hosted the Plain Language Adaptation of Biomedical Abstracts (PLABA) track at the 2023 and 2024 Text Retrieval Conferences. Tasks included complete, sentence-level rewriting of 400 abstracts related to 40 consumer questions (Task 1) as well as identifying and replacing difficult terms in 300 abstracts spanning 30 consumer questions (Task 2). For automatic evaluation of Task 1, we developed a four-fold professionally written reference set. Submissions for both tasks also received extensive manual evaluation from biomedical experts.

Results:

Twelve teams spanning twelve countries participated, with models from multilayer perceptrons to large pretrained transformers. In manual judgments of Task 1, top-performing models rivaled human factual accuracy and completeness, but not simplicity or brevity. Automatic, reference-based metrics generally did not correlate well with manual judgments. In Task 2, systems struggled with identifying difficult terms and classifying how to replace them. When generating replacements, however, LLM-based systems did well in manually judged accuracy, completeness, and simplicity, though not in brevity.

Conclusion:

The PLABA track showed promise for using Large Language Models to adapt biomedical literature for the general public, while also highlighting their deficiencies and the need for improved automatic benchmarking tools.
Journal of Biomedical Informatics, vol. 175, Article 104983 (published 2026-01-17).
Citations: 0
A federated learning framework for ethical dynamic treatment allocation across heterogeneous hospitals
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-01-16 DOI: 10.1016/j.jbi.2026.104987
Xenia Konti, Nicoleta J. Economou-Zavlanos, Yi Shen, Giorgos Stamou, Armando Bedoya, Michael J. Pencina, Chuan Hong, Michael M. Zavlanos

Objective

In this paper, we propose an adaptive federated learning framework to learn optimal treatments for individual hospitals that possibly serve different patient populations. The proposed framework can enable the design of more efficient treatment allocation strategies.

Methods

We propose a federated treatment recommendation strategy in which the problem at each hospital is formulated as a Multi-Armed Bandit (MAB) problem. The process is coordinated by a lead hospital that adaptively learns and transfers Upper Confidence Bounds (UCB) across similar hospitals and Personalized Upper Bounds across heterogeneous hospitals. We test our proposed method on a simulated clinical trial environment created using real Covid-19 data from the Duke University Health System.
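
As background for the MAB formulation above, the following is a minimal sketch of the standard single-site UCB1 rule, in which each treatment is an arm and the arm with the highest upper confidence bound is chosen at every round. It is not the authors' federated method: the transfer of bounds between hospitals and the Personalized Upper Bounds are not modeled, and the names `ucb1`, `pull`, `n_arms`, and `horizon` are hypothetical.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Run the textbook UCB1 rule for `horizon` rounds.

    `pull(arm)` is a caller-supplied function (hypothetical here) that
    returns a reward in [0, 1] for the chosen arm, e.g. 1.0 if the
    treatment succeeded. Returns per-arm pull counts and mean rewards.
    """
    counts = [0] * n_arms      # how many times each arm was chosen
    means = [0.0] * n_arms     # running mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: try every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is sampled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts, means
```

Calling `ucb1` with a `pull` function that simulates treatment outcomes returns how often each arm was selected; under this rule, arms with lower mean reward are sampled less and less often as evidence accumulates, which is the property the safety argument in the abstract relies on.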

Results

Our method relies on collaboration among hospitals, which reduces the number of data samples needed per institution while protecting the privacy of individual patient data. At the same time, it ensures fairness of the learned treatments by mitigating possible biases due to differences in the patient populations treated across different hospitals. Finally, our method improves the safety of the learning procedure by reducing the number of patients who receive sub-optimal treatments at each hospital. In the experiments, we show that our proposed method outperforms other state-of-the-art approaches in that it requires up to 36%–75% less patient data to learn the optimal treatment for each hospital and administers the optimal treatment to 0.95%–48.6% more patients.

Conclusion

In this paper, we propose an adaptive federated learning strategy for treatment recommendation tasks that learns optimal treatments for individual hospitals that possibly serve different patient populations, while satisfying privacy, fairness, and safety considerations.
Journal of Biomedical Informatics, vol. 174, Article 104987 (published 2026-01-16).
Citations: 0
DMDGRN: A data augmentation-based multilayer directed graph convolutional network for gene regulatory network inference
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-01-14 DOI: 10.1016/j.jbi.2026.104985
Pi-Jing Wei, Mingzhu Sun, Zheng Ding, Rui-Fen Cao, Zhen Gao, Chun-Hou Zheng

Objective

Gene regulatory networks (GRNs) provide a graphical representation of the regulatory interactions between transcription factors (TFs) and their target genes, governing transcriptional states that define cell identity and function. Deciphering GRNs is fundamental to understanding disease pathogenesis and remains a central challenge in systems biology. Graph neural network-based methods have made significant progress in GRN inference in recent years due to their exceptional ability to model graph-structured biological data. However, inherent characteristics of GRNs, including their directionality, sparsity, and abundant high-order regulatory interactions, have usually been ignored.

Methods

In this study, we propose DMDGRN, a data augmentation-based multilayer directed graph convolutional network for GRN inference. To capture the direction of GRNs, DMDGRN employs a phase matrix to construct the Laplacian operator, which can track message propagation pathways. Considering the inherent sparsity of known GRNs, DMDGRN incorporates data augmentation techniques to overcome the network sparsity. Moreover, DMDGRN adopts a multilayer directed network architecture with residual connections to extract higher-order neighborhood information.
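
The abstract does not say how the phase matrix enters the Laplacian. One widely used construction for directed graphs is the magnetic Laplacian, and the sketch below assumes that form purely for illustration; it may differ from what DMDGRN actually uses. Edge direction is encoded in the complex phase of each off-diagonal entry, so the operator stays Hermitian while still distinguishing a TF-to-target edge from its reverse. The function name `magnetic_laplacian` and the parameter `q` are assumptions.

```python
import cmath
import math


def magnetic_laplacian(adj, q=0.25):
    """Unnormalised magnetic Laplacian of a directed graph (illustrative).

    `adj[u][v]` is the weight of the edge u -> v (0 if absent).
    `q` scales the phase; q = 0 reduces to the ordinary Laplacian of
    the symmetrised graph. Returns a Hermitian n x n complex matrix
    as nested lists.
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            # Symmetrised edge weight: magnitude of the connection.
            sym = 0.5 * (adj[u][v] + adj[v][u])
            degree += sym
            # Phase matrix entry: sign flips with edge direction.
            phase = 2.0 * math.pi * q * (adj[u][v] - adj[v][u])
            lap[u][v] = -sym * cmath.exp(1j * phase)
        # Degree of the symmetrised graph goes on the diagonal.
        lap[u][u] += degree
    return lap
```

For a single directed edge from node 0 to node 1 with `q = 0.25`, entry `[0][1]` and entry `[1][0]` have equal magnitude but opposite imaginary sign, which is how direction survives in an otherwise symmetric-looking operator.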

Results

Comprehensive evaluations on benchmark datasets demonstrate that DMDGRN significantly improves GRN inference accuracy. Notably, an application to breast cancer data shows that our framework successfully identifies relevant therapeutic candidates for human breast cancer.

Conclusions

The findings demonstrate that the strategies we adopted are effective for inferring GRNs. The successful application to breast cancer data further highlights the potential of DMDGRN in uncovering disease-relevant regulatory mechanisms and identifying therapeutic targets, making it a promising tool for advancing both computational biology and translational medicine.
Journal of Biomedical Informatics, vol. 175, Article 104985 (published 2026-01-14).
Citations: 0