
Journal of Biomedical Informatics: latest articles

KATMed: a knowledge-augmented transformer model for contraindication-aware medication recommendation in comorbidities
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2026-03-01; Epub Date: 2026-01-30; DOI: 10.1016/j.jbi.2026.104991
Ziqian Qiao, Shaofu Lin, Jiatong Fan, Jianhui Chen, Zhiyi Tang, Zitong Zhang
Drug-disease contraindications in comorbidities (DDCC) are a significant challenge and a priority in clinical treatment. These contraindications exhibit a prototypical long-tail distribution, characterized by low frequency, high diversity, and substantial individual variability. These properties significantly limit medication recommendation models built on electronic health records, ultimately elevating safety risks in clinical practice. To address this challenge, this study proposes KATMed, a knowledge-augmented transformer model for contraindication-aware medication recommendation in comorbidities. The model employs Transformer-based encoding of patient records and leverages two self-supervised tasks to capture rich temporal and semantic dependencies. Building on this foundation, a hybrid knowledge-augmented framework is developed to integrate bidirectional medication-related clinical associations. Positive disease-procedure associations are modeled with a dynamic semantic relevance matrix that expands the input information, thereby enhancing the model’s feature learning capability on sparse yet diverse comorbidity records. Negative DDCC rules are incorporated as differentiable logical constraints in the loss function to suppress unsafe medications. Experiments on the MIMIC-III and MIMIC-IV datasets show that KATMed significantly improves performance, achieving a 5.2% increase in accuracy and a 2.04% reduction in safety violations.
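The idea of adding contraindication rules to the loss as a differentiable penalty can be illustrated with a minimal sketch. This is not the KATMed implementation: the abstract does not give the loss, so the penalty form (predicted probability of a contraindicated drug, counted only when the triggering disease is present), the weight `lam`, and the disease names and drug indices are all assumptions made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(probs, targets):
    """Mean binary cross-entropy over all candidate drugs."""
    eps = 1e-12
    terms = [t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
             for p, t in zip(probs, targets)]
    return -sum(terms) / len(terms)

def contraindication_penalty(probs, diseases, rules):
    """Soft violation of rules of the form 'disease present -> drug not given'.

    rules is a set of (disease, drug_index) pairs (hypothetical format).
    Each rule whose disease is present contributes the predicted probability
    of its drug, so the penalty is smooth in the model outputs.
    """
    return sum(probs[drug] for disease, drug in rules if disease in diseases)

def total_loss(logits, targets, diseases, rules, lam=1.0):
    """Recommendation loss plus weighted contraindication penalty (assumed form)."""
    probs = [sigmoid(z) for z in logits]
    return (binary_cross_entropy(probs, targets)
            + lam * contraindication_penalty(probs, diseases, rules))

# Hypothetical example: drug 1 is contraindicated in renal failure.
rules = {("renal_failure", 1)}
loss = total_loss([0.0, 3.0], [1, 0], {"renal_failure"}, rules)
```

Because the penalty grows with the probability assigned to a contraindicated drug, gradient descent pushes that probability down whenever the rule applies, while patients without the triggering disease are unaffected.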
Journal of Biomedical Informatics, volume 175, Article 104991.
Citations: 0
Ontology-grounded knowledge graphs for mitigating hallucinations in large language models for clinical question answering
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2026-03-01; Epub Date: 2026-01-28; DOI: 10.1016/j.jbi.2026.104993
Mohamed Ali, Zaki Taha, Mohamed Mabrouk Morsey

Objective:

Large Language Models (LLMs) show strong potential in biomedical informatics but frequently generate hallucinated or factually incorrect responses, limiting their clinical utility. This study aims to develop and evaluate a GraphRAG framework using an ontology-grounded knowledge graph that mitigates hallucinations in clinical question answering.

Methods:

We designed a domain-specific Resource Description Framework (RDF)/Web Ontology Language (OWL) ontology and knowledge graph using clinical and hospital data from multiple Egyptian institutions. The ontology was integrated with LLMs to enforce structured semantic grounding during question answering. Clinical questions were evaluated under three conditions: (i) baseline ChatGPT-4, (ii) DeepSeek-R1, and (iii) our ontology-grounded framework. Accuracy was evaluated against clinically reported reference answers derived from five peer-reviewed Egyptian hospital studies.

Results:

Our GraphRAG framework significantly outperformed baseline models. While ChatGPT-4 achieved 37% accuracy and DeepSeek-R1 achieved 52%, the ontology-grounded approach achieved 98% accuracy (59/60 questions). The hallucination rate was reduced from approximately 63% in ChatGPT-4 and 48% in DeepSeek-R1 to just 1.7% in our framework, a reduction of more than 61 percentage points compared with ChatGPT-4. The framework further ensured consistent, reproducible answers aligned with clinical knowledge, demonstrating its robustness for healthcare applications.
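The reported figures can be checked with a little arithmetic. The sketch below assumes that the hallucination rate is simply one minus accuracy, which is consistent with the 37%/63% and 52%/48% pairs quoted for the baselines but is not stated explicitly in the abstract. Under that assumption, the "more than 61%" figure corresponds to the absolute difference from ChatGPT-4, while the relative reduction is considerably larger.

```python
def accuracy_and_hallucination(correct: int, total: int):
    """Return (accuracy, hallucination rate), assuming hallucination = 1 - accuracy."""
    accuracy = correct / total
    return accuracy, 1.0 - accuracy

def reductions(baseline_rate: float, new_rate: float):
    """Return (absolute drop in rate, drop relative to the baseline rate)."""
    absolute = baseline_rate - new_rate
    return absolute, absolute / baseline_rate

acc, halluc = accuracy_and_hallucination(59, 60)   # ontology-grounded framework
abs_drop, rel_drop = reductions(0.63, halluc)      # versus ChatGPT-4 (63%)

print(f"accuracy {acc:.1%}, hallucination rate {halluc:.1%}")
print(f"absolute drop {abs_drop * 100:.1f} points, relative drop {rel_drop:.1%}")
```

Reporting both numbers side by side avoids ambiguity: an absolute drop is measured in percentage points, a relative drop as a fraction of the baseline rate.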

Conclusion:

Ontology-grounded knowledge graphs provide a reliable and verifiable method for mitigating hallucinations in LLM-based clinical question answering. By embedding structured clinical semantics into the reasoning process, the framework enhances factual accuracy, reproducibility, and safety in biomedical informatics. This work highlights the critical role of semantic knowledge representation in building trustworthy Artificial Intelligence (AI) systems for healthcare decision support.
Journal of Biomedical Informatics, volume 175, Article 104993.
Citations: 0
PEER: Towards reliable and efficient inference via Patience-Based Early Exiting with Rejection
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2026-03-01; Epub Date: 2026-01-20; DOI: 10.1016/j.jbi.2026.104988
Zaifu Zhan, Shuang Zhou, Rui Zhang

Objective:

In biomedical applications, models must balance inference efficiency with reliable predictions. Patience-based early exiting (PABEE) accelerates inference but often fails under uncertainty.

Methods:

We propose PEER (Patience-based Early Exiting with Rejection), a unified framework that integrates a rejection mechanism into PABEE to enable both efficiency and reliability. With PEER, a model either commits to a prediction for the input or rejects it, using a patience counter to track prediction consistency across layers. This design avoids unreliable final-layer predictions and unifies early exiting with selective abstention without retraining. We evaluated PEER on 11 biomedical datasets, including clinical text and medical images. Experiments involved multiple Transformer-based backbones, including a vision transformer, and were measured by accuracy, macro-F1, and speed-up ratio.
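A patience counter with a reject option can be sketched in a few lines. The abstract does not spell out the exact rule, so this is a simplified illustration under two assumptions: the model exits as soon as `patience` consecutive internal classifiers agree, and it rejects the input if the last layer is reached without that agreement. The function name and the use of `None` as the reject signal are hypothetical.

```python
from typing import Optional, Sequence, Tuple

REJECT = None  # hypothetical sentinel meaning "defer this input"

def patience_exit_with_rejection(
    layer_predictions: Sequence[int], patience: int
) -> Tuple[Optional[int], int]:
    """Return (label or REJECT, number of layers evaluated).

    layer_predictions[i] is the label predicted by the classifier attached
    to layer i + 1. In a real model the layers would be evaluated lazily;
    a precomputed sequence keeps the sketch self-contained.
    """
    streak = 0
    previous = None
    for depth, label in enumerate(layer_predictions, start=1):
        # Extend the streak if this layer agrees with the previous one,
        # otherwise start a new streak of length one.
        streak = streak + 1 if label == previous else 1
        previous = label
        if streak >= patience:
            return label, depth          # early exit with a stable prediction
    return REJECT, len(layer_predictions)  # never stabilised: abstain

stable = patience_exit_with_rejection([1, 1, 1, 0], patience=2)
unstable = patience_exit_with_rejection([0, 1, 0, 1], patience=2)
```

In this sketch, a larger `patience` trades speed for caution: fewer inputs exit early and more are deferred, which mirrors the efficiency and reliability trade-off the abstract describes.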

Results:

Experiments demonstrate that PEER consistently improves reliability while preserving the efficiency gains of early exiting. For instance, on the MIMIC-III dataset, PEER achieves an accuracy of 90.73% (surpassing the baseline of 89.48%) by rejecting only 2.79% of uncertain samples. Alternatively, in high-efficiency settings, it achieves an 80% speed-up ratio while maintaining comparable performance. Across diverse datasets, PEER successfully abstains from uncertain cases that baseline methods misclassify, leading to more trustworthy predictions. It generalizes effectively across different model architectures, scales, and modalities, showing robustness in both language and vision tasks. Case studies further confirm that PEER aligns with clinical workflows by deferring ambiguous cases for human review.

Conclusion:

PEER offers a simple, architecture-agnostic framework that jointly ensures fast and trustworthy inference. Its generalizability across language and vision models highlights strong potential for deployment in clinical decision support.
Journal of Biomedical Informatics, volume 175, Article 104988.
Citations: 0
Lattice-based privacy-preserving multimodal retrieval for healthcare
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2026-03-01; Epub Date: 2026-01-20; DOI: 10.1016/j.jbi.2026.104990
Yingying Hou, Wenbin Yao, Xikang Zhu, Zeyu Li
Multimodal data plays a vital role in advancing personalized diagnosis and precision medicine. However, during cross-institutional sharing and collaborative analysis, the protection of patient privacy becomes increasingly critical, particularly in terms of the secure storage and fine-grained retrieval of sensitive medical data. Existing privacy-preserving technologies fail to meet the demands of secure and efficient retrieval over multimodal medical data. To address this challenge, we propose a generic multi-user multimodal searchable encryption framework for healthcare applications, which supports cross-modal retrieval based on trapdoors generated from ciphertexts corresponding to arbitrary modalities. We further design a distributed-decryption searchable encryption scheme, which is the first to combine AudioCLIP and multi-key fully homomorphic encryption for efficient retrieval of encrypted multimodal data. Additionally, we construct an attribute-based multimodal searchable encryption scheme as a complementary solution for implementing fine-grained access control. This enables flexible and controllable management of retrieval permissions over multimodal ciphertexts. Experimental results on MedMNIST and AudioSet demonstrate that our schemes achieve high retrieval efficiency and quantum-resistant security, meeting the requirements of real-world medical applications.
Journal of Biomedical Informatics, volume 175, Article 104990.
Citations: 0
DMDGRN: A data augmentation-based multilayer directed graph convolutional network for gene regulatory network inference
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2026-03-01; Epub Date: 2026-01-14; DOI: 10.1016/j.jbi.2026.104985
Pi-Jing Wei, Mingzhu Sun, Zheng Ding, Rui-Fen Cao, Zhen Gao, Chun-Hou Zheng

Objective

Gene regulatory networks (GRNs) provide a graphical representation of the regulatory interactions between transcription factors (TFs) and their target genes, governing transcriptional states that define cell identity and function. Deciphering GRNs is fundamental to understanding disease pathogenesis and remains a central challenge in systems biology. Graph neural network-based methods have made significant progress in GRN inference in recent years due to their exceptional ability to model graph-structured biological data. However, inherent characteristics of GRNs have usually been ignored, including their directionality, their sparsity, and their abundant high-order regulatory interactions.

Methods

In this study, we propose DMDGRN, a data augmentation-based multilayer directed graph convolutional network for GRN inference. To capture the direction of GRNs, DMDGRN employs a phase matrix to construct the Laplacian operator, which can track message propagation pathways. Considering the inherent sparsity of known GRNs, DMDGRN incorporates data augmentation techniques to overcome network sparsity. Moreover, DMDGRN adopts a multilayer directed network architecture with residual connections to extract higher-order neighborhood information.
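One well-known way to build a Laplacian from a phase matrix is the magnetic Laplacian, in which edge direction is stored in the complex phase of each entry. The abstract does not say which construction DMDGRN uses, so the sketch below is the standard unnormalized magnetic Laplacian, shown only to illustrate the general idea; the function name and the parameter `q` (which scales the phase) are assumptions.

```python
import cmath
import math

def magnetic_laplacian(adj, q=0.25):
    """Unnormalized magnetic Laplacian L = D_s - A_s * exp(i * Theta).

    adj[u][v] = 1 means a directed edge u -> v (for example TF -> target gene).
    A_s is the symmetrized adjacency, D_s its degree matrix, and
    Theta = 2 * pi * q * (A - A^T) is the phase matrix encoding direction.
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            sym = 0.5 * (adj[u][v] + adj[v][u])                 # edge strength
            phase = 2 * math.pi * q * (adj[u][v] - adj[v][u])   # edge direction
            degree += sym
            lap[u][v] = -sym * cmath.exp(1j * phase)
        lap[u][u] += degree                                     # add D_s on the diagonal
    return lap

# A single regulatory edge 0 -> 1.
L = magnetic_laplacian([[0, 1], [0, 0]], q=0.25)
```

The resulting matrix is Hermitian, so it has real eigenvalues and supports spectral graph convolution, while the sign of the imaginary part distinguishes u -> v from v -> u. Setting `q = 0` discards direction and recovers the Laplacian of the symmetrized graph.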

Results

Comprehensive evaluations on benchmark datasets demonstrate that DMDGRN significantly improves GRN inference accuracy. Notably, an application to breast cancer data shows that our framework successfully identifies relevant therapeutic candidates for human breast cancer.

Conclusions

The findings demonstrate that the strategies we adopted are effective for inferring GRNs. The successful application to breast cancer data further highlights the potential of DMDGRN for uncovering disease-relevant regulatory mechanisms and identifying therapeutic targets, making it a promising tool for advancing both computational biology and translational medicine.
Journal of Biomedical Informatics, volume 175, Article 104985.
Citations: 0
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2026-03-01; Epub Date: 2026-01-17; DOI: 10.1016/j.jbi.2026.104983
Brian Ondov, William Xia, Kush Attal, Ishita Unde, Jerry He, Dina Demner-Fushman

Objective:

Recent advances in language models have shown potential to adapt professional-facing biomedical literature to plain language, making it accessible to patients and caregivers. However, their unpredictability and high potential for harm in this domain mean rigorous evaluation is necessary. Our goals with this track were to stimulate research and to provide high-quality evaluation of the most promising systems.

Methods:

We hosted the Plain Language Adaptation of Biomedical Abstracts (PLABA) track at the 2023 and 2024 Text Retrieval Conferences. Tasks included complete, sentence-level rewriting of 400 abstracts related to 40 consumer questions (Task 1) as well as identifying and replacing difficult terms in 300 abstracts spanning 30 consumer questions (Task 2). For automatic evaluation of Task 1, we developed a four-fold professionally written reference set. Submissions for both tasks also received extensive manual evaluation by biomedical experts.

Results:

Twelve teams spanning twelve countries participated, with models ranging from multilayer perceptrons to large pretrained transformers. In manual judgments of Task 1, top-performing models rivaled human factual accuracy and completeness, but not simplicity or brevity. Automatic, reference-based metrics generally did not correlate well with manual judgments. In Task 2, systems struggled with identifying difficult terms and classifying how to replace them. When generating replacements, however, LLM-based systems did well in manually judged accuracy, completeness, and simplicity, though not in brevity.

Conclusion:

The PLABA track showed promise for using Large Language Models to adapt biomedical literature for the general public, while also highlighting their deficiencies and the need for improved automatic benchmarking tools.
Journal of Biomedical Informatics, volume 175, Article 104983.
Citations: 0
Augmented intelligence for multimodal virtual biopsy in breast cancer using generative artificial intelligence
IF 4.5; Zone 2 (Medicine); Q2 Computer Science, Interdisciplinary Applications; Pub Date: 2026-02-01; Epub Date: 2025-12-26; DOI: 10.1016/j.jbi.2025.104971
Aurora Rofena, Claudia Lucia Piccolo, Bruno Beomonte Zobel, Paolo Soda, Valerio Guarrasi

Objective:

This study proposes a multimodal, multi-view deep learning approach for breast cancer virtual biopsy, the non-invasive classification of breast lesions as malignant or benign, that integrates Full-Field Digital Mammography (FFDM) and Contrast-Enhanced Spectral Mammography (CESM). It addresses the critical challenge of missing CESM data by using generative artificial intelligence (AI) to synthesize CESM images when they are unavailable, ensuring the continuity of diagnostic workflows.

Methods:

The proposed method uses FFDM and CESM images in both craniocaudal (CC) and mediolateral oblique (MLO) views. When CESM is missing, a CycleGAN-based generative model produces synthetic CESM images from FFDM inputs. For classification, three convolutional neural networks (ResNet18, ResNet50, and VGG16) are employed, and a two-stage late fusion strategy integrates view-specific and modality-specific malignancy probabilities, weighted by Matthews Correlation Coefficient (MCC), into a final malignancy score. The system’s robustness under varying degrees of missing CESM data is tested by incrementally replacing real CESM inputs with synthetic ones and evaluating classification performance using AUC, G-mean, and MCC.
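The abstract does not spell out the fusion formula, so the following is only a minimal sketch of what an MCC-weighted two-stage late fusion could look like. It assumes that stage one fuses the CC and MLO probabilities within each modality and stage two fuses the resulting FFDM and CESM scores, with validation MCC values as weights and negative weights clipped to zero. The function names, the stage order, and the clipping rule are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def weighted_fusion(probs, weights):
    """Weighted average of probabilities (assumption: negative weights clip to 0)."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        # No informative weight available: fall back to a plain average.
        return sum(probs) / len(probs)
    return sum(p * w for p, w in zip(probs, clipped)) / total


def two_stage_fusion(view_probs, view_mcc, modality_mcc):
    """Stage 1 fuses views within each modality; stage 2 fuses modalities."""
    modality_scores = {}
    for modality, probs in view_probs.items():
        views = sorted(probs)
        modality_scores[modality] = weighted_fusion(
            [probs[v] for v in views],
            [view_mcc[modality][v] for v in views],
        )
    modalities = sorted(modality_scores)
    return weighted_fusion(
        [modality_scores[m] for m in modalities],
        [modality_mcc[m] for m in modalities],
    )


# Hypothetical malignancy probabilities and validation MCCs for one lesion.
view_probs = {"FFDM": {"CC": 0.6, "MLO": 0.8}, "CESM": {"CC": 0.9, "MLO": 0.7}}
view_mcc = {"FFDM": {"CC": 0.5, "MLO": 0.5}, "CESM": {"CC": 0.5, "MLO": 0.5}}
modality_mcc = {"FFDM": 0.5, "CESM": 0.5}
final_score = two_stage_fusion(view_probs, view_mcc, modality_mcc)
```

With equal weights the result reduces to a plain average; unequal MCC values shift the final score toward the better-validated view or modality.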

Results:

CycleGAN achieved high-fidelity CESM synthesis, with Peak Signal-to-Noise Ratio (PSNR) exceeding 24 dB and Structural Similarity Index (SSIM) above 0.8 across both CC and MLO views. For lesion classification, the multimodal configuration combining FFDM and CESM consistently outperformed the unimodal FFDM-only setup. Notably, even when CESM was entirely replaced by synthetic images, the multimodal approach still improved virtual biopsy performance compared to FFDM alone. Although classification performance declined as the proportion of synthetic CESM increased, the use of synthetic data remained beneficial.
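For readers unfamiliar with the image-quality figure quoted above, the Peak Signal-to-Noise Ratio is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between a reference and a synthesized image. The sketch below applies that standard definition to flat lists of pixel values. The 8-bit range (`max_value=255`) and the function name are assumptions for illustration; the abstract does not state the bit depth or preprocessing used in the study.

```python
import math


def psnr(reference, synthetic, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(synthetic) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthetic)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical example: every pixel is off by 10 grey levels (MSE = 100).
example_db = psnr([100, 100, 100, 100], [110, 110, 110, 110])
```

Higher values mean the synthetic image is closer to the reference, so a uniform error of 10 grey levels on an 8-bit scale already clears the 24 dB mark.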

Conclusion:

This work demonstrates that generative AI can effectively address missing-modality challenges in breast cancer diagnostics by synthesizing CESM images to enhance FFDM-based virtual biopsy pipelines. In the absence of real CESM data, incorporating synthetic images improves lesion classification compared to using FFDM alone, offering a non-invasive alternative to support clinical decision-making. Moreover, by releasing the extended CESM@UCBM dataset, this study contributes a valuable resource for advancing research and innovation in breast multimodal diagnostic systems.
Citations: 0
A computational framework for predicting drug-target interactions by fusing gene ontology information with cross attention
IF 4.5; Zone 2 (Medicine); Q2 Computer Science, Interdisciplinary Applications; Pub Date: 2026-02-01; Epub Date: 2026-01-02; DOI: 10.1016/j.jbi.2025.104976
Wenchao Cui, Pingjian Ding, Lingyun Luo, Shunheng Zhou, Hui Jiang

Motivation

Identifying drug–target interactions (DTIs) is a critical step in both drug discovery and drug repurposing. Accurate in silico prediction of DTIs can substantially reduce development time and costs. Recent advances in sequence-based methods have leveraged attention mechanisms to improve prediction accuracy. However, these approaches typically rely solely on the molecular structures of drugs and proteins, overlooking higher-level semantic information that reflects functional and biological relationships.

Results

In this work, we propose GODTI, a novel Gene Ontology-guided Drug-Target Interaction prediction model that enhances the performance through multimodal feature integration. GODTI comprises three major components: a feature extraction module, a multimodal fusion module, and an intermolecular interaction modeling module. In the protein feature extractor, both functional descriptors derived from Gene Ontology and sequence-based embeddings from amino acid sequences are obtained and combined. These protein representations are then integrated with drug molecular features via the multimodal fusion module and subsequently processed by the interaction modeling module to predict potential interactions. We evaluated GODTI under four realistic experimental settings, demonstrating consistent improvements over state-of-the-art baselines. Furthermore, case studies validated the practical utility of GODTI in accurately identifying novel, low-cost DTIs, underscoring its potential to accelerate drug discovery workflows.
Citations: 0
Beyond Fine-Tuning: Leveraging Domain-Aware In-Context learning with large language models for clinical named entity recognition
IF 4.5; Zone 2 (Medicine); Q2 Computer Science, Interdisciplinary Applications; Pub Date: 2026-02-01; Epub Date: 2026-01-08; DOI: 10.1016/j.jbi.2026.104982
Siun Kim, David Seung U Lee, Yujin Kim, Hyung-Jin Yoon, Howard Lee

Background

Clinical named entity recognition (NER) is essential for structuring clinical narratives. While large language model (LLM)-based in-context learning (ICL) enables parameter-free adaptation, encoder-based fine-tuning has generally achieved superior performance in practical biomedical NER settings.

Objective

To systematically compare ICL and encoder-based fine-tuning for clinical NER under realistic constraints, and to determine whether optimizing ICL demonstration selection can close the performance gap.

Methods

We manually annotated 2,113 clinical notes from hematologic malignancy patients at Seoul National University Hospital and 400 MIMIC-IV notes. ICL configurations were optimized across task instructions, output formats, demonstration selection methods, sorting strategies, and pool sizes, using LLaMA-3.3-70B (open-source) via Ollama. Encoder fine-tuning was performed on both domain-specific and general-domain models, with RoBERTa-large representing the best encoder baseline. All models were evaluated as token-level classification tasks using macro and weighted F1, across in-domain, cross-domain, and cross-institutional scenarios.
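Because the results are reported in both macro and weighted F1, it may help to see how the two averages differ on token-level labels. The following is a minimal pure-Python sketch using the standard definitions: per-class F1 = 2·TP / (2·TP + FP + FN), macro F1 as the unweighted mean over classes, and weighted F1 as the support-weighted mean. The tag names `O` and `DRUG` and the toy sequences are hypothetical and are not taken from the study's annotation scheme.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Per-class F1 for token-level labels: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[label] * support[label] for label in scores) / len(y_true)
    return macro, weighted


# Hypothetical token labels: a frequent "O" class and a rarer "DRUG" class.
y_true = ["O", "O", "O", "O", "DRUG", "DRUG"]
y_pred = ["O", "O", "O", "DRUG", "DRUG", "O"]
macro_f1, weighted_f1 = macro_and_weighted_f1(y_true, y_pred)
```

Macro F1 treats the rare entity class and the frequent background class equally, which is why it tends to be the stricter of the two numbers on imbalanced clinical text.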

Results

Demonstration selection played a major role in determining ICL performance, improving macro F1 by up to 9.4 points over random selection under our experimental settings. In moderate-resource settings (500-sample pool), ICL exceeded RoBERTa-large fine-tuning by 4.7 macro F1 points and remained competitive up to 900 samples. Both ICL and fine-tuning experienced performance degradation in cross-domain evaluations, yet ICL demonstrated superior data efficiency, achieving competitive accuracy with substantially fewer labeled examples. ICL achieved in-domain macro F1 > 0.8 in several domains, outperforming full-data fine-tuned encoders, and delivered 6.3- to 11.6-point gains in cross-institutional transfer without parameter updates. At the largest pool size (∼1,900 samples), encoder-based fine-tuning regained the lead.

Conclusion

With optimized domain-aware demonstration selection, open-source LLM-based ICL can match or surpass encoder fine-tuning for clinical NER. Its ease of adaptation and ability to update knowledge via demonstration pools—without retraining—enable continuous improvement in dynamic, resource-limited healthcare settings.
Citations: 0
A multidimensional hierarchical framework for sources of bias in real-world healthcare evidence: a scoping review
IF 4.5; Zone 2 (Medicine); Q2 Computer Science, Interdisciplinary Applications; Pub Date: 2026-02-01; Epub Date: 2026-01-20; DOI: 10.1016/j.jbi.2026.104989
Haeun Lee, Christelle Xiong, Derek Baughman, Chen Dun, Jiayi Tong, Benjamin Martin, Harold Lehmann, Paul Nagy

Objective

This study identifies and categorizes sources of bias throughout the process of generating real-world evidence (RWE) from electronic health records (EHRs), and develops a multi-dimensional conceptual framework that characterizes how bias arises in large-scale multinational federated network studies.

Methods

A three-phase bias framework spanning healthcare delivery, data management, and research was developed through the synthesis of existing frameworks, a structured literature review, and iterative assessment by multidisciplinary expert panels. A scoping review was conducted following PRISMA-ScR guidelines, analyzing studies between 2016 and 2025 in PubMed and Web of Science and focusing on bias in observational studies using real-world data. Bias sources were classified using directed content analysis based on their occurrence stage in the RWE generation process.

Results

Analysis of 220 papers within this framework identified 209 distinct bias sources categorized into seven specific levels: access to medical care (n = 40), provision of care (n = 29), data acquisition and measurement (n = 39), clinical documentation and coding practices (n = 32), data extraction (n = 22), data modeling (n = 11), and data analytics (n = 36). Healthcare phase biases were most prevalent (n = 108), followed by data management (n = 54) and research levels (n = 47).
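The level counts and phase totals reported above can be cross-checked with a short tally. The grouping of the seven levels into the three phases is not stated explicitly in the abstract; the mapping below is an inference (first three levels as healthcare, next two as data management, last two as research) that happens to reproduce the reported totals, and should be read as an assumption rather than the authors' definition.

```python
# Counts of bias sources per level, as reported in the abstract.
level_counts = {
    "access to medical care": 40,
    "provision of care": 29,
    "data acquisition and measurement": 39,
    "clinical documentation and coding practices": 32,
    "data extraction": 22,
    "data modeling": 11,
    "data analytics": 36,
}

# Assumed level-to-phase grouping (inferred, not stated in the abstract).
phase_levels = {
    "healthcare": [
        "access to medical care",
        "provision of care",
        "data acquisition and measurement",
    ],
    "data management": [
        "clinical documentation and coding practices",
        "data extraction",
    ],
    "research": ["data modeling", "data analytics"],
}

phase_totals = {
    phase: sum(level_counts[level] for level in levels)
    for phase, levels in phase_levels.items()
}
grand_total = sum(level_counts.values())
```

Under this grouping the three phase totals and the overall count of distinct bias sources agree with the figures given in the Results.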

Conclusion

This multi-dimensional framework reveals that bias sources in RWE generation are interconnected across patient, provider, administrative, information technology, informatics, and analytical domains, and provides a structural foundation for understanding where and how bias may arise across the RWE process in large-scale observational research.
Citations: 0