Pub Date: 2026-03-01 | Epub Date: 2026-01-30 | DOI: 10.1016/j.jbi.2026.104991
Ziqian Qiao , Shaofu Lin , Jiatong Fan , Jianhui Chen , Zhiyi Tang , Zitong Zhang
Drug-disease contraindications in comorbidities (DDCC) are a significant challenge and a priority in clinical treatment. These contraindications follow a prototypical long-tail distribution: they are low-frequency, highly diverse, and subject to substantial individual variability. Such properties impose significant limitations on medication recommendation models built from electronic health records, ultimately elevating safety risks in clinical practice. To address this challenge, this study proposes KATMed, a knowledge-augmented transformer model for contraindication-aware medication recommendation in comorbidities. The model employs Transformer-based encoding of patient records and leverages two self-supervised tasks to capture rich temporal and semantic dependencies. On this foundation, a hybrid knowledge-augmented framework integrates bidirectional medication-related clinical associations. Positive disease-procedure associations are modeled using a dynamic semantic relevance matrix that expands the input information, enhancing the model's feature learning on sparse yet diverse comorbidity records. Negative DDCC rules are incorporated as differentiable logical constraints in the loss function to suppress unsafe medications. Experiments on the MIMIC-III and MIMIC-IV datasets show that KATMed significantly improves performance, achieving a 5.2% increase in accuracy and a 2.04% reduction in safety violations.
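The negative-rule constraint described above can be illustrated with a minimal sketch: a differentiable penalty measuring the probability mass a recommender places on drugs forbidden by a patient's active diseases. The function name, matrix layout, and toy numbers are assumptions for illustration, not KATMed's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ddcc_penalty(logits, disease_mask, contra_matrix):
    """Differentiable DDCC penalty (hypothetical sketch, not the paper's code).

    logits        : (n_drugs,) raw recommendation scores
    disease_mask  : (n_diseases,) 1.0 if the disease is present
    contra_matrix : (n_diseases, n_drugs) 1 if the drug is contraindicated
    Returns the probability mass placed on contraindicated drugs, which can
    be added to the recommendation loss to suppress unsafe medications.
    """
    probs = sigmoid(logits)
    forbidden = (disease_mask @ contra_matrix) > 0  # drugs ruled out by any active disease
    return float(probs[forbidden].sum())

logits = np.array([2.0, -1.0, 0.5])
disease_mask = np.array([1.0, 0.0])
contra = np.array([[1, 0, 0],    # disease 0 contraindicates drug 0
                   [0, 0, 1]])   # disease 1 contraindicates drug 2 (inactive here)
penalty = ddcc_penalty(logits, disease_mask, contra)
```

Here only drug 0 is forbidden by an active disease, so the penalty reduces to its predicted probability, sigmoid(2.0).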
Title: KATMed: a knowledge-augmented transformer model for contraindication-aware medication recommendation in comorbidities. Journal of Biomedical Informatics, vol. 175, Article 104991.
Pub Date: 2026-03-01 | Epub Date: 2026-01-28 | DOI: 10.1016/j.jbi.2026.104993
Mohamed Ali, Zaki Taha, Mohamed Mabrouk Morsey
Objective:
Large Language Models (LLMs) show strong potential in biomedical informatics but frequently generate hallucinated or factually incorrect responses, limiting their clinical utility. This study aims to develop and evaluate a GraphRAG framework using an ontology-grounded knowledge graph that mitigates hallucinations in clinical question answering.
Methods:
We designed a domain-specific Resource Description Framework (RDF)/Web Ontology Language (OWL) ontology and knowledge graph using clinical and hospital data from multiple Egyptian institutions. The ontology was integrated with LLMs to enforce structured semantic grounding during question answering. Clinical questions were evaluated under three conditions: (i) baseline ChatGPT-4, (ii) DeepSeek-R1, and (iii) our ontology-grounded framework. Accuracy was evaluated against clinically reported reference answers derived from five peer-reviewed Egyptian hospital studies.
Results:
Our GraphRAG framework significantly outperformed baseline models. While ChatGPT-4 achieved 37% accuracy and DeepSeek-R1 achieved 52%, the ontology-grounded approach achieved 98% accuracy (59/60 questions). The hallucination rate fell from approximately 63% for ChatGPT-4 and 48% for DeepSeek-R1 to just 1.7% in our framework, a reduction of more than 61 percentage points relative to state-of-the-art LLMs. The framework further ensured consistent, reproducible answers aligned with clinical knowledge, demonstrating its robustness for healthcare applications.
Conclusion:
Ontology-grounded knowledge graphs provide a reliable and verifiable method for mitigating hallucinations in LLM-based clinical question answering. By embedding structured clinical semantics into the reasoning process, the framework enhances factual accuracy, reproducibility, and safety in biomedical informatics. This work highlights the critical role of semantic knowledge representation in building trustworthy Artificial Intelligence (AI) systems for healthcare decision support.
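The grounding principle in this framework, answering only from retrieved knowledge-graph facts and abstaining otherwise, can be sketched with an in-memory triple store. The triples, predicate names, and abstention string below are invented for illustration; the actual system uses an RDF/OWL ontology queried through the GraphRAG pipeline.

```python
# Toy triple store standing in for the RDF/OWL knowledge graph (invented facts).
KG = {
    ("Hospital_A", "reports_prevalence_of", "hepatitis C in 12% of admissions"),
    ("Hospital_A", "located_in", "Cairo"),
}

def grounded_answer(subject, predicate):
    """Answer only from supporting triples; abstain rather than hallucinate."""
    matches = [o for (s, p, o) in KG if s == subject and p == predicate]
    return matches[0] if matches else "ABSTAIN: no supporting triple"
```

The abstention branch is what distinguishes grounded answering from a free-running LLM: an unsupported question yields a refusal instead of a fabricated answer.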
Title: Ontology-grounded knowledge graphs for mitigating hallucinations in large language models for clinical question answering. Journal of Biomedical Informatics, vol. 175, Article 104993.
Pub Date: 2026-03-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104988
Zaifu Zhan , Shuang Zhou , Rui Zhang
Objective:
In biomedical applications, models must balance inference efficiency with reliable predictions. Patience-based early exiting (PABEE) accelerates inference but often fails under uncertainty.
Methods:
We propose PEER (Patience-based Early Exiting with Rejection), a unified framework that integrates a rejection mechanism into PABEE to enable both efficiency and reliability. With PEER, a model either makes a decision for the input or rejects it, employing a patience counter to track prediction consistency across layers. This design avoids unreliable final-layer predictions and unifies early exiting with selective abstention, without retraining. We evaluated PEER on 11 biomedical datasets spanning clinical text and medical images. Experiments involved multiple Transformer-based backbones, including the Vision Transformer, measured by accuracy, macro-F1, and speed-up ratio.
Results:
Experiments demonstrate that PEER consistently improves reliability while preserving the efficiency gains of early exiting. For instance, on the MIMIC-III dataset, PEER achieves an accuracy of 90.73% (surpassing the baseline of 89.48%) by rejecting only 2.79% of uncertain samples. Alternatively, in high-efficiency settings, it achieves an 80% speed-up ratio while maintaining comparable performance. Across diverse datasets, PEER successfully abstains from uncertain cases that baseline methods misclassify, leading to more trustworthy predictions. It generalizes effectively across different model architectures, scales, and modalities, showing robustness in both language and vision tasks. Case studies further confirm that PEER aligns with clinical workflows by deferring ambiguous cases for human review.
Conclusion:
PEER offers a simple, architecture-agnostic framework that jointly ensures fast and trustworthy inference. Its generalizability across language and vision models highlights strong potential for deployment in clinical decision support.
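The exit-or-reject rule can be sketched in a few lines: track how many consecutive layers agree, exit once the agreement reaches the patience threshold, and reject if the network runs out of layers. This is a schematic reconstruction from the abstract, not the authors' code.

```python
def peer_inference(layer_predictions, patience=2):
    """Patience-based early exiting with rejection (schematic sketch).

    layer_predictions : per-layer class predictions for one input.
    Returns (prediction, exit_depth); prediction is None on rejection.
    """
    streak, last = 0, None
    for depth, pred in enumerate(layer_predictions, start=1):
        streak = streak + 1 if pred == last else 1
        last = pred
        if streak >= patience:
            return pred, depth                     # confident early exit
    return None, len(layer_predictions)            # reject: defer to human review

label, depth = peer_inference([0, 1, 1, 2], patience=2)
```

Here layers 2 and 3 agree, so the model exits at depth 3 with class 1; an input whose per-layer predictions never stabilize, such as [0, 1, 2, 0], is rejected instead of trusting the final layer.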
Title: PEER: Towards reliable and efficient inference via Patience-Based Early Exiting with Rejection. Journal of Biomedical Informatics, vol. 175, Article 104988.
Pub Date: 2026-03-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104990
Yingying Hou, Wenbin Yao, Xikang Zhu, Zeyu Li
Multimodal data plays a vital role in advancing personalized diagnosis and precision medicine. However, during cross-institutional sharing and collaborative analysis, the protection of patient privacy becomes increasingly critical, particularly in terms of the secure storage and fine-grained retrieval of sensitive medical data. Existing privacy-preserving technologies fail to meet the demands of secure and efficient retrieval over multimodal medical data. To address this challenge, we propose a generic multi-user multimodal searchable encryption framework for healthcare applications, which supports cross-modal retrieval based on trapdoors generated from ciphertexts corresponding to arbitrary modalities. We further design a distributed-decryption searchable encryption scheme, which is the first to combine AudioCLIP and multi-key fully homomorphic encryption for efficient retrieval of encrypted multimodal data. Additionally, we construct an attribute-based multimodal searchable encryption scheme as a complementary solution for implementing fine-grained access control. This enables flexible and controllable management of retrieval permissions over multimodal ciphertexts. Experimental results on MedMNIST and AudioSet demonstrate that our schemes achieve high retrieval efficiency and quantum-resistant security, meeting the requirements of real-world medical applications.
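The retrieval step rests on AudioCLIP's shared embedding space, in which items of any modality can be ranked against a query by cosine similarity; the paper's contribution is performing this matching over ciphertexts via multi-key fully homomorphic encryption. A plaintext analogue of the matching step, with made-up embeddings, looks like this:

```python
import numpy as np

def cross_modal_retrieve(query_emb, gallery_embs, k=1):
    """Rank gallery items (any modality) by cosine similarity to the query.

    Plaintext sketch only: the proposed schemes evaluate the same similarity
    homomorphically over encrypted embeddings, which this does not attempt.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))[:k]

gallery = np.array([[1.0, 0.0],      # e.g. an image embedding
                    [0.0, 1.0],      # e.g. an audio embedding
                    [0.7, 0.7]])     # e.g. a text embedding
top2 = cross_modal_retrieve(np.array([0.9, 0.1]), gallery, k=2)
```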
Title: Lattice-based privacy-preserving multimodal retrieval for healthcare. Journal of Biomedical Informatics, vol. 175, Article 104990.
Pub Date: 2026-03-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.jbi.2026.104985
Pi-Jing Wei , Mingzhu Sun , Zheng Ding , Rui-Fen Cao , Zhen Gao , Chun-Hou Zheng
Objective
Gene regulatory networks (GRNs) provide a graphical representation of the regulatory interactions between transcription factors (TFs) and their target genes, governing the transcriptional states that define cell identity and function. Deciphering GRNs is fundamental to understanding disease pathogenesis and remains a central challenge in systems biology. Graph neural network-based methods have made significant progress in GRN inference in recent years owing to their exceptional ability to model graph-structured biological data. However, these methods usually ignore inherent characteristics of GRNs, including their directionality, sparsity, and abundance of high-order regulatory interactions.
Methods
In this study, we propose DMDGRN, a data augmentation-based multilayer directed graph convolutional network for GRN inference. To capture edge directionality, DMDGRN employs a phase matrix to construct the Laplacian operator, which can track message propagation pathways. To counter the inherent sparsity of known GRNs, DMDGRN incorporates data augmentation techniques. Moreover, DMDGRN adopts a multilayer directed network architecture with residual connections to extract higher-order neighborhood information.
Results
Comprehensive evaluations on benchmark datasets demonstrate that DMDGRN significantly improves GRN inference accuracy. Notably, an application to breast cancer shows that our framework successfully identifies relevant therapeutic candidates for human breast cancer.
Conclusions
The findings demonstrate that the strategies we adopted are effective for inferring GRNs. The successful application to breast cancer data further highlights the potential of DMDGRN for uncovering disease-relevant regulatory mechanisms and identifying therapeutic targets, making it a promising tool for advancing both computational biology and translational medicine.
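The phase-matrix Laplacian described in the Methods is reminiscent of the magnetic Laplacian, a standard way to encode edge direction in a Hermitian operator; whether DMDGRN uses exactly this construction is an assumption here, but a minimal version illustrates the idea:

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """Magnetic Laplacian of a directed graph (illustrative sketch).

    A : (n, n) adjacency matrix with A[i, j] = 1 for a directed edge i -> j.
    The complex phase exp(i * theta) encodes direction while keeping the
    operator Hermitian, so its eigenvalues remain real.
    """
    A_sym = (A + A.T) / 2.0                      # symmetrized edge weights
    theta = 2.0 * np.pi * q * (A - A.T)          # antisymmetric phase matrix
    D = np.diag(A_sym.sum(axis=1))
    return D - A_sym * np.exp(1j * theta)

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])                       # single regulatory edge 0 -> 1
L = magnetic_laplacian(A)
```

Unlike the ordinary symmetric Laplacian, this operator distinguishes the edge 0 -> 1 from 1 -> 0 through the sign of the phase, which is how direction information can propagate through graph convolutions.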
Title: DMDGRN: A data augmentation-based multilayer directed graph convolutional network for gene regulatory network inference. Journal of Biomedical Informatics, vol. 175, Article 104985.
Pub Date: 2026-03-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.jbi.2026.104983
Brian Ondov , William Xia , Kush Attal , Ishita Unde , Jerry He , Dina Demner-Fushman
Objective:
Recent advances in language models have shown potential to adapt professional-facing biomedical literature to plain language, making it accessible to patients and caregivers. However, their unpredictability and high potential for harm in this domain mean that rigorous evaluation is necessary. Our goals with this track were to stimulate research and to provide high-quality evaluation of the most promising systems.
Methods:
We hosted the Plain Language Adaptation of Biomedical Abstracts (PLABA) track at the 2023 and 2024 Text Retrieval Conferences. Tasks included complete, sentence-level rewriting of 400 abstracts related to 40 consumer questions (Task 1) as well as identifying and replacing difficult terms in 300 abstracts spanning 30 consumer questions (Task 2). For automatic evaluation of Task 1, we developed a four-fold professionally-written reference set. Submissions for both tasks also received extensive manual evaluation from biomedical experts.
Results:
Twelve teams spanning twelve countries participated, with models from multilayer perceptrons to large pretrained transformers. In manual judgments of Task 1, top-performing models rivaled human factual accuracy and completeness, but not simplicity or brevity. Automatic, reference-based metrics generally did not correlate well with manual judgments. In Task 2, systems struggled with identifying difficult terms and classifying how to replace them. When generating replacements, however, LLM-based systems did well in manually judged accuracy, completeness, and simplicity, though not in brevity.
Conclusion:
The PLABA track showed promise for using Large Language Models to adapt biomedical literature for the general public, while also highlighting their deficiencies and the need for improved automatic benchmarking tools.
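As a concrete toy example of the reference-based metrics whose limitations the track exposed, a multi-reference unigram F1 scores a rewrite against each professionally written reference and keeps the best match. This is a deliberately simple stand-in, not one of the track's actual metrics; surface-overlap scoring of this kind may help explain why automatic metrics correlated poorly with expert judgments.

```python
def unigram_f1(candidate, references):
    """Best unigram F1 of a rewrite against multiple references.

    Toy metric: rewards word overlap only, standing in for reference-based
    metrics such as BLEU or SARI used in automatic evaluation.
    """
    cand = set(candidate.lower().split())
    best = 0.0
    for ref in references:
        r = set(ref.lower().split())
        overlap = len(cand & r)
        if not overlap:
            continue
        p, rec = overlap / len(cand), overlap / len(r)
        best = max(best, 2 * p * rec / (p + rec))
    return best
```

A rewrite that copies a reference scores 1.0, while a perfectly faithful but differently worded simplification can score near 0, which is exactly the failure mode of surface-overlap metrics.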
Title: Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track. Journal of Biomedical Informatics, vol. 175, Article 104983.
Pub Date: 2026-02-01 | Epub Date: 2025-12-26 | DOI: 10.1016/j.jbi.2025.104971
Aurora Rofena , Claudia Lucia Piccolo , Bruno Beomonte Zobel , Paolo Soda , Valerio Guarrasi
Objective:
This study aims to propose a multimodal, multi-view deep learning approach for breast cancer virtual biopsy, a non-invasive classification of breast lesions as malignant or benign, by integrating Full-Field Digital Mammography (FFDM) and Contrast-Enhanced Spectral Mammography (CESM). The work addresses the critical challenge of missing CESM data by introducing generative artificial intelligence (AI) to synthesize CESM images when unavailable, ensuring the continuity of diagnostic workflows.
Methods:
The proposed method uses FFDM and CESM images in both craniocaudal (CC) and mediolateral oblique (MLO) views. When CESM is missing, a CycleGAN-based generative model produces synthetic CESM images from FFDM inputs. For classification, three convolutional neural networks (ResNet18, ResNet50, and VGG16) are employed, and a two-stage late fusion strategy integrates view-specific and modality-specific malignancy probabilities, weighted by Matthews Correlation Coefficient (MCC), into a final malignancy score. The system’s robustness under varying degrees of missing CESM data is tested by incrementally replacing real CESM inputs with synthetic ones and evaluating classification performance using AUC, G-mean, and MCC.
Results:
CycleGAN achieved high-fidelity CESM synthesis, with Peak Signal-to-Noise Ratio exceeding 24 dB and Structural Similarity Index above 0.8 across both CC and MLO views. For lesion classification, the multimodal configuration combining FFDM and CESM consistently outperformed the unimodal FFDM-only setup. Notably, even when CESM was entirely replaced by synthetic images, the multimodal approach still improved virtual biopsy performance compared to FFDM alone. Although classification performance declined as the proportion of synthetic CESM increased, the use of synthetic data remained beneficial.
Conclusion:
This work demonstrates that generative AI can effectively address missing-modality challenges in breast cancer diagnostics by synthesizing CESM images to enhance FFDM-based virtual biopsy pipelines. In the absence of real CESM data, incorporating synthetic images improves lesion classification compared to using FFDM alone, offering a non-invasive alternative to support clinical decision-making. Moreover, by releasing the extended CESM@UCBM dataset, this study contributes a valuable resource for advancing research and innovation in breast multimodal diagnostic systems.
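The MCC-weighted late fusion can be sketched directly: each branch (view by modality) contributes its malignancy probability, weighted by the MCC it achieved on validation data. The branch names and numeric values below are invented for illustration.

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def late_fusion(branch_probs, branch_mccs):
    """Fuse branch-wise malignancy probabilities with MCC-derived weights."""
    w = np.asarray(branch_mccs, dtype=float)
    w = w / w.sum()                               # normalize the MCC weights
    return float(np.asarray(branch_probs) @ w)

# e.g. FFDM-CC, FFDM-MLO, CESM-CC, CESM-MLO branches (values invented)
score = late_fusion([0.80, 0.70, 0.90, 0.85], [0.5, 0.4, 0.7, 0.6])
```

Weighting by validation MCC lets more reliable branches (here the CESM views) dominate the final malignancy score, rather than averaging all branches equally.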
{"title":"Augmented intelligence for multimodal virtual biopsy in breast cancer using generative artificial intelligence","authors":"Aurora Rofena , Claudia Lucia Piccolo , Bruno Beomonte Zobel , Paolo Soda , Valerio Guarrasi","doi":"10.1016/j.jbi.2025.104971","DOIUrl":"10.1016/j.jbi.2025.104971","url":null,"abstract":"<div><h3>Objective:</h3><div>This study aims to propose a multimodal, multi-view deep learning approach for breast cancer virtual biopsy, a non-invasive classification of breast lesions as malignant or benign, by integrating Full-Field Digital Mammography (FFDM) and Contrast-Enhanced Spectral Mammography (CESM). The work addresses the critical challenge of missing CESM data by introducing generative artificial intelligence (AI) to synthesize CESM images when unavailable, ensuring the continuity of diagnostic workflows.</div></div><div><h3>Methods:</h3><div>The proposed method uses FFDM and CESM images in both craniocaudal (CC) and mediolateral oblique (MLO) views. When CESM is missing, a CycleGAN-based generative model produces synthetic CESM images from FFDM inputs. For classification, three convolutional neural networks (ResNet18, ResNet50, and VGG16) are employed, and a two-stage late fusion strategy integrates view-specific and modality-specific malignancy probabilities, weighted by Matthews Correlation Coefficient (MCC), into a final malignancy score. The system’s robustness under varying degrees of missing CESM data is tested by incrementally replacing real CESM inputs with synthetic ones and evaluating classification performance using AUC, G-mean, and MCC.</div></div><div><h3>Results:</h3><div>CycleGAN achieved high-fidelity CESM synthesis, with Peak-Signal-to-Noise Ratio exceeding 24 dB and Structural Similarity Index above 0.8 across both CC and MLO views. For lesion classification, the multimodal configuration combining FFDM and CESM consistently outperformed the unimodal FFDM-only setup. 
Notably, even when CESM was entirely replaced by synthetic images, the multimodal approach still improved virtual biopsy performance compared to FFDM alone. Although classification performance declined as the proportion of synthetic CESM increased, the use of synthetic data remained beneficial.</div></div><div><h3>Conclusion:</h3><div>This work demonstrates that generative AI can effectively address missing-modality challenges in breast cancer diagnostics by synthesizing CESM images to enhance FFDM-based virtual biopsy pipelines. In the absence of real CESM data, incorporating synthetic images improves lesion classification compared to using FFDM alone, offering a non-invasive alternative to support clinical decision-making. Moreover, by releasing the extended CESM@UCBM dataset, this study contributes a valuable resource for advancing research and innovation in breast multimodal diagnostic systems.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104971"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145850420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
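The two-stage late fusion described in this abstract—per-branch malignancy probabilities weighted by Matthews Correlation Coefficient—can be sketched as follows. This is a minimal illustration, assuming MCC is computed on held-out validation labels, negative MCC is clipped to zero, and weights are normalized to sum to one; none of these choices are taken from the paper itself.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def mcc_weights(y_true, branch_probs, eps=1e-8):
    """Weight each branch by its MCC on validation data (clipped at zero)."""
    scores = [max(matthews_corrcoef(y_true, (p >= 0.5).astype(int)), 0.0)
              for p in branch_probs]
    w = np.array(scores) + eps          # eps avoids an all-zero weight vector
    return w / w.sum()

def late_fusion(branch_probs, weights):
    """Weighted average of per-branch malignancy probabilities."""
    return np.tensordot(weights, np.stack(branch_probs), axes=1)

# Toy validation labels and two branch probability vectors (hypothetical values)
y_val = np.array([1, 0, 1, 1, 0])
p_cc  = np.array([0.9, 0.2, 0.8, 0.6, 0.3])   # e.g. CC-view branch
p_mlo = np.array([0.7, 0.4, 0.9, 0.7, 0.1])   # e.g. MLO-view branch

w = mcc_weights(y_val, [p_cc, p_mlo])
fused = late_fusion([p_cc, p_mlo], w)          # final malignancy scores
```

In the paper's setup the branches would be view-specific (CC, MLO) and modality-specific (FFDM, CESM) classifier outputs fused in two stages; here two toy probability vectors stand in for them.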
Identifying drug–target interactions (DTIs) is a critical step in both drug discovery and drug repurposing. Accurate in silico prediction of DTIs can substantially reduce development time and costs. Recent advances in sequence-based methods have leveraged attention mechanisms to improve prediction accuracy. However, these approaches typically rely solely on the molecular structures of drugs and proteins, overlooking higher-level semantic information that reflects functional and biological relationships.
Results
In this work, we propose GODTI, a novel Gene Ontology-guided Drug-Target Interaction prediction model that enhances prediction performance through multimodal feature integration. GODTI comprises three major components: a feature extraction module, a multimodal fusion module, and an intermolecular interaction modeling module. In the protein feature extractor, both functional descriptors derived from Gene Ontology and sequence-based embeddings from amino acid sequences are obtained and combined. These protein representations are then integrated with drug molecular features via the multimodal fusion module and subsequently processed by the interaction modeling module to predict potential interactions. We evaluated GODTI under four realistic experimental settings, demonstrating consistent improvements over state-of-the-art baselines. Furthermore, case studies validated the practical utility of GODTI in accurately identifying novel, low-cost DTIs, underscoring its potential to accelerate drug discovery workflows.
{"title":"A computational framework for predicting drug-target interactions by fusing gene ontology information with cross attention","authors":"Wenchao Cui, Pingjian Ding, Lingyun Luo, Shunheng Zhou, Hui Jiang","doi":"10.1016/j.jbi.2025.104976","DOIUrl":"10.1016/j.jbi.2025.104976","url":null,"abstract":"<div><h3>Motivation</h3><div>Identifying drug–target interactions (DTIs) is a critical step in both drug discovery and drug repurposing. Accurate <em>in silico</em> prediction of DTIs can substantially reduce development time and costs. Recent advances in sequence-based methods have leveraged attention mechanisms to improve prediction accuracy. However, these approaches typically rely solely on the molecular structures of drugs and proteins, overlooking higher-level semantic information that reflects functional and biological relationships.</div></div><div><h3>Results</h3><div>In this work, we propose GODTI, a novel Gene Ontology-guided Drug-Target Interaction prediction model that enhances prediction performance through multimodal feature integration. GODTI comprises three major components: a feature extraction module, a multimodal fusion module, and an intermolecular interaction modeling module. In the protein feature extractor, both functional descriptors derived from Gene Ontology and sequence-based embeddings from amino acid sequences are obtained and combined. These protein representations are then integrated with drug molecular features via the multimodal fusion module and subsequently processed by the interaction modeling module to predict potential interactions. We evaluated GODTI under four realistic experimental settings, demonstrating consistent improvements over state-of-the-art baselines. 
Furthermore, case studies validated the practical utility of GODTI in accurately identifying novel, low-cost DTIs, underscoring its potential to accelerate drug discovery workflows.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104976"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145891182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
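The cross-attention fusion at the heart of GODTI can be illustrated with a single-head, NumPy-only sketch in which drug features attend over concatenated protein features (GO-derived functional descriptors plus sequence embeddings). The dimensions, the absence of learned projection matrices, and the single-head formulation are simplifying assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """Each query token attends over all key/value tokens (single head,
    no learned projections -- a simplification of the usual formulation)."""
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    return softmax(scores, axis=-1) @ keys_values     # (n_q, d)

rng = np.random.default_rng(0)
# Protein tokens: GO-derived descriptors concatenated with sequence-based
# embeddings; shapes are illustrative placeholders.
protein = np.concatenate([rng.normal(size=(10, 32)),
                          rng.normal(size=(10, 32))])
drug = rng.normal(size=(6, 32))                       # drug substructure features

fused = cross_attention(drug, protein, d_k=32)        # drug attends over protein
```

Each fused drug token is a protein-context-weighted summary, which a downstream interaction-modeling head could then score; the real model would add learned query/key/value projections and multiple heads.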
Pub Date : 2026-02-01Epub Date: 2026-01-08DOI: 10.1016/j.jbi.2026.104982
Siun Kim , David Seung U Lee , Yujin Kim , Hyung-Jin Yoon , Howard Lee
Background
Clinical named entity recognition (NER) is essential for structuring clinical narratives. While large language model (LLM)-based in-context learning (ICL) enables parameter-free adaptation, encoder-based fine-tuning has generally achieved superior performance in practical biomedical NER settings.
Objective
To systematically compare ICL and encoder-based fine-tuning for clinical NER under realistic constraints, and to determine whether optimizing ICL demonstration selection can close the performance gap.
Methods
We manually annotated 2,113 clinical notes from hematologic malignancy patients at Seoul National University Hospital and 400 MIMIC-IV notes. ICL configurations were optimized across task instructions, output formats, demonstration selection methods, sorting strategies, and pool sizes, using LLaMA-3.3-70B (open-source) via Ollama. Encoder fine-tuning was performed on both domain-specific and general-domain models, with RoBERTa-large representing the best encoder baseline. All models were evaluated as token-level classification tasks using macro and weighted F1, across in-domain, cross-domain, and cross-institutional scenarios.
Results
Demonstration selection played a major role in determining ICL performance, improving macro F1 by up to 9.4 points over random selection under our experimental settings. In moderate-resource settings (500-sample pool), ICL exceeded RoBERTa-large fine-tuning by 4.7 macro F1 points and remained competitive up to 900 samples. Both ICL and fine-tuning experienced performance degradation in cross-domain evaluations, yet ICL demonstrated superior data efficiency, achieving competitive accuracy with substantially fewer labeled examples. ICL achieved in-domain macro F1 > 0.8 in several domains, outperforming full-data fine-tuned encoders, and delivered 6.3- to 11.6-point gains in cross-institutional transfer without parameter updates. At the largest pool size (∼1,900 samples), encoder-based fine-tuning regained the lead.
Conclusion
With optimized domain-aware demonstration selection, open-source LLM-based ICL can match or surpass encoder fine-tuning for clinical NER. Its ease of adaptation and ability to update knowledge via demonstration pools—without retraining—enable continuous improvement in dynamic, resource-limited healthcare settings.
{"title":"Beyond Fine-Tuning: Leveraging Domain-Aware In-Context learning with large language models for clinical named entity recognition","authors":"Siun Kim , David Seung U Lee , Yujin Kim , Hyung-Jin Yoon , Howard Lee","doi":"10.1016/j.jbi.2026.104982","DOIUrl":"10.1016/j.jbi.2026.104982","url":null,"abstract":"<div><h3>Background</h3><div>Clinical named entity recognition (NER) is essential for structuring clinical narratives. While large language model (LLM)-based in-context learning (ICL) enables parameter-free adaptation, encoder-based fine-tuning has generally achieved superior performance in practical biomedical NER settings.</div></div><div><h3>Objective</h3><div>To systematically compare ICL and encoder-based fine-tuning for clinical NER under realistic constraints, and to determine whether optimizing ICL demonstration selection can close the performance gap.</div></div><div><h3>Methods</h3><div>We manually annotated 2,113 clinical notes from hematologic malignancy patients at Seoul National University Hospital and 400 MIMIC-IV notes. ICL configurations were optimized across task instructions, output formats, demonstration selection methods, sorting strategies, and pool sizes, using LLaMA-3.3-70B (open-source) via Ollama. Encoder fine-tuning was performed on both domain-specific and general-domain models, with RoBERTa-large representing the best encoder baseline. All models were evaluated as token-level classification tasks using macro and weighted F1, across in-domain, cross-domain, and cross-institutional scenarios.</div></div><div><h3>Results</h3><div>Demonstration selection played a major role in determining ICL performance, improving macro F1 by up to 9.4 points over random selection under our experimental settings. In moderate-resource settings (500-sample pool), ICL exceeded RoBERTa-large fine-tuning by 4.7 macro F1 points and remained competitive up to 900 samples. 
Both ICL and fine-tuning experienced performance degradation in cross-domain evaluations, yet ICL demonstrated superior data efficiency, achieving competitive accuracy with substantially fewer labeled examples. ICL achieved in-domain macro F1 > 0.8 in several domains, outperforming full-data fine-tuned encoders, and delivered 6.3- to 11.6-point gains in cross-institutional transfer without parameter updates. At the largest pool size (∼1,900 samples), encoder-based fine-tuning regained the lead.</div></div><div><h3>Conclusion</h3><div>With optimized domain-aware demonstration selection, open-source LLM-based ICL can match or surpass encoder fine-tuning for clinical NER. Its ease of adaptation and ability to update knowledge via demonstration pools—without retraining—enable continuous improvement in dynamic, resource-limited healthcare settings.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104982"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145948426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
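A common form of the demonstration selection this study optimizes—retrieving the k labeled examples most similar to the query before building the ICL prompt—can be sketched as follows. The TF-IDF embedding and cosine similarity are stand-ins for whatever encoder the retrieval actually uses, and the mini pool of note snippets is hypothetical; the paper's specific selection methods, sorting strategies, and domain-aware pooling are not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def select_demonstrations(query, pool, k=2):
    """Return the k pool notes most similar to the query (cosine over TF-IDF)."""
    vec = TfidfVectorizer().fit(pool + [query])
    P = vec.transform(pool).toarray()
    q = vec.transform([query]).toarray()[0]
    # cosine similarity = dot product of L2-normalized vectors
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    q = q / (np.linalg.norm(q) + 1e-12)
    top = np.argsort(-(P @ q))[:k]
    return [pool[i] for i in top]

# Hypothetical mini demonstration pool of annotated note snippets
pool = ["patient started imatinib for chronic myeloid leukemia",
        "no evidence of relapse on bone marrow biopsy",
        "dose of cytarabine reduced due to neutropenia"]
demos = select_demonstrations("cytarabine held for febrile neutropenia", pool, k=2)
```

The selected snippets (plus their gold annotations) would then be concatenated into the prompt; updating the pool updates the model's effective knowledge without any retraining, which is the property the conclusion highlights.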
Pub Date : 2026-02-01Epub Date: 2026-01-20DOI: 10.1016/j.jbi.2026.104989
Haeun Lee , Christelle Xiong , Derek Baughman , Chen Dun , Jiayi Tong , Benjamin Martin , Harold Lehmann , Paul Nagy
Objective
This study identifies and categorizes bias sources throughout the real-world evidence (RWE) generation process from electronic health records (EHRs) and develops a multi-dimensional conceptual framework to characterize how bias arises in large-scale multinational federated network studies.
Methods
A three-phase bias framework spanning healthcare delivery, data management, and research was developed through the synthesis of existing frameworks, a structured literature review, and iterative assessment by multidisciplinary expert panels. A scoping review was conducted following PRISMA-ScR guidelines, analyzing studies between 2016 and 2025 in PubMed and Web of Science and focusing on bias in observational studies using real-world data. Bias sources were classified using directed content analysis based on their occurrence stage in the RWE generation process.
Results
Analysis of 220 papers within this framework identified 209 distinct bias sources categorized into seven specific levels: access to medical care (n = 40), provision of care (n = 29), data acquisition and measurement (n = 39), clinical documentation and coding practices (n = 32), data extraction (n = 22), data modeling (n = 11), and data analytics (n = 36). Biases in the healthcare phase were most prevalent (n = 108), followed by the data management (n = 54) and research (n = 47) phases.
Conclusion
This multi-dimensional framework reveals that bias sources in RWE generation are interconnected across patient, provider, administrative, information technology, informatics, and analytical domains, and provides a structural foundation for understanding where and how bias may arise across the RWE process in large-scale observational research.
{"title":"A multidimensional hierarchical framework for sources of bias in real-world healthcare evidence: a scoping review","authors":"Haeun Lee , Christelle Xiong , Derek Baughman , Chen Dun , Jiayi Tong , Benjamin Martin , Harold Lehmann , Paul Nagy","doi":"10.1016/j.jbi.2026.104989","DOIUrl":"10.1016/j.jbi.2026.104989","url":null,"abstract":"<div><h3>Objective</h3><div>This study identifies and categorizes bias sources throughout the real-world evidence (RWE) generation process from electronic health records (EHRs) and develops a multi-dimensional conceptual framework to characterize how bias arises in large-scale multinational federated network studies.</div></div><div><h3>Methods</h3><div>A three-phase bias framework spanning healthcare delivery, data management, and research was developed through the synthesis of existing frameworks, a structured literature review, and iterative assessment by multidisciplinary expert panels. A scoping review was conducted following PRISMA-ScR guidelines, analyzing studies between 2016 and 2025 in PubMed and Web of Science and focusing on bias in observational studies using real-world data. Bias sources were classified using directed content analysis based on their occurrence stage in the RWE generation process.</div></div><div><h3>Results</h3><div>Analysis of 220 papers within this framework identified 209 distinct bias sources categorized into seven specific levels: access to medical care (n = 40), provision of care (n = 29), data acquisition and measurement (n = 39), clinical documentation and coding practices (n = 32), data extraction (n = 22), data modeling (n = 11), and data analytics (n = 36). Biases in the healthcare phase were most prevalent (n = 108), followed by the data management (n = 54) and research (n = 47) phases.</div></div><div><h3>Conclusion</h3><div>This multi-dimensional framework reveals that bias sources in RWE generation are interconnected across patient, provider, administrative, information technology, informatics, and analytical domains, and provides a structural foundation for understanding where and how bias may arise across the RWE process in large-scale observational research.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104989"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146029721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}