Pub Date: 2026-03-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104990
Yingying Hou, Wenbin Yao, Xikang Zhu, Zeyu Li
Multimodal data plays a vital role in advancing personalized diagnosis and precision medicine. However, during cross-institutional sharing and collaborative analysis, the protection of patient privacy becomes increasingly critical, particularly in terms of the secure storage and fine-grained retrieval of sensitive medical data. Existing privacy-preserving technologies fail to meet the demands of secure and efficient retrieval over multimodal medical data. To address this challenge, we propose a generic multi-user multimodal searchable encryption framework for healthcare applications, which supports cross-modal retrieval based on trapdoors generated from ciphertexts corresponding to arbitrary modalities. We further design a distributed-decryption searchable encryption scheme, which is the first to combine AudioCLIP and multi-key fully homomorphic encryption for efficient retrieval of encrypted multimodal data. Additionally, we construct an attribute-based multimodal searchable encryption scheme as a complementary solution for implementing fine-grained access control. This enables flexible and controllable management of retrieval permissions over multimodal ciphertexts. Experimental results on MedMNIST and AudioSet demonstrate that our schemes achieve high retrieval efficiency and quantum-resistant security, meeting the requirements of real-world medical applications.
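The core retrieval step, stripped of the encryption layer, amounts to nearest-neighbor search in a shared embedding space such as AudioCLIP's joint text/image/audio space. A minimal plaintext sketch is below; the trapdoor generation, multi-key fully homomorphic encryption, and access-control machinery of the actual schemes are omitted, and the embeddings are random stand-ins:

```python
import numpy as np

def normalize(v):
    # L2-normalize rows so dot products equal cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def cross_modal_search(query_emb, index_embs, top_k=3):
    """Rank stored records (of any modality) against a query embedding
    drawn from the same shared space; in the paper this comparison
    would happen over ciphertexts rather than plaintext vectors."""
    sims = normalize(index_embs) @ normalize(query_emb)
    return np.argsort(-sims)[:top_k]

# toy shared-space embeddings: 5 stored records, 4-dim space
rng = np.random.default_rng(0)
index = rng.normal(size=(5, 4))
query = index[2] + 0.01 * rng.normal(size=4)  # query near record 2
print(cross_modal_search(query, index))       # record 2 ranks first
```

Because the query is only a perturbed copy of record 2, that record dominates the cosine ranking regardless of modality of origin, which is the property the trapdoor mechanism preserves under encryption.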
Lattice-based privacy-preserving multimodal retrieval for healthcare. Journal of Biomedical Informatics, vol. 175, Article 104990.
Pub Date: 2026-03-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.jbi.2026.104985
Pi-Jing Wei , Mingzhu Sun , Zheng Ding , Rui-Fen Cao , Zhen Gao , Chun-Hou Zheng
Objective
Gene regulatory networks (GRNs) provide a graphical representation of the regulatory interactions between transcription factors (TFs) and their target genes, governing transcriptional states that define cell identity and function. Deciphering GRNs is fundamental for understanding disease pathogenesis and remains a central challenge in systems biology. Graph neural network-based methods have made significant progress in GRN inference in recent years due to their exceptional ability to model graph-structured biological data. However, these methods usually ignore inherent characteristics of GRNs, including their directionality, sparsity, and abundant high-order regulatory interactions.
Methods
In this study, we propose DMDGRN, a data augmentation-based multilayer directed graph convolutional network for GRN inference. To capture the direction of regulatory edges, DMDGRN employs a phase matrix to construct the Laplacian operator, which can track message propagation pathways. Considering the inherent sparsity of known GRNs, DMDGRN incorporates data augmentation techniques to overcome network sparsity. Moreover, DMDGRN adopts a multilayer directed network architecture with residual connections to extract higher-order neighborhood information.
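The abstract does not spell out DMDGRN's exact phase-matrix construction. One common way to build a direction-aware Hermitian Laplacian for a directed graph is the so-called magnetic Laplacian, sketched below as an illustrative stand-in; the charge parameter q and the toy 3-node graph are assumptions:

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """A phase-based Laplacian for a directed graph: edge direction is
    encoded as a complex phase, so the operator stays Hermitian while
    still distinguishing i -> j from j -> i.
    A: adjacency matrix with A[i, j] = 1 for an edge i -> j."""
    A_s = (A + A.T) / 2.0                  # symmetrized edge weights
    theta = 2 * np.pi * q * (A - A.T)      # antisymmetric phase matrix
    D = np.diag(A_s.sum(axis=1))
    return D - A_s * np.exp(1j * theta)    # Hermitian by construction

# toy directed chain: gene0 -> gene1 -> gene2
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
L = magnetic_laplacian(A)
print(np.allclose(L, L.conj().T))  # True: Hermitian despite direction
```

Hermitian operators admit real eigenvalues, so standard spectral graph convolutions apply while the phase still records which way each regulatory edge points.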
Results
Comprehensive evaluations on benchmark datasets demonstrate that DMDGRN significantly improves GRN inference accuracy. Notably, the application on breast cancer shows that our framework successfully identifies relevant therapeutic candidates for human breast cancer.
Conclusions
The findings demonstrate that the strategies we adopted are effective for inferring GRNs. The successful application to breast cancer data further highlights the potential of DMDGRN in uncovering disease-relevant regulatory mechanisms and identifying therapeutic targets, making it a promising tool for advancing both computational biology and translational medicine.
DMDGRN: A data augmentation-based multilayer directed graph convolutional network for gene regulatory network inference. Journal of Biomedical Informatics, vol. 175, Article 104985.
Pub Date: 2026-03-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.jbi.2026.104983
Brian Ondov , William Xia , Kush Attal , Ishita Unde , Jerry He , Dina Demner-Fushman
Objective:
Recent advances in language models have shown potential to adapt professional-facing biomedical literature to plain language, making it accessible to patients and caregivers. However, their unpredictability and high potential for harm in this domain mean that rigorous evaluation is necessary. Our goals with this track were to stimulate research and to provide high-quality evaluation of the most promising systems.
Methods:
We hosted the Plain Language Adaptation of Biomedical Abstracts (PLABA) track at the 2023 and 2024 Text Retrieval Conferences. Tasks included complete, sentence-level rewriting of 400 abstracts related to 40 consumer questions (Task 1) as well as identifying and replacing difficult terms in 300 abstracts spanning 30 consumer questions (Task 2). For automatic evaluation of Task 1, we developed a four-fold professionally-written reference set. Submissions for both tasks also received extensive manual evaluation by biomedical experts.
Results:
Twelve teams spanning twelve countries participated, with models from multilayer perceptrons to large pretrained transformers. In manual judgments of Task 1, top-performing models rivaled human factual accuracy and completeness, but not simplicity or brevity. Automatic, reference-based metrics generally did not correlate well with manual judgments. In Task 2, systems struggled with identifying difficult terms and classifying how to replace them. When generating replacements, however, LLM-based systems did well in manually judged accuracy, completeness, and simplicity, though not in brevity.
Conclusion:
The PLABA track showed promise for using Large Language Models to adapt biomedical literature for the general public, while also highlighting their deficiencies and the need for improved automatic benchmarking tools.
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track. Journal of Biomedical Informatics, vol. 175, Article 104983.
Pub Date: 2026-02-01 | Epub Date: 2025-12-26 | DOI: 10.1016/j.jbi.2025.104971
Aurora Rofena , Claudia Lucia Piccolo , Bruno Beomonte Zobel , Paolo Soda , Valerio Guarrasi
Objective:
This study aims to propose a multimodal, multi-view deep learning approach for breast cancer virtual biopsy, a non-invasive classification of breast lesions as malignant or benign, by integrating Full-Field Digital Mammography (FFDM) and Contrast-Enhanced Spectral Mammography (CESM). The work addresses the critical challenge of missing CESM data by introducing generative artificial intelligence (AI) to synthesize CESM images when unavailable, ensuring the continuity of diagnostic workflows.
Methods:
The proposed method uses FFDM and CESM images in both craniocaudal (CC) and mediolateral oblique (MLO) views. When CESM is missing, a CycleGAN-based generative model produces synthetic CESM images from FFDM inputs. For classification, three convolutional neural networks (ResNet18, ResNet50, and VGG16) are employed, and a two-stage late fusion strategy integrates view-specific and modality-specific malignancy probabilities, weighted by Matthews Correlation Coefficient (MCC), into a final malignancy score. The system’s robustness under varying degrees of missing CESM data is tested by incrementally replacing real CESM inputs with synthetic ones and evaluating classification performance using AUC, G-mean, and MCC.
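The MCC-weighted late-fusion step can be sketched as a weighted average of branch malignancy probabilities. The exact weighting formula is not given in the abstract, so the clipping and normalization below are plausible assumptions, and the branch names and numbers are illustrative:

```python
import numpy as np

def mcc_weighted_fusion(probs, mccs):
    """Late fusion of per-branch malignancy probabilities, weighted by
    each branch's validation MCC. MCCs are clipped at 0 so branches no
    better than chance drop out; this scheme is a plausible reading of
    the abstract, not the authors' exact formula."""
    w = np.clip(np.asarray(mccs, dtype=float), 0.0, None)
    if w.sum() == 0:
        return float(np.mean(probs))       # fall back to a plain average
    return float(np.average(probs, weights=w))

# e.g. four hypothetical branches: FFDM-CC, FFDM-MLO, CESM-CC, CESM-MLO
probs = [0.62, 0.70, 0.91, 0.88]
mccs  = [0.35, 0.40, 0.72, 0.68]
score = mcc_weighted_fusion(probs, mccs)
print(round(score, 3))  # 0.814: CESM branches dominate via higher MCC
```

The better-validated CESM branches pull the final score toward their higher malignancy estimates, which is the intended effect of validation-weighted fusion.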
Results:
CycleGAN achieved high-fidelity CESM synthesis, with Peak-Signal-to-Noise Ratio exceeding 24 dB and Structural Similarity Index above 0.8 across both CC and MLO views. For lesion classification, the multimodal configuration combining FFDM and CESM consistently outperformed the unimodal FFDM-only setup. Notably, even when CESM was entirely replaced by synthetic images, the multimodal approach still improved virtual biopsy performance compared to FFDM alone. Although classification performance declined as the proportion of synthetic CESM increased, the use of synthetic data remained beneficial.
Conclusion:
This work demonstrates that generative AI can effectively address missing-modality challenges in breast cancer diagnostics by synthesizing CESM images to enhance FFDM-based virtual biopsy pipelines. In the absence of real CESM data, incorporating synthetic images improves lesion classification compared to using FFDM alone, offering a non-invasive alternative to support clinical decision-making. Moreover, by releasing the extended CESM@UCBM dataset, this study contributes a valuable resource for advancing research and innovation in breast multimodal diagnostic systems.
Augmented intelligence for multimodal virtual biopsy in breast cancer using generative artificial intelligence. Journal of Biomedical Informatics, vol. 174, Article 104971.
Pub Date: 2026-02-01 | DOI: 10.1016/j.jbi.2025.104976
Motivation
Identifying drug–target interactions (DTIs) is a critical step in both drug discovery and drug repurposing. Accurate in silico prediction of DTIs can substantially reduce development time and costs. Recent advances in sequence-based methods have leveraged attention mechanisms to improve prediction accuracy. However, these approaches typically rely solely on the molecular structures of drugs and proteins, overlooking higher-level semantic information that reflects functional and biological relationships.
Results
In this work, we propose GODTI, a novel Gene Ontology-guided Drug-Target Interaction prediction model that enhances prediction performance through multimodal feature integration. GODTI comprises three major components: a feature extraction module, a multimodal fusion module, and an intermolecular interaction modeling module. In the protein feature extractor, functional descriptors derived from Gene Ontology and sequence-based embeddings from amino acid sequences are obtained and combined. These protein representations are then integrated with drug molecular features via the multimodal fusion module and subsequently processed by the interaction modeling module to predict potential interactions. We evaluated GODTI under four realistic experimental settings, demonstrating consistent improvements over state-of-the-art baselines. Furthermore, case studies validated the practical utility of GODTI in accurately identifying novel, low-cost DTIs, underscoring its potential to accelerate drug discovery workflows.
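The cross-attention fusion named in the title can be sketched with a single attention head in which drug features attend over the combined protein representation. The random projection matrices, dimensions, and toy inputs below are illustrative stand-ins for GODTI's learned, trained modules:

```python
import numpy as np

def cross_attention(Q_src, KV_src, d_k=8, seed=0):
    """Minimal single-head cross-attention: queries from one feature
    stream attend over another (here, drug features over the fused
    GO + sequence protein representation). Random projections stand
    in for learned weights."""
    rng = np.random.default_rng(seed)
    Wq = rng.normal(size=(Q_src.shape[-1], d_k))
    Wk = rng.normal(size=(KV_src.shape[-1], d_k))
    Wv = rng.normal(size=(KV_src.shape[-1], d_k))
    Q, K, V = Q_src @ Wq, KV_src @ Wk, KV_src @ Wv
    scores = Q @ K.T / np.sqrt(d_k)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)   # row-wise softmax
    return attn @ V

# protein side: GO descriptor and sequence embedding, concatenated
rng = np.random.default_rng(1)
go_feat, seq_feat = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
protein = np.concatenate([go_feat, seq_feat], axis=-1)  # (4, 32)
drug = rng.normal(size=(6, 32))                         # 6 drug tokens
out = cross_attention(drug, protein)
print(out.shape)  # (6, 8): one fused vector per drug token
```

Each drug token ends up as a protein-conditioned mixture, which a downstream interaction-modeling head could score for binding.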
Wenchao Cui, Pingjian Ding, Lingyun Luo, Shunheng Zhou, Hui Jiang. A computational framework for predicting drug-target interactions by fusing gene ontology information with cross attention. Journal of Biomedical Informatics, vol. 174, Article 104976.
Pub Date: 2026-02-01 | Epub Date: 2026-01-08 | DOI: 10.1016/j.jbi.2026.104982
Siun Kim , David Seung U Lee , Yujin Kim , Hyung-Jin Yoon , Howard Lee
Background
Clinical named entity recognition (NER) is essential for structuring clinical narratives. While large language model (LLM)-based in-context learning (ICL) enables parameter-free adaptation, encoder-based fine-tuning has generally achieved superior performance in practical biomedical NER settings.
Objective
To systematically compare ICL and encoder-based fine-tuning for clinical NER under realistic constraints, and to determine whether optimizing ICL demonstration selection can close the performance gap.
Methods
We manually annotated 2,113 clinical notes from hematologic malignancy patients at Seoul National University Hospital and 400 MIMIC-IV notes. ICL configurations were optimized across task instructions, output formats, demonstration selection methods, sorting strategies, and pool sizes, using LLaMA-3.3-70B (open-source) via Ollama. Encoder fine-tuning was performed on both domain-specific and general-domain models, with RoBERTa-large representing the best encoder baseline. All models were evaluated on token-level classification using macro and weighted F1, across in-domain, cross-domain, and cross-institutional scenarios.
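Similarity-based demonstration selection, one of the selection methods of this kind, can be sketched as nearest-neighbor retrieval over the demonstration pool. A bag-of-words cosine stands in here for the dense retrievers a real system would use, and the toy notes are invented:

```python
import numpy as np
from collections import Counter

def bow_vector(text, vocab):
    # simple whitespace bag-of-words count vector
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def select_demonstrations(query, pool, k=2):
    """Pick the k pool notes most similar to the query note, to be
    placed in the prompt as in-context NER demonstrations."""
    vocab = sorted({w for t in pool + [query] for w in t.lower().split()})
    qv = bow_vector(query, vocab)
    sims = []
    for t in pool:
        tv = bow_vector(t, vocab)
        denom = np.linalg.norm(qv) * np.linalg.norm(tv)
        sims.append(qv @ tv / denom if denom else 0.0)
    return [pool[i] for i in np.argsort(sims)[::-1][:k]]

pool = ["patient started imatinib for CML",
        "chest x-ray shows no infiltrate",
        "bone marrow biopsy confirms AML relapse"]
demos = select_demonstrations("marrow biopsy shows AML blasts", pool, k=1)
print(demos[0])  # the AML note: highest token overlap with the query
```

Swapping the bag-of-words vectors for dense sentence embeddings gives the domain-aware variant whose gains the paper reports.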
Results
Demonstration selection played a major role in determining ICL performance, improving macro F1 by up to 9.4 points over random selection under our experimental settings. In moderate-resource settings (500-sample pool), ICL exceeded RoBERTa-large fine-tuning by 4.7 macro F1 points and remained competitive up to 900 samples. Both ICL and fine-tuning experienced performance degradation in cross-domain evaluations, yet ICL demonstrated superior data efficiency, achieving competitive accuracy with substantially fewer labeled examples. ICL achieved in-domain macro F1 > 0.8 in several domains, outperforming full-data fine-tuned encoders, and delivered 6.3- to 11.6-point gains in cross-institutional transfer without parameter updates. At the largest pool size (∼1,900 samples), encoder-based fine-tuning regained the lead.
Conclusion
With optimized domain-aware demonstration selection, open-source LLM-based ICL can match or surpass encoder fine-tuning for clinical NER. Its ease of adaptation and ability to update knowledge via demonstration pools—without retraining—enable continuous improvement in dynamic, resource-limited healthcare settings.
Beyond Fine-Tuning: Leveraging Domain-Aware In-Context learning with large language models for clinical named entity recognition. Journal of Biomedical Informatics, vol. 174, Article 104982.
Pub Date: 2026-02-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.jbi.2026.104989
Haeun Lee , Christelle Xiong , Derek Baughman , Chen Dun , Jiayi Tong , Benjamin Martin , Harold Lehmann , Paul Nagy
Objective
This study identifies and categorizes bias sources throughout the real-world evidence (RWE) generation process from electronic health records (EHRs), and develops a multi-dimensional conceptual framework to characterize how bias arises in large-scale multinational federated network studies.
Methods
A three-phase bias framework spanning healthcare delivery, data management, and research was developed through the synthesis of existing frameworks, a structured literature review, and iterative assessment by multidisciplinary expert panels. A scoping review was conducted following PRISMA-ScR guidelines, analyzing studies between 2016 and 2025 in PubMed and Web of Science and focusing on bias in observational studies using real-world data. Bias sources were classified using directed content analysis based on their occurrence stage in the RWE generation process.
Results
Analysis of 220 papers within this framework identified 209 distinct bias sources categorized into seven specific levels: Access to medical care (n = 40), provision of care (n = 29), data acquisition and measurement (n = 39), clinical documentation and coding practices (n = 32), data extraction (n = 22), data modeling (n = 11), and data analytics (n = 36). Healthcare phase biases were most prevalent (n = 108), followed by data management (n = 54) and research levels (n = 47).
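The seven level counts are internally consistent with the three phase totals reported in the results; a quick tally confirms the arithmetic (the phase-to-level grouping below is inferred from the reported sums, not stated explicitly in the abstract):

```python
# distinct bias sources per level, as reported in the results
levels = {
    "access to medical care": 40, "provision of care": 29,
    "data acquisition and measurement": 39,
    "clinical documentation and coding practices": 32,
    "data extraction": 22, "data modeling": 11, "data analytics": 36,
}
# phase -> constituent levels (grouping inferred from the totals)
phases = {
    "healthcare delivery": ["access to medical care", "provision of care",
                            "data acquisition and measurement"],
    "data management": ["clinical documentation and coding practices",
                        "data extraction"],
    "research": ["data modeling", "data analytics"],
}
totals = {p: sum(levels[l] for l in ls) for p, ls in phases.items()}
print(totals, sum(levels.values()))
# {'healthcare delivery': 108, 'data management': 54, 'research': 47} 209
```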
Conclusion
This multi-dimensional framework reveals that bias sources in RWE generation are interconnected across patient, provider, administrative, information technology, informatics, and analytical domains, and provides a structural foundation for understanding where and how bias may arise across the RWE process in large-scale observational research.
A multidimensional hierarchical framework for sources of bias in real-world healthcare evidence: a scoping review. Journal of Biomedical Informatics, vol. 174, Article 104989.
Pub Date: 2026-02-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.jbi.2026.104987
Xenia Konti, Nicoleta J. Economou-Zavlanos, Yi Shen, Giorgos Stamou, Armando Bedoya, Michael J. Pencina, Chuan Hong, Michael M. Zavlanos
Objective
In this paper, we propose an adaptive federated learning framework to learn optimal treatments for individual hospitals that may serve different patient populations. The proposed framework enables the design of more efficient treatment allocation strategies.
Methods
We propose a federated treatment recommendation strategy that for each hospital is formulated as a Multi-Armed Bandit (MAB) problem. The process is coordinated by a lead hospital that adaptively learns and transfers Upper Confidence Bounds (UCB) across similar hospitals and Personalized Upper Bounds across heterogeneous hospitals. We test our proposed method on a simulated clinical trial environment created using real Covid-19 data from the Duke University Health System.
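The per-hospital selection loop described in Methods is a standard Upper Confidence Bound bandit; a minimal single-hospital sketch is below. The federated transfer of bounds across hospitals is specific to the paper and is not reproduced here, and the treatment arms, reward probabilities, and simulation loop are illustrative assumptions.

```python
import math
import random

class UCB1:
    """Minimal UCB1 bandit for choosing among candidate treatments."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # times each treatment was administered
        self.values = [0.0] * n_arms  # running mean outcome per treatment

    def select(self):
        # Administer any untried treatment first
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        # Otherwise pick the arm with the highest upper confidence bound
        t = sum(self.counts) + 1
        ucb = [v + math.sqrt(2.0 * math.log(t) / c)
               for v, c in zip(self.values, self.counts)]
        return ucb.index(max(ucb))

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running mean of observed outcomes
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Illustrative simulation: treatment 1 has the higher true success rate,
# so UCB1 should administer it far more often over time.
random.seed(0)
true_rates = [0.3, 0.7]
bandit = UCB1(n_arms=2)
for _ in range(2000):
    arm = bandit.select()
    bandit.update(arm, 1.0 if random.random() < true_rates[arm] else 0.0)
print("pulls per treatment:", bandit.counts)
```

In the federated setting described above, a lead hospital would additionally share confidence-bound information across hospitals, shrinking the exploration each site must do on its own patients.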
Results
Our method relies on collaboration among hospitals, which reduces the number of data samples needed per institution while protecting the privacy of individual patient data. At the same time, it ensures fairness of the learned treatments by mitigating possible biases due to differences in the patient populations treated across hospitals. Finally, our method improves the safety of the learning procedure by reducing the number of patients administered sub-optimal treatments at each hospital. In our experiments, the proposed method outperforms state-of-the-art approaches: it requires 36%–75% fewer patient samples to learn the optimal treatment for each hospital and administers the optimal treatment to 0.95%–48.6% more patients.
Conclusion
In this paper, we propose an adaptive federated learning strategy for treatment recommendation that learns optimal treatments for individual hospitals that may serve different patient populations, while satisfying privacy, fairness, and safety considerations.
{"title":"A federated learning framework for ethical dynamic treatment allocation across heterogeneous hospitals","authors":"Xenia Konti , Nicoleta J. Economou-Zavlanos , Yi Shen , Giorgos Stamou , Armando Bedoya , Michael J. Pencina , Chuan Hong , Michael M. Zavlanos","doi":"10.1016/j.jbi.2026.104987","DOIUrl":"10.1016/j.jbi.2026.104987","url":null,"abstract":"<div><h3>Objective</h3><div>In this paper, we propose an adaptive federated learning framework to learn optimal treatments for individual hospitals that possibly serve different patient populations. The proposed framework can enable the design of more efficient treatment allocation problems.</div></div><div><h3>Methods</h3><div>We propose a federated treatment recommendation strategy that for each hospital is formulated as a Multi-Armed Bandit (MAB) problem. The process is coordinated by a lead hospital that adaptively learns and transfers Upper Confidence Bounds (UCB) across similar hospitals and Personalized Upper Bounds across heterogeneous hospitals. We test our proposed method on a simulated clinical trial environment created using real Covid-19 data from the Duke University Health System.</div></div><div><h3>Results</h3><div>Our method relies on collaboration among hospitals, which allows for fewer data samples needed per institution, while protecting the privacy of the individual patient data. At the same time, it ensures fairness of the learned treatments by mitigating possible biases due to differences in the patient populations treated across different hospitals. Finally, our method improves the safety of the learning procedure by reducing the number of patients administered with sub-optimal treatments at each hospital. 
In the experiments, we show that our proposed method outperforms other state of the art approaches in that it requires up to 36%–75% fewer patient data to learn the optimal treatment for each hospital and administers the optimal treatment to 0.95%-48.6% more patients.</div></div><div><h3>Conclusion</h3><div>In this paper, we propose an adaptive federated learning strategy for treatment recommendation tasks, that learns optimal treatments for individual hospitals that possibly serve different patient populations, while satisfying privacy, fairness, and safety considerations.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104987"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145998273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-08 | DOI: 10.1016/j.jbi.2026.104980
Jack LeBien, Julian Velev, Abiel Roche-Lima
Background
Indirect methods for estimating clinical reference intervals (RIs) use statistical analysis to identify non-pathological sub-distributions within large datasets acquired from routine clinical testing. This approach has the potential to accelerate the estimation of precise RIs, accounting for influential variables such as age, gender, and ethnicity. Most existing methods are based on traditional statistics and hand-crafted algorithms. The investigation of supervised learning, which often outperforms traditional approaches, has been impeded by the limitations of real-world data. However, previous studies have widely used synthetic data for evaluating and benchmarking indirect methods due to several advantages over real-world data, including greater control, variability, accessibility, and the availability of exact ground-truth RIs. Synthetic data may also provide a pathway for developing data-driven solutions for indirect RI estimation.
Methods
In this study, we leveraged synthetic data to train two convolutional neural networks (CNNs) to predict the parameters of underlying reference distributions (RDs) in diverse real-world clinical datasets. While one model was trained for standard univariate data, the other was extended to bivariate data, enabling the prediction of covariance between clinical analytes. Trained models were evaluated using both real-world and synthetic test datasets and compared with four alternative algorithms.
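A minimal sketch of the kind of synthetic training data involved: a known non-pathological reference distribution (RD) contaminated by a pathological sub-distribution, where a naive percentile estimate of the reference interval is biased upward. This is the error that indirect methods aim to correct. All distribution parameters below are illustrative assumptions; the CNN models themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative synthetic "routine testing" dataset: a non-pathological
# reference distribution (RD) mixed with a smaller pathological component.
mu, sigma = 100.0, 10.0                       # ground-truth RD parameters
healthy = rng.normal(mu, sigma, 9000)
pathological = rng.normal(140.0, 15.0, 1000)  # 10% contamination
mixed = np.concatenate([healthy, pathological])

# Ground-truth 95% reference interval from the known RD
true_ri = (mu - 1.96 * sigma, mu + 1.96 * sigma)

# Naive direct estimate: 2.5th/97.5th percentiles of the mixed data.
# The pathological component drags the upper limit upward, which is the
# bias that indirect RI estimation is designed to remove.
naive_ri = tuple(np.percentile(mixed, [2.5, 97.5]))

print("true RI :", [round(x, 1) for x in true_ri])
print("naive RI:", [round(x, 1) for x in naive_ri])
```

A model trained on many such synthetic mixtures, with the RD parameters as labels, can learn to recover (mu, sigma) directly from the contaminated data.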
Results
Model predictions closely matched directly estimated RIs and RDs in real-world data and known RDs in synthetic data, outperforming four alternative indirect methods: GMM, refineR, reflimR, and RINetv1. Using labeled healthy and HCV-positive groups in real data, we compared established univariate RIs with predicted multivariate reference regions (MRRs). On average, the MRRs showed 1) higher coverage of healthy patients (closer to the desired 95%) and 2) smaller regions, which reduce the likelihood of including abnormal values.
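A multivariate reference region of the kind compared above can be sketched, under a Gaussian-RD assumption, as the set of points whose squared Mahalanobis distance from the RD mean falls below the chi-square 95% quantile (about 5.991 for two analytes). The analyte means and covariance below are illustrative assumptions, not the paper's predicted RDs.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative bivariate reference distribution for two correlated analytes
mean = np.array([100.0, 40.0])
cov = np.array([[100.0, 30.0],
                [30.0, 25.0]])
samples = rng.multivariate_normal(mean, cov, 20000)

# 95% multivariate reference region: points whose squared Mahalanobis
# distance is below the chi-square(df=2) 0.95 quantile (~5.991)
inv_cov = np.linalg.inv(cov)
diff = samples - mean
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
coverage = np.mean(d2 <= 5.991)
print(f"empirical coverage: {coverage:.3f}")
```

Because the ellipse follows the correlation between the analytes, it can cover the same fraction of healthy patients with a smaller area than the rectangle formed by two independent univariate RIs.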
Conclusions
Synthetic data training is a viable approach for developing accurate indirect RI estimation models for both univariate and bivariate clinical data. This strategy could help address some limitations of real-world data, direct analyses, and univariate RIs.
{"title":"RINet: synthetic data training for indirect estimation of clinical reference distributions","authors":"Jack LeBien , Julian Velev , Abiel Roche-Lima","doi":"10.1016/j.jbi.2026.104980","DOIUrl":"10.1016/j.jbi.2026.104980","url":null,"abstract":"<div><h3>Background</h3><div>Indirect methods for estimating clinical reference intervals (RIs) use statistical analysis to identify non-pathological sub-distributions within large datasets acquired from routine clinical testing. This approach has the potential to accelerate the estimation of precise RIs, accounting for influential variables such as age, gender, and ethnicity. Most existing methods are based on traditional statistics and hand-crafted algorithms. The investigation of supervised learning, which often outperforms traditional approaches, has been impeded by the limitations of real-world data. However, previous studies have widely used synthetic data for evaluating and benchmarking indirect methods due several advantages over real-world data, including greater control, variability, accessibility, and the availability of exact ground-truth RIs. Synthetic data may also provide a pathway for developing data-driven solutions for indirect RI estimation.</div></div><div><h3>Methods</h3><div>In this study, we leveraged synthetic data to train two convolutional neural networks (CNNs) to predict the parameters of underlying reference distributions (RDs) in diverse real-world clinical datasets. While one model was trained for standard univariate data, the other was extended to bivariate data, enabling the prediction of covariance between clinical analytes. 
Trained models were evaluated using both real-world and synthetic test datasets and compared with four alternative algorithms.</div></div><div><h3>Results</h3><div>Model predictions closely matched directly estimated RIs and RDs in real-world data and known RDs in synthetic data, outperforming four alternative indirect methods: GMM, <em>refineR</em>, <em>reflimR</em>, and RINet<sub>v1</sub>. Using labeled healthy and HCV-positive groups in real data, we compared established univariate RIs with predicted multivariate reference regions (MRRs). On average, the MRRs showed 1) higher coverage of healthy patients (closer to the desired 95%) and 2) smaller regions, which reduce the likelihood of including abnormal values.</div></div><div><h3>Conclusions</h3><div>Synthetic data training is a viable approach for developing accurate indirect RI estimation models for both univariate and bivariate clinical data. This strategy could help address some limitations of real-world data, direct analyses, and univariate RIs.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104980"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145948483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-12 | DOI: 10.1016/j.jbi.2026.104981
Fangyu Zhou, Shahadat Uddin
Objective:
We aim to improve Drug–Drug Interaction (DDI) prediction by explicitly injecting medicinal-chemistry knowledge of functional groups (FGs) into graph neural network (GNN) message passing, in both transductive and inductive settings. Our goals are to (i) encode FG priors in a trainable way that enhances representation quality without handcrafted features, and (ii) yield interpretable attributions that align learned weights with pharmacologically meaningful FG patterns.
Methods:
We introduce FG-DDI, a dual-view GNN that augments both intra- and inter-molecular reasoning. At the intra-molecular level, atom/bond messages are scaled by FG enrichment weights derived from detected FG motifs within each drug graph. At the inter-molecular level, a bipartite message-passing layer between a drug pair is modulated by FG–FG enrichment scores that reflect empirical co-occurrence in known DDIs. Enrichment is computed as odds ratios from corpus statistics and injected via learnable gates, ensuring differentiability and allowing data to override noisy priors. We couple this with standard supervision on interaction labels and report accuracy (ACC), AUROC, average precision (AP), and F1. Experiments use DrugBank (1706 drugs; 86 interaction types) and TwoSides (filtered triplets) under transductive and inductive splits (one unseen; both unseen). We perform ablations removing each FG term to isolate contributions and assess stability across splits.
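The FG–FG enrichment score above is stated to be an odds ratio computed from corpus statistics; a minimal sketch follows. The 0.5 (Haldane–Anscombe) smoothing and the example counts are assumptions made here for illustration, since the abstract does not give the paper's exact counting scheme.

```python
def fg_enrichment_odds_ratio(n_inter_with_pair, n_inter,
                             n_noninter_with_pair, n_noninter):
    """Odds ratio measuring how over-represented an FG-FG pair is among
    known interacting drug pairs versus non-interacting pairs.

    A 0.5 (Haldane-Anscombe) correction is added to every cell so the
    ratio stays finite when a count is zero -- an assumed smoothing
    choice, not necessarily the paper's.
    """
    a = n_inter_with_pair + 0.5                 # interacting, pair present
    b = n_inter - n_inter_with_pair + 0.5       # interacting, pair absent
    c = n_noninter_with_pair + 0.5              # non-interacting, present
    d = n_noninter - n_noninter_with_pair + 0.5 # non-interacting, absent
    return (a / b) / (c / d)

# Illustrative counts: the FG pair co-occurs in 120 of 1000 known DDIs
# but only 30 of 1000 non-interacting pairs, so enrichment exceeds 1.
or_score = fg_enrichment_odds_ratio(120, 1000, 30, 1000)
print(f"odds ratio: {or_score:.2f}")
```

In the described architecture, scores like this would not be applied directly but passed through learnable gates, letting training down-weight a prior that the labeled data contradicts.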
Results:
Comprehensive experiments on the DrugBank and TwoSides datasets demonstrate that FG-DDI achieves superior performance compared to state-of-the-art methods. For DrugBank, accuracy improves by 0.36% in the transductive setting and by 0.46% and 1.42% in the inductive S1 and S2 settings, respectively.
Conclusion:
By systematically integrating chemical domain knowledge into deep learning architectures, this approach enables better generalization to unseen drug combinations while maintaining computational efficiency, making it particularly valuable for real-world pharmaceutical applications where new drugs continuously enter the market.
{"title":"FG-DDI: Functional group-aware graph neural networks for drug–drug interaction prediction","authors":"Fangyu Zhou, Shahadat Uddin","doi":"10.1016/j.jbi.2026.104981","DOIUrl":"10.1016/j.jbi.2026.104981","url":null,"abstract":"<div><h3>Objective:</h3><div>We aim to improve Drug–Drug Interactions (DDIs) by explicitly injecting medicinal-chemistry knowledge of functional groups (FGs) into graph neural network (GNN) message passing, in both transductive and inductive settings. Our goal is to (i) encode FG priors in a trainable way that enhances representation quality without handcrafting features, and (ii) yield interpretable attributions that align learned weights with pharmacologically meaningful FG patterns.</div></div><div><h3>Methods:</h3><div>We introduce <em>FG-DDI</em>, a dual-view GNN that augments both intra- and inter-molecular reasoning. At the <em>intra</em>-molecular level, atom/bond messages are scaled by FG enrichment weights derived from detected FG motifs within each drug graph. At the <em>inter</em>-molecular level, a bipartite message-passing layer between a drug pair is modulated by FG–FG enrichment scores that reflect empirical co-occurrence in known DDIs. Enrichment is computed as odds ratios from corpus statistics and injected via learnable gates, ensuring differentiability and allowing data to override noisy priors. We couple this with standard supervision on interaction labels and report accuracy (ACC), AUROC, average precision (AP), and F1. Experiments use DrugBank (1706 drugs; 86 interaction types) and TwoSides (filtered triplets) under transductive and inductive splits (one unseen; both unseen). We perform ablations removing each FG term to isolate contributions and assess stability across splits.</div></div><div><h3>Results:</h3><div>Comprehensive experiments on DrugBank and TwoSides datasets demonstrate that FG-DDI achieves superior performance compared to state-of-the-art methods. 
For DrugBank, the accuracy improves by 0.36% in transductive settings and by 0.46% and 1.42% in inductive settings, respectively for S1 and S2 partitioning.</div></div><div><h3>Conclusion:</h3><div>By systematically integrating chemical domain knowledge into deep learning architectures, this approach enables better generalization to unseen drug combinations while maintaining computational efficiency, making it particularly valuable for real-world pharmaceutical applications where new drugs continuously enter the market.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"174 ","pages":"Article 104981"},"PeriodicalIF":4.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}