Generative drug discovery is hampered by data-privacy constraints and the immense computational cost of state-of-the-art (SOTA) models. To overcome these barriers, we developed Brain-Inspired Federated Diffusion with Reinforcement (BiFDR), a privacy-preserving and resource-efficient framework.
Methods:
BiFDR integrates three synergistic modules. A Neuro-inspired Federated Coordinator (NeuroFed) orchestrates secure collaboration via synaptic plasticity-inspired principles, combining server-side pruning with client-side Low-Rank Adaptation (LoRA) and sparse asynchronous updates. A Transformer-based diffusion generator (TransFuse) efficiently creates chemically valid molecules in a compressed latent space using attention mechanisms. Finally, a reinforcement learning agent (T-JORM) steers the generative process towards novel 2D and 3D molecular structures, guided by a multi-faceted, Tanimoto-based reward function.
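The abstract does not list T-JORM's reward terms. As a minimal sketch, assuming molecules are represented as binary fingerprints stored as sets of "on" bit indices, the Tanimoto coefficient |A∩B| / (|A| + |B| − |A∩B|) can drive a novelty-style reward. The function names, the optional target term, and the 0.5/0.5 weights are hypothetical illustrations, not BiFDR's actual reward.

```python
from typing import Iterable, Optional, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0  # two empty fingerprints: defined as 0 here


def novelty_reward(candidate: Set[int], known: Iterable[Set[int]],
                   target: Optional[Set[int]] = None,
                   w_novel: float = 0.5, w_target: float = 0.5) -> float:
    """Hypothetical reward: dissimilarity to known molecules, plus optional similarity to a target."""
    # Novelty is measured against the most similar known molecule.
    max_sim = max((tanimoto(candidate, ref) for ref in known), default=0.0)
    reward = w_novel * (1.0 - max_sim)
    if target is not None:
        reward += w_target * tanimoto(candidate, target)
    return reward
```

For example, a candidate with on-bits {1, 2, 3, 4} has Tanimoto 0.5 to a known molecule with on-bits {1, 2}, so its novelty term is 0.5 × (1 − 0.5) = 0.25.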
Results:
Benchmarked against baseline models, BiFDR improved the Quantitative Estimate of Drug-likeness (QED) by 13.7%, the Molecular-level Structural Information Score by 5.7%, and the Molecular Interaction Analysis Index by 52.3%. The framework also enhanced synthetic feasibility, reflected by a 9.5% reduction in the Synthetic Accessibility Score (lower scores indicate easier synthesis). Critically, BiFDR substantially strengthened data privacy, achieving a 43.6% reduction in the mutual information metric.
Conclusion:
BiFDR establishes an effective and efficient paradigm for generative drug discovery. It consistently produces molecules with superior drug-likeness, structural novelty, and interaction potential. By ensuring synthetic accessibility while rigorously preserving privacy and minimizing computational overhead, BiFDR presents a viable and scalable solution for modern, collaborative drug development pipelines.
Citation: BiFDR: Brain-Inspired Federated Diffusion Transformer with Reinforcement for privacy-preserving molecular generation. Hongming Hou, Jing Zhang, Meirun Zhang, Xiucai Ye. Journal of Biomedical Informatics 170 (2025), Article 104910. DOI: 10.1016/j.jbi.2025.104910.
Pub Date: 2025-10-01 | Epub Date: 2025-08-29 | DOI: 10.1016/j.jbi.2025.104901
Sriharsha Mopidevi , Kuk Jin Jang , Basam Alasaly , Sydney Pugh , Jean Park , Ashley Batugo , Sy Hwang , Eric Eaton , Danielle Lee Mowery , Kevin B. Johnson
Objective:
The increasing use of audio-video (AV) data in healthcare has improved patient care, clinical training, and medical and ethnographic research. However, it has also introduced major challenges in preserving patient-provider privacy due to Protected Health Information (PHI) in such data. Traditional de-identification methods are inadequate for AV data, which can reveal identifiable information such as faces, voices, and environmental details. Our goal was to create a pipeline for de-identifying AV healthcare data that minimized the human effort required to guarantee successful de-identification.
Methods:
We combined open-source tools with novel methods and infrastructure into a six-stage pipeline: (1) transcript extraction using WhisperX, (2) transcript de-identification with an adapted PHIlter, (3) audio de-identification through scrubbing, (4) video de-identification using YOLOv11 for pose detection and blurring, (5) recombining de-identified audio and video, and (6) validation and correction via manual quality control (QC). We developed two de-identification strategies to support different tolerances for lossy video images. We evaluated this pipeline using 10 h of simulated clinical AV recordings, comprising nearly 1.1 million video frames and approximately 72,000 words.
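Stage (3), audio scrubbing, can be illustrated with a minimal sketch, assuming word-level timestamps (as a word-aligned transcript from WhisperX provides) and a set of word indices flagged as PHI by the transcript de-identification step. Replacing flagged spans with silence is one simple scrubbing choice assumed here; the abstract does not say how MedVidDeID masks audio, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def scrub_audio(samples: Sequence[float], sample_rate: int,
                word_times: Sequence[Tuple[float, float]],
                phi_word_ids: Set[int], pad_s: float = 0.05) -> List[float]:
    """Return a copy of `samples` with the time span of every PHI-flagged word silenced.

    word_times[i] is the (start, end) of word i in seconds; pad_s widens each
    span slightly to absorb small timestamp errors.
    """
    out = list(samples)  # never modify the caller's audio in place
    for i in phi_word_ids:
        start, end = word_times[i]
        lo = max(0, round((start - pad_s) * sample_rate))       # first sample to silence
        hi = min(len(out), round((end + pad_s) * sample_rate))  # one past the last sample
        for k in range(lo, hi):
            out[k] = 0.0
    return out
```

With 10 samples at 10 Hz and word 1 spanning 0.3 to 0.5 s, calling `scrub_audio(audio, 10, times, {1}, pad_s=0.0)` zeroes samples 3 and 4 and leaves the rest untouched.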
Results:
In Precision Privacy Preservation (PPP) mode, MedVidDeID achieved a success rate of 50%, while in Greedy Privacy Preservation (GPP) mode it achieved a 97.5% success rate. Compared with manual de-identification of a 15-min video segment, the pipeline reduced de-identification time by 26.7% in PPP mode and 64.2% in GPP mode.
Conclusion:
The MedVidDeID pipeline offers a viable, efficient hybrid solution for handling AV healthcare data and privacy preservation. Future work will focus on reducing upstream errors at each stage and minimizing the role of the human in the loop.
Citation: MedVidDeID: Protecting privacy in clinical encounter video recordings. Journal of Biomedical Informatics 170 (2025), Article 104901.
Pub Date: 2025-10-01 | Epub Date: 2025-08-26 | DOI: 10.1016/j.jbi.2025.104902
Gabriel Ferreira dos Santos Silva , Fabiano Novaes Barcellos Filho , Roberta Moreira Wichmann , Francisco Costa da Silva Junior , Alexandre Dias Porto Chiavegatto Filho
Objective
This review aims to provide a comprehensive overview of the literature on methods and techniques for identifying and correcting dataset shift in machine learning (ML) applications for health predictions.
Methods
A systematic search was conducted across PubMed, IEEE Xplore, Scopus, and Web of Science, targeting articles published between January 1, 2019, and March 15, 2025. Search strings combined terms related to machine learning, healthcare, and dataset shift. A total of 32 studies were included and evaluated according to the types of dataset shift addressed, the detection and correction strategies used, algorithmic choices, and reported impacts on model performance.
Results
The review identified a wide range of dataset shift types, with temporal shift and concept drift the most commonly addressed. Model-based monitoring and statistical tests were the most frequent detection strategies, while retraining and feature engineering were the predominant correction approaches. Most methods showed moderate interpretability, computational feasibility, and generalizability. However, the lack of standardized performance metrics and external validation limited the comparability of results across studies.
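As an illustration of the statistical-test family of detection strategies, the two-sample Kolmogorov-Smirnov (KS) statistic compares a feature's distribution in training data with its distribution in recent deployment data. This is a minimal pure-Python sketch; the 0.2 alert threshold and the `shift_alert` rule are arbitrary assumptions, and in practice one would typically use a p-value (for example from `scipy.stats.ks_2samp`) with a correction for testing many features.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_ref(x) - F_cur(x)| over all observed x."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # The supremum of the ECDF gap is attained at one of the observed points.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)  # empirical CDF of reference data at x
        f_cur = bisect_right(cur, x) / len(cur)  # empirical CDF of current data at x
        d = max(d, abs(f_ref - f_cur))
    return d


def shift_alert(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Hypothetical monitoring rule: flag a feature whose KS statistic exceeds the threshold."""
    return ks_statistic(reference, current) > threshold
```

Run per feature on a rolling window of new data, this gives a simple covariate-shift monitor; it does not by itself detect concept drift, which requires monitoring labels or model error.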
Conclusion
While several promising approaches for managing dataset shift in health-related ML models have been proposed, no single method emerged as broadly generalizable across use cases. The implementation of these techniques in real-world clinical workflows remains limited. Future research should prioritize prospective evaluations, subgroup-specific analyses (e.g., by race, age, or geographic region), and integration into clinical decision-support systems to ensure robust and equitable ML deployment in healthcare settings. A structured summary table and conceptual pipeline diagram are provided to support practical adoption.
Citation: Strategies for detecting and mitigating dataset shift in machine learning for health predictions: A systematic review. Journal of Biomedical Informatics 170 (2025), Article 104902.
Pub Date: 2025-10-01 | Epub Date: 2025-08-28 | DOI: 10.1016/j.jbi.2025.104893
Yunfei Zhang, Wei Liao
The monitoring and analysis of adverse drug reactions (ADRs) are important for ensuring patient safety and improving treatment outcomes. Accurate identification of drug names, drug components, and ADR entities during named entity recognition (NER) is essential for ensuring drug safety and advancing the integration of drug information. Because existing medical NER technologies rely on large amounts of manually annotated training data, they are often less effective for ADRs, owing to substantial data variability and the high similarity between drug names. This paper proposes a prompt template for ADR NER that integrates error correction examples. The template comprises (1) basic prompts with task descriptions, (2) explanations of the annotated entities, (3) annotation guidelines, (4) annotated samples for few-shot learning, and (5) error correction examples. Additionally, we integrate complex ADR data from the web and construct a corpus containing three entity types (drug names, drug components, and adverse drug reactions) annotated with the Begin, Inside, Outside (BIO) scheme. Finally, we evaluate the effectiveness of each prompt component and compare the approach with the fine-tuned Large Language Model Meta AI (LLaMA) model and the DeepSeek model. Experimental results show that, with this prompt template, the F1 score of GPT-3.5 increased from 0.648 to 0.887 and that of GPT-4 from 0.757 to 0.921, significantly outperforming the fine-tuned LLaMA and DeepSeek models. These results demonstrate the superiority of the proposed method and provide a solid foundation for extracting drug-related entity relationships and building knowledge graphs.
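The F1 scores above are presumably computed over entities decoded from BIO tags, but the abstract does not state the matching criterion. The sketch below assumes strict entity-level matching (same type, same start and end token) and uses hypothetical tag names such as `DRUG` and `ADR`; it is an illustration of the metric, not the paper's evaluation script.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token index, end token index, exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (e.g. B-DRUG, I-DRUG, O) into typed spans."""
    spans: Set[Span] = set()
    start, etype = 0, None  # type: int, Optional[str]
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # an I- tag of the open type extends the current span
        if etype is not None:
            spans.add((etype, start, i))  # close the open span
        if tag == "O":
            etype = None
        else:  # B- tag, or an I- tag with no matching open span (treated leniently as a start)
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Strict entity-level F1: a predicted span counts only if type and boundaries match exactly."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Under this criterion a prediction that finds the drug name but misses the ADR scores precision 1.0, recall 0.5, and F1 ≈ 0.667; partial-overlap criteria would give higher numbers.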
Citation: Improving large language models for adverse drug reactions named entity recognition via error correction prompt engineering. Journal of Biomedical Informatics 170 (2025), Article 104893.
Pub Date: 2025-10-01 | Epub Date: 2025-09-15 | DOI: 10.1016/j.jbi.2025.104906
Georgi Grazhdanski , Vasil Vasilev , Sylvia Vassileva , Dimitar Taskov , Izabel Antova , Ivan Koychev , Svetla Boytcheva
Background and Objective:
Synthetic clinical texts can improve transparency and reduce bias and costs when training and evaluating specialized language models in the medical domain. Synthetic texts are freely shareable, as they contain no real patient information, and can be customized for a specific task. The objective of this study is to develop a methodology for generating, validating, and correcting synthetic discharge summaries using LLMs without requiring any real patient data.
Methods:
The proposed approach uses an LLM to generate synthetic discharge summaries for specific diseases, grounding the generation in internationally accepted medical practice through standard medical references from the Merck Manuals. We validate the generated summaries both with LLMs and through human expert review. In addition, we propose a method for automatically correcting the generated discharge summaries using Knowledge Graphs to ensure medical factual correctness.
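The abstract does not describe how the Knowledge Graph correction works. A minimal sketch of one plausible check, assuming facts extracted from a generated summary are (subject, relation, object) triples and the knowledge graph is a set of trusted triples: unsupported claims are flagged, and when the graph holds exactly one object for that subject and relation, a replacement is proposed. All names are hypothetical, and the extraction of triples from text is out of scope here.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def check_against_kg(claims: List[Triple],
                     kg: Set[Triple]) -> List[Tuple[Triple, Optional[Triple]]]:
    """Return (unsupported claim, suggested correction or None) pairs, in claim order."""
    # Index the graph by (subject, relation) so candidate objects can be looked up.
    index: Dict[Tuple[str, str], Set[str]] = {}
    for s, r, o in kg:
        index.setdefault((s, r), set()).add(o)

    issues: List[Tuple[Triple, Optional[Triple]]] = []
    for claim in claims:
        if claim in kg:
            continue  # supported by the graph, nothing to correct
        s, r, _ = claim
        candidates = index.get((s, r), set())
        # Only propose a fix when the graph is unambiguous; otherwise flag for review.
        fix = (s, r, next(iter(candidates))) if len(candidates) == 1 else None
        issues.append((claim, fix))
    return issues
```

Flagging ambiguous cases instead of guessing keeps a human or a second model in the loop for claims the graph cannot settle.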
Results:
Human expert evaluation showed that the generated synthetic discharge summaries are credible and factually accurate when the medical reference context is provided. The generated summaries achieved a System Usability Score of 94.35% on a comprehensive rubric evaluated by medical professionals and a score of 93.65% on the Faithfulness metric evaluated by an LLM.
Conclusions:
The proposed methodology can be utilized to generate high-quality synthetic discharge summaries for various diseases. The generated synthetic corpus consists of 900 discharge summaries in English representing nine socially significant diseases and is publicly available under an open license. The community can take advantage of the corpus and proposed methodology to train complex machine learning models, helping medical professionals in their daily work without using real patient data.
Citation: SynthMedic: Utilizing large language models for synthetic discharge summary generation, correction and validation. Journal of Biomedical Informatics 170 (2025), Article 104906.
Pub Date: 2025-10-01 | Epub Date: 2025-09-02 | DOI: 10.1016/j.jbi.2025.104904
Sarah C. Lotspeich , Sheetal Kedar , Rabeya Tahir , Aidan D. Keleghan , Amelia Miranda , Stephany N. Duda , Michael P. Bancks , Brian J. Wells , Ashish K. Khanna , Joseph Rigdon
Objective:
The allostatic load index (ALI) is a 10-component composite measure of whole-person health, which reflects the multiple interrelated physiological regulatory systems that underlie healthy functioning. Data from electronic health records (EHR) present a huge opportunity to operationalize the ALI in learning health systems; however, these data are prone to missingness and errors. Validation (e.g., through chart reviews) can provide better-quality data, but realistically, only a subset of patients’ data can be validated, and most protocols do not recover missing data.
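One common way to operationalize an allostatic load index, and one that accommodates the missing components discussed here, is the proportion of a patient's non-missing components that fall in a high-risk range. The sketch assumes that definition; the component names and cutoffs below are placeholders for illustration only, not the study's 10 components or clinical thresholds.

```python
from typing import Callable, Dict, Optional

# Placeholder components and high-risk rules (NOT clinical cutoffs; illustration only).
HIGH_RISK: Dict[str, Callable[[float], bool]] = {
    "component_a": lambda v: v >= 140.0,
    "component_b": lambda v: v >= 6.5,
    "component_c": lambda v: v < 40.0,
}


def allostatic_load_index(values: Dict[str, Optional[float]]) -> Optional[float]:
    """Proportion of non-missing components in the high-risk range; None if all are missing."""
    observed = {k: v for k, v in values.items() if k in HIGH_RISK and v is not None}
    if not observed:
        return None  # no usable components, so the index is undefined
    flagged = sum(HIGH_RISK[k](v) for k, v in observed.items())  # True counts as 1
    return flagged / len(observed)
```

Under this definition, recovering a missing component through chart review changes the denominator as well as, possibly, the numerator, which is why validation can shift a patient's index.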
Methods:
Using a representative sample of 1000 patients from the EHR at an extensive learning health system (100 of whom could be validated), we propose methods to design, conduct, and analyze statistically efficient and robust studies of ALI and healthcare utilization. Employing semiparametric maximum likelihood estimation, we robustly incorporate all available patient information into statistical models. Using targeted design strategies, we examine ways to select the most informative patients for validation. Incorporating clinical expertise, we devise a novel validation protocol to promote EHR data quality and completeness.
Results:
Chart reviews uncovered few errors (99% matched source documents) and recovered some missing data through auxiliary information in patients’ charts. On average, validation increased the number of non-missing ALI components per patient from 6 to 7. Through simulations based on preliminary data, residual sampling was identified as the most informative strategy for completing our validation study. Incorporating validation data, statistical models indicated that worse whole-person health (higher ALI) was associated with higher odds of engaging in the healthcare system, adjusting for age.
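Residual sampling, in its simplest form, fits a naive model to the unvalidated (error-prone) EHR data for all patients and prioritizes for chart review those whose residuals are most extreme. The sketch below is a simplified illustration assuming a single predictor, ordinary least squares, and an even split between the lower and upper residual tails; the study's actual design may differ, and the function names are hypothetical.

```python
from typing import List, Sequence


def ols_residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Residuals from a simple least-squares fit y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx  # slope
    a = my - b * mx                                                # intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def residual_sample(x: Sequence[float], y: Sequence[float], n_validate: int) -> List[int]:
    """Pick patient indices from both tails of the naive-model residuals (assumes n_validate <= len(x))."""
    resid = ols_residuals(x, y)
    order = sorted(range(len(resid)), key=lambda i: resid[i])  # most negative residual first
    n_low = n_validate // 2
    n_high = n_validate - n_low
    chosen = order[:n_low] + order[len(order) - n_high:]
    return sorted(chosen)
```

The intuition is that patients the naive model fits worst carry the most information about the relationship once their data are validated.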
Conclusion:
Targeted validation with an enriched protocol can ensure the quality and promote the completeness of EHR data. Findings from our validation study were incorporated into analyses as we operationalize the ALI as a scalable whole-person health measure that predicts healthcare utilization in the learning health system.
Citation: Overcoming data challenges through enriched validation and targeted sampling to measure whole-person health in electronic health records. Journal of Biomedical Informatics 170 (2025), Article 104904.
Pub Date: 2025-10-01 | Epub Date: 2025-09-21 | DOI: 10.1016/j.jbi.2025.104915
Yiqing Luo , Lin Liu , Yaxin Fu , Yi Deng , Lin Tang
Objective
Predicting transcriptional responses to external perturbations at the single-cell level is essential for understanding gene regulatory networks, drug discovery, and personalized interventions. The exponential increase in perturbation conditions creates data sparsity, making it difficult to capture dynamic responses and necessitating computational modeling.
Methods
We present Direction-Constrained Diffusion Schrödinger Bridge (DC-DSB), a generative framework that learns probabilistic trajectories between unperturbed and post-perturbation distributions by minimizing path-space KL divergence. To enhance conditional control, DC-DSB integrates hierarchical representations derived from experimental variables and biological prior knowledge. We further introduce a direction-constrained conditioning strategy that injects condition signals along the biologically relevant perturbation trajectory, thereby improving modeling quality and training stability.
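For intuition about the Schrödinger bridge formulation: when the reference process is Brownian motion with noise scale σ, a reference path pinned at a start point x0 (an unperturbed cell) and an end point x1 (a perturbed cell) is a Brownian bridge, whose time-t marginal is Gaussian with mean (1 − t)·x0 + t·x1 and variance σ²·t(1 − t) per dimension. The sketch samples that marginal. It illustrates only the generic reference dynamics, not DC-DSB's learned drift or its direction-constrained conditioning, and the function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def brownian_bridge_sample(x0: Sequence[float], x1: Sequence[float], t: float,
                           sigma: float = 1.0,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Draw x_t from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    std = sigma * math.sqrt(t * (1.0 - t))  # noise vanishes at both endpoints
    # Linear interpolation between the endpoints plus independent Gaussian noise per gene.
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, x1)]
```

Sampling at t = 0 or t = 1 returns the endpoints exactly, while intermediate times show the largest spread at t = 0.5.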
Results
Compared with baselines, DC-DSB improves expression prediction accuracy and generalization to unseen perturbation combinations. By modeling dynamic expression trajectories and co-expression structures under perturbation, DC-DSB enables the discovery of synergistic and antagonistic gene interactions and supports the progressive reconstruction of regulatory pathways.
Conclusion
DC-DSB provides a biologically consistent and generalizable framework for single-cell perturbation modeling. Its trajectory-based and condition-aware architecture overcomes the limitations of static mappings and facilitates downstream analyses in gene regulation and drug discovery.
Title: Prediction of Single-Cell perturbation response based on Direction-Constrained diffusion Schrödinger Bridge. Journal of Biomedical Informatics, vol. 170, Article 104915.
Pub Date: 2025-10-01 | Epub Date: 2025-08-21 | DOI: 10.1016/j.jbi.2025.104896
Hui Liu, Ziyi Chen, Peilin Li, Yuan-Zhi Liu, Xiangtao Liu, Ronald X. Xu, Mingzhai Sun
Objective:
Large language models (LLMs) have shown remarkable efficacy in natural language processing (NLP) tasks, and fine-tuning them for Biomedical Named Entity Recognition (BioNER) has received significant research attention. However, the substantial computational demands of fine-tuning large-scale models constrain their development and deployment. This study therefore investigates parameter-efficient fine-tuning (PEFT) techniques for adapting LLMs to BioNER under limited computational resources, with the aim of maintaining competitive performance while preserving in-domain generalization capability.
Methods:
In this study, we used the PEFT method QLoRA to fine-tune the open-source Llama3.1 model, yielding NERLlama3.1, a model designed specifically for the BioNER task. First, an LLM instruction-tuning dataset was created from BioNER datasets such as NCBI-disease, BC5CDR-chem, and BC2GM-gene. Next, the Llama3.1-8B model was fine-tuned with QLoRA on a single GPU with 16 GB of memory. During inference, we introduced a prompt engineering technique called self-consistency NER prompting (SCNP), which leverages the diversity of LLM-generated outputs to significantly enhance NER performance. Finally, we developed a multi-task model, NERLlama3.1-MT, to investigate how well fine-tuned LLMs handle multi-task BioNER scenarios.
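Self-consistency prompting generally samples several outputs for the same input and aggregates them by voting. The abstract does not give SCNP's aggregation rule, so the sketch below is an assumption: each sampled LLM output is taken to be already parsed into (mention, entity type) pairs, and an entity is kept if it appears in at least a fraction `min_fraction` of the samples. The function name and the 0.5 threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable

Entity = tuple[str, str]  # (mention text, entity type)

def self_consistent_entities(samples: Iterable[set[Entity]],
                             min_fraction: float = 0.5) -> set[Entity]:
    """Hypothetical SCNP-style vote: keep entities found in enough sampled outputs."""
    samples = [set(s) for s in samples]  # count each entity once per sample
    if not samples:
        return set()
    votes = Counter(e for s in samples for e in s)
    needed = min_fraction * len(samples)
    return {e for e, n in votes.items() if n >= needed}

# Three sampled outputs for one sentence (illustrative, not model output).
samples = [
    {("cisplatin", "Chemical"), ("nephrotoxicity", "Disease")},
    {("cisplatin", "Chemical")},
    {("cisplatin", "Chemical"), ("renal", "Disease")},
]
kept = self_consistent_entities(samples)
```

Raising `min_fraction` trades recall for precision; a real pipeline would also need to normalize mentions (case, whitespace) before voting.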
Results:
The NERLlama3.1 model achieved F1-scores of 0.8977, 0.9402, and 0.8530 on the NCBI-disease, BC5CDR-chemical, and BC2GM-gene datasets, respectively. On previously unseen datasets, it attained F1-scores of 0.6867 on BC5CDR-disease, 0.6800 on NLM-chemical, and 0.8378 on NLM-gene. These results indicate that NERLlama3.1 not only outperforms fully fine-tuned LLMs but also shows better in-domain generalization than the BERT-base model. Additionally, this work represents the first exploration of fine-tuning LLMs for multi-task BioNER.
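The F1-scores above are entity-level metrics. A common convention, assumed here because the abstract does not state the matching criterion, is strict matching: a predicted entity counts as correct only if its document, span offsets, and type all equal a gold entity. The tuple layout `(doc_id, start, end, type)` is an illustrative assumption.

```python
def entity_f1(gold: set, pred: set) -> tuple[float, float, float]:
    """Strict-match precision, recall and F1 over sets of entity tuples.

    Entities are assumed to be hashable tuples such as (doc_id, start, end, type).
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

gold = {("d1", 0, 9, "Disease"), ("d1", 20, 27, "Disease"),
        ("d2", 5, 12, "Disease"), ("d2", 30, 41, "Disease")}
pred = {("d1", 0, 9, "Disease"), ("d1", 20, 27, "Disease"),
        ("d2", 5, 12, "Disease"), ("d2", 50, 55, "Disease")}
p, r, f = entity_f1(gold, pred)  # 3 of 4 predictions match 3 of 4 gold entities
```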
Conclusion:
NERLlama3.1 outperformed LLMs fine-tuned with full parameter updates, despite requiring significantly fewer computational resources. Moreover, it exhibited substantially superior in-domain generalization capabilities compared to traditional pre-trained language models. Its low resource demands, high performance, and strong generalization enhance its applicability and utility across diverse clinical BioNER tasks.
Title: Resource-efficient instruction tuning of large language models for biomedical named entity recognition. Journal of Biomedical Informatics, vol. 170, Article 104896.
Pub Date: 2025-10-01 | Epub Date: 2025-09-23 | DOI: 10.1016/j.jbi.2025.104918
Pengfei Yin, Abel Armas Cervantes, Daniel Capurro
Importance
Understanding factors that contribute to clinical variability in patient care is critical, as unwarranted variability can lead to increased adverse events and prolonged hospital stays. Determining when this variability becomes excessive can be a step in optimizing patient outcomes and healthcare efficiency.
Objective
To explore the association between clinical variation and clinical outcomes, and specifically to identify the point in time at which the relationship between clinical variation and length of stay (LOS) becomes significant.
Methods
This cohort study uses MIMIC-IV, a database of electronic health records from the Beth Israel Deaconess Medical Center in the United States. We focused on adult patients who underwent elective coronary bypass surgery, yielding 847 patient observations. Demographic and clinical factors, including age, race, insurance type, and the Charlson Comorbidity Index (CCI), were recorded. We performed a variability analysis in which each patient's clinical process is represented as a sequence of events. The data were segmented into observation windows anchored on the first day of recorded activity. Using regression analysis, we identified the temporal window in which the effect of variability on LOS becomes independently significant.
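The abstract represents each patient's clinical process as a sequence of events and groups patients by "variability distance", but it does not define the distance. The sketch below shows one plausible choice, stated as an assumption: the Levenshtein edit distance from each patient's event sequence to the most frequent (modal) sequence, with the top 20 % of distances flagged as high variability. The event labels, the modal reference trace, and both function names are hypothetical.

```python
from collections import Counter

def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two event sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x -> y
        prev = curr
    return prev[-1]

def flag_high_variability(traces: dict[str, list[str]],
                          top_fraction: float = 0.2) -> dict[str, bool]:
    """Hypothetical rule: flag patients whose distance to the modal trace is in the top fraction."""
    modal = list(Counter(tuple(t) for t in traces.values()).most_common(1)[0][0])
    dist = {pid: edit_distance(t, modal) for pid, t in traces.items()}
    n_flag = max(1, round(top_fraction * len(dist)))
    ranked = sorted(dist, key=dist.get, reverse=True)
    flagged = set(ranked[:n_flag])
    return {pid: pid in flagged for pid in dist}

usual = ["admit", "surgery", "icu", "ward", "discharge"]
traces = {f"p{i}": list(usual) for i in range(1, 5)}
traces["p5"] = ["admit", "surgery", "icu", "icu", "ward", "imaging", "discharge"]
flags = flag_high_variability(traces)
```

Restricting each trace to events within a given number of days after the first recorded activity would mimic the observation windows described above.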
Result
Regression analysis showed that patients in the top 20 % of variability distance had an 81 % increase in LOS (95 % CI: 1.72 to 1.91, p < 0.001). Medicare and Other insurance types were associated with 18 % (95 % CI: 0.73 to 0.92, p < 0.001) and 21 % (95 % CI: 0.71 to 0.88, p < 0.001) decreases in LOS, respectively. Neither age nor race was significantly associated with LOS, but a higher CCI was associated with a 3.3 % increase in LOS (95 % CI: 1.02 to 1.05, p < 0.001). These findings indicate that higher variability and higher CCI are significantly associated with longer LOS, and that insurance type is also associated with LOS.
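The confidence intervals above are reported on a ratio scale (for example, 1.72 to 1.91 for an 81 % increase), which is what a regression on log(LOS) or a log-link model produces. Assuming such a model (the abstract does not name it), a coefficient β corresponds to a multiplicative effect exp(β), a percent change of 100·(exp(β) − 1), and a 95 % CI of exp(β ± 1.96·SE). The sketch also backs out an approximate β and SE from a reported CI; the helper names are illustrative.

```python
import math

Z95 = 1.959963984540054  # two-sided 95 % standard-normal quantile

def percent_change(ratio: float) -> float:
    """Percent change in the outcome implied by a multiplicative effect."""
    return 100.0 * (ratio - 1.0)

def ratio_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Ratio-scale estimate and 95 % CI from a log-scale coefficient and its SE."""
    return math.exp(beta), math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)

def beta_from_ci(lower: float, upper: float) -> tuple[float, float]:
    """Approximate log-scale coefficient and SE from a reported ratio CI."""
    lo, hi = math.log(lower), math.log(upper)
    return (lo + hi) / 2, (hi - lo) / (2 * Z95)

# The reported CI for the top-variability group, 1.72 to 1.91, is centred
# (geometrically) near a ratio of 1.81, i.e. roughly an 81 % increase.
beta, se = beta_from_ci(1.72, 1.91)
est, ci_low, ci_high = ratio_ci(beta, se)
```

A ratio below 1, such as 0.82, reads as an 18 % decrease under the same conversion.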
Conclusion
In the studied cohort, greater variability in patient journeys was associated with longer LOS in a dose–response manner: the higher the variability, the longer the LOS. This study presents a standardized way to measure and visualize variability in clinical processes and to quantify its impact on patient-relevant outcomes.
Title: Measuring and visualizing healthcare process variability. Journal of Biomedical Informatics, vol. 170, Article 104918.
Pub Date: 2025-10-01 | Epub Date: 2025-09-09 | DOI: 10.1016/j.jbi.2025.104908
Steven J. Atlas, Timothy E. Burdick, Adam Wright, Wenyan Zhao, Shoshana Hort, David G. Aman, Mathan Thillaiyapillai, E. John Orav, Amy J. Wint, Rebecca E. Smith, Katherine L. Gallagher, Molly L. Housman, Frank Y. Chang, Courtney J. Diamond, Li Zhou, Jennifer S. Haas, Anna N.A. Tosteson
Background
Many individuals with abnormal cervical cancer screening test results do not receive timely follow-up care. Clinical decision support systems (CDSS) designed to improve follow-up are hampered by the difficulty of identifying relevant clinical elements and applying complex guideline recommendations. As part of a multisite trial, two CDSS models were implemented: one used natural language processing to evaluate data extracted outside of the electronic health record (EHR) (System A); the other used commercial EHR functionality based on LOINC-defined result fields (System B). This secondary analysis compared CDSS accuracy and trial outcomes between sites using the two models.
Methods
Primary care clinics (32 in System A and 12 in System B) were randomly assigned to usual care, CDSS alone, or CDSS plus patient outreach with or without navigation. The CDSS identified individuals with overdue abnormal screening results and specified the recommended follow-up and time interval. CDSS accuracy was assessed by manual chart review. Patient outreach consisted of portal or mailed letters plus a single phone call. Navigation included one or more phone calls to address barriers to care. The primary outcome was completion of recommended follow-up within 120 days after enrollment. The clinic was the unit of randomization, and the patient was the unit of analysis.
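The CDSS logic described above flags a patient when an abnormal result has passed its recommended follow-up interval without completed follow-up, and the trial outcome is completion within 120 days after enrollment. The sketch below illustrates only that timing logic. The result categories and interval lengths in `FOLLOW_UP_INTERVAL_DAYS` are hypothetical placeholders, not guideline values; the real systems applied far more complex guideline recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical result categories and follow-up intervals in days (not guideline values).
FOLLOW_UP_INTERVAL_DAYS = {"colposcopy_recommended": 90, "repeat_test_1yr": 365}
OUTCOME_WINDOW = timedelta(days=120)  # primary-outcome window after enrollment

@dataclass
class AbnormalResult:
    patient_id: str
    result_date: date
    category: str
    follow_up_date: Optional[date] = None  # when recommended follow-up was completed

def is_overdue(r: AbnormalResult, today: date) -> bool:
    """True if follow-up is past its due date and had not been completed by `today`."""
    due = r.result_date + timedelta(days=FOLLOW_UP_INTERVAL_DAYS[r.category])
    completed = r.follow_up_date is not None and r.follow_up_date <= today
    return today > due and not completed

def completed_within_window(r: AbnormalResult, enrollment: date) -> bool:
    """Primary outcome: follow-up completed within 120 days after enrollment."""
    return (r.follow_up_date is not None
            and enrollment <= r.follow_up_date <= enrollment + OUTCOME_WINDOW)

# Result on 2021-01-04 with a 90-day interval is due on 2021-04-04.
r1 = AbnormalResult("p1", date(2021, 1, 4), "colposcopy_recommended")
r2 = AbnormalResult("p2", date(2021, 1, 4), "colposcopy_recommended", date(2021, 8, 1))
```

Checking completion against `today` (rather than only whether a follow-up date exists) keeps the overdue flag correct when the function is replayed on historical data.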
Results
Between October 2020 and December 2021, the CDSS identified 2596 patients with abnormal results. The proportion of CDSS-identified patients confirmed as true positives was 61.3 % in System A and 70.4 % in System B. Compared with usual care, CDSS alone did not improve outcomes in either system. Compared with usual care, CDSS plus patient outreach, with or without navigation, significantly increased follow-up rates in System A (38.2 % or 37.2 % vs 23.5 %, p < 0.001) and System B (25.4 % or 23 % vs 19.7 %, p = 0.044).
Conclusions
The two CDSS models developed to identify overdue abnormal cervical cancer screening test results had moderate accuracy. In both systems, CDSS combined with patient outreach (with or without navigation), but not CDSS alone, increased completion of recommended follow-up. Future CDSS for cervical cancer screening may be improved with open-source tools developed through public–private partnerships.
Title: Comparing clinical decision support systems for improving follow-up of abnormal cervical cancer screening test results. Journal of Biomedical Informatics, vol. 170, Article 104908.