Journal of Biomedical Informatics: Latest Articles

SynthMedic: Utilizing large language models for synthetic discharge summary generation, correction and validation
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2025-10-01; Epub Date: 2025-09-15; DOI: 10.1016/j.jbi.2025.104906
Georgi Grazhdanski, Vasil Vasilev, Sylvia Vassileva, Dimitar Taskov, Izabel Antova, Ivan Koychev, Svetla Boytcheva

Background and Objective:

Synthetic clinical texts can improve transparency and reduce bias and costs when training and evaluating specialized language models in the medical domain. Synthetic texts are freely shareable, as they contain no real patient information, and can be customized for a specific task. The objective of this study is to develop a methodology for generating, validating, and correcting synthetic discharge summaries using LLMs without requiring any real patient data.

Methods:

The proposed approach uses an LLM to generate synthetic discharge summaries for specific diseases and standard medical references from Merck Manuals to ground the generation in internationally accepted medical practices. We validate the generated summaries using LLMs as well as by human expert validation. In addition, we propose a method for automatic correction of the generated discharge summaries using Knowledge Graphs to ensure medical factual correctness.
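The grounding step described above can be pictured as prompt assembly: the disease name and the reference passages are combined so that the model is asked to rely only on the supplied text. The sketch below is a minimal illustration under stated assumptions. The function name `build_grounded_prompt`, the prompt wording, and the passage numbering are hypothetical and are not taken from the paper.

```python
def build_grounded_prompt(disease: str, reference_passages: list[str]) -> str:
    """Assemble a generation prompt restricted to the supplied reference text.

    Hypothetical helper: the prompt wording is an assumption, not the authors' prompt.
    """
    if not reference_passages:
        raise ValueError("at least one reference passage is required for grounding")
    # Number the passages so a later validation step could cite them by index.
    numbered = "\n".join(
        f"[{i}] {passage.strip()}"
        for i, passage in enumerate(reference_passages, start=1)
    )
    return (
        f"Write a synthetic hospital discharge summary for a fictional patient with {disease}.\n"
        "Use only the clinical facts in the reference passages below, "
        "and do not include any real patient information.\n\n"
        f"Reference passages:\n{numbered}\n"
    )
```

The returned string would be sent to whichever LLM is used for generation; that call is deliberately left out because the abstract does not name the model or API.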

Results:

The conducted human expert evaluation shows that the generated synthetic discharge summaries are credible and factually accurate when provided with the medical reference context. The generated summaries achieve a System Usability Score of 94.35% based on a comprehensive rubric evaluated by medical professionals and a score of 93.65% on the Faithfulness metric evaluated by an LLM.

Conclusions:

The proposed methodology can be utilized to generate high-quality synthetic discharge summaries for various diseases. The generated synthetic corpus consists of 900 discharge summaries in English representing nine socially significant diseases and is publicly available under an open license. The community can take advantage of the corpus and proposed methodology to train complex machine learning models, helping medical professionals in their daily work without using real patient data.
Journal of Biomedical Informatics, Volume 170, Article 104906
Citations: 0
Overcoming data challenges through enriched validation and targeted sampling to measure whole-person health in electronic health records
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2025-10-01; Epub Date: 2025-09-02; DOI: 10.1016/j.jbi.2025.104904
Sarah C. Lotspeich, Sheetal Kedar, Rabeya Tahir, Aidan D. Keleghan, Amelia Miranda, Stephany N. Duda, Michael P. Bancks, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon

Objective:

The allostatic load index (ALI) is a 10-component composite measure of whole-person health, which reflects the multiple interrelated physiological regulatory systems that underlie healthy functioning. Data from electronic health records (EHR) present a huge opportunity to operationalize the ALI in learning health systems; however, these data are prone to missingness and errors. Validation (e.g., through chart reviews) can provide better-quality data, but realistically, only a subset of patients’ data can be validated, and most protocols do not recover missing data.
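To make the missingness problem concrete, the sketch below treats each of the ten components as unhealthy (`True`), healthy (`False`), or missing (`None`), and reports how many components were observed along with the proportion flagged unhealthy. The proportion-based scoring and the name `summarize_ali` are assumptions for illustration only; the abstract does not state how the ALI is calculated.

```python
from typing import Optional, Sequence


def summarize_ali(components: Sequence[Optional[bool]]) -> tuple[int, Optional[float]]:
    """Return (number of observed components, proportion of observed components unhealthy).

    Assumed scoring rule: True = unhealthy, False = healthy, None = missing.
    The proportion is None when every component is missing.
    """
    observed = [c for c in components if c is not None]
    n_observed = len(observed)
    if n_observed == 0:
        return 0, None
    return n_observed, sum(observed) / n_observed
```

Under this assumed rule, recovering a missing component through chart review changes both the count of observed components and the denominator of the index.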

Methods:

Using a representative sample of 1000 patients from the EHR at an extensive learning health system (100 of whom could be validated), we propose methods to design, conduct, and analyze statistically efficient and robust studies of ALI and healthcare utilization. Employing semiparametric maximum likelihood estimation, we robustly incorporate all available patient information into statistical models. Using targeted design strategies, we examine ways to select the most informative patients for validation. Incorporating clinical expertise, we devise a novel validation protocol to promote EHR data quality and completeness.

Results:

Chart reviews uncovered few errors (99% matched source documents) and recovered some missing data through auxiliary information in patients’ charts. On average, validation increased the number of non-missing ALI components per patient from 6 to 7. Through simulations based on preliminary data, residual sampling was identified as the most informative strategy for completing our validation study. Incorporating validation data, statistical models indicated that worse whole-person health (higher ALI) was associated with higher odds of engaging in the healthcare system, adjusting for age.
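One common reading of residual sampling is: fit a preliminary model on the unvalidated data, then send the patients with the most extreme residuals for validation. The sketch below follows that reading with a simple least-squares line. The linear model, the use of absolute residuals, and the name `residual_sample` are assumptions for illustration and may differ from the design used in the study.

```python
def residual_sample(x: list[float], y: list[float], k: int) -> list[int]:
    """Pick the k indices with the largest absolute residuals from a least-squares line.

    x: error-prone exposure (for example, an unvalidated index value per patient)
    y: outcome per patient
    Assumed selection rule; the published design may differ.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    # Closed-form simple linear regression.
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Rank patients by how poorly the preliminary model fits them.
    ranked = sorted(range(n), key=lambda i: abs(residuals[i]), reverse=True)
    return sorted(ranked[:k])
```

Patients far from the fitted line carry the most information about how measurement error distorts the association, which is the intuition behind targeting them.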

Conclusion:

Targeted validation with an enriched protocol can ensure the quality and promote the completeness of EHR data. Findings from our validation study were incorporated into analyses as we operationalize the ALI as a scalable whole-person health measure that predicts healthcare utilization in the learning health system.
Journal of Biomedical Informatics, Volume 170, Article 104904
Citations: 0
Prediction of Single-Cell perturbation response based on Direction-Constrained diffusion Schrödinger Bridge
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2025-10-01; Epub Date: 2025-09-21; DOI: 10.1016/j.jbi.2025.104915
Yiqing Luo, Lin Liu, Yaxin Fu, Yi Deng, Lin Tang

Objective

Predicting transcriptional responses to external perturbations at the single-cell level is essential for understanding gene regulatory networks, drug discovery, and personalized interventions. The exponential increase in perturbation conditions creates data sparsity, making it difficult to capture dynamic responses and necessitating computational modeling.

Methods

We present Direction-Constrained Diffusion Schrödinger Bridge (DC-DSB), a generative framework that learns probabilistic trajectories between unperturbed and post-perturbation distributions by minimizing path-space KL divergence. To enhance conditional control, DC-DSB integrates hierarchical representations derived from experimental variables and biological prior knowledge. We further introduce a direction-constrained conditioning strategy that injects condition signals along the biologically relevant perturbation trajectory, thereby improving modeling quality and training stability.
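For orientation, the standard (unconditional) Schrödinger bridge problem that "minimizing path-space KL divergence" refers to can be written as below. The symbols are assumptions chosen here, not the paper's notation: $Q$ is a reference path measure (for example, a fixed diffusion), $\mu_0$ is the unperturbed cell distribution, $\mu_1$ is the post-perturbation distribution, and $\mathcal{D}(\mu_0, \mu_1)$ is the set of path measures on $[0, 1]$ whose marginals at $t = 0$ and $t = 1$ are $\mu_0$ and $\mu_1$. The conditional, direction-constrained variant proposed in the paper is not reproduced.

```latex
P^{\star} \;=\; \operatorname*{arg\,min}_{P \,\in\, \mathcal{D}(\mu_0,\, \mu_1)} \mathrm{KL}\!\left(P \,\|\, Q\right),
\qquad
\mathcal{D}(\mu_0, \mu_1) \;=\; \left\{ P \;:\; P_{0} = \mu_0,\; P_{1} = \mu_1 \right\}
```

The minimizer $P^{\star}$ is the most likely stochastic evolution, relative to $Q$, that carries the unperturbed population to the perturbed one.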

Results

DC-DSB improves expression prediction accuracy and generalization to unseen combinations over baselines. By modeling dynamic expression trajectories and co-expression structures under perturbation, DC-DSB enables the discovery of synergistic and antagonistic gene interactions and supports the progressive reconstruction of regulatory pathways.

Conclusion

DC-DSB provides a biologically consistent and generalizable framework for single-cell perturbation modeling. Its trajectory-based and condition-aware architecture overcomes the limitations of static mappings and facilitates downstream analyses in gene regulation and drug discovery.
Journal of Biomedical Informatics, Volume 170, Article 104915
Citations: 0
Resource-efficient instruction tuning of large language models for biomedical named entity recognition
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2025-10-01; Epub Date: 2025-08-21; DOI: 10.1016/j.jbi.2025.104896
Hui Liu, Ziyi Chen, Peilin Li, Yuan-Zhi Liu, Xiangtao Liu, Ronald X. Xu, Mingzhai Sun

Objective:

Large language models (LLMs) have exhibited remarkable efficacy in natural language processing (NLP) tasks, with fine-tuning for Biomedical Named Entity Recognition (BioNER) receiving significant research attention. However, the substantial computational demands associated with fine-tuning large-scale models constrain their development and deployment. Consequently, this study investigates parameter-efficient fine-tuning (PEFT) techniques to optimize LLMs for BioNER under limited computational resources. By leveraging these methods, competitive model performance is maintained while preserving in-domain generalization capability.

Methods:

In this study, we employed the PEFT method QLoRA to fine-tune the open-source Llama3.1 model, developing the NERLlama3.1 model specifically designed for the BioNER task. First, an LLM instruction tuning dataset was created using BioNER datasets such as NCBI-disease, BC5CDR-chem, and BC2GM-gene. Next, the Llama3.1-8B model was fine-tuned using the QLoRA method on a single GPU with 16 GB of memory. Furthermore, during the inference phase, we introduced a prompt engineering technique called self-consistency NER prompting (SCNP). This approach leverages the diversity of outputs generated by LLMs to significantly enhance NER performance. Finally, we also developed a multi-task BioNER-capable model, NERLlama3.1-MT, to investigate the capability of fine-tuned LLMs in addressing multi-task BioNER scenarios.
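The idea of exploiting output diversity at inference time can be sketched as a vote over several sampled generations: an entity is kept only if enough samples agree on it. The aggregation rule below (strict majority by default) and the name `self_consistency_vote` are assumptions for illustration; the abstract does not specify how SCNP combines the sampled outputs.

```python
from collections import Counter
from typing import Iterable


def self_consistency_vote(
    sampled_entity_sets: list[Iterable[str]], min_fraction: float = 0.5
) -> set[str]:
    """Keep entities that appear in more than `min_fraction` of the sampled outputs.

    Each element of `sampled_entity_sets` holds the entity strings parsed from one
    sampled generation. Assumed voting rule, not necessarily the one used by SCNP.
    """
    if not sampled_entity_sets:
        return set()
    counts: Counter[str] = Counter()
    for entities in sampled_entity_sets:
        # Count each entity at most once per sample.
        counts.update(set(entities))
    threshold = min_fraction * len(sampled_entity_sets)
    return {entity for entity, count in counts.items() if count > threshold}
```

Raising `min_fraction` trades recall for precision, since fewer entities survive a stricter agreement requirement.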

Results:

The NERLlama3.1 model achieved F1-scores of 0.8977, 0.9402, and 0.8530 on the NCBI-disease, BC5CDR-chemical, and BC2GM-gene datasets, respectively. Furthermore, when evaluated on previously unseen datasets, it attained F1-scores of 0.6867 on BC5CDR-disease, 0.6800 on NLM-chemical, and 0.8378 on NLM-gene. These results demonstrate that NERLlama3.1 not only outperforms fully fine-tuned LLMs but also exhibits superior in-domain generalization capabilities when compared to the BERT-base model. Additionally, this work represents the first exploration of fine-tuning LLMs for multi-task BioNER.
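For readers less familiar with the metric, the sketch below computes precision, recall, and F1 from sets of gold and predicted entities under exact matching. Exact-match scoring and the name `entity_f1` are assumptions for illustration; the abstract does not state the matching criterion behind the reported scores.

```python
def entity_f1(gold: set[str], predicted: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match entity sets.

    Each value is 0.0 when its denominator would be zero.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```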

Conclusion:

NERLlama3.1 outperformed LLMs fine-tuned with full parameter updates, despite requiring significantly fewer computational resources. Moreover, it exhibited substantially superior in-domain generalization capabilities compared to traditional pre-trained language models. Its low resource demands, high performance, and strong generalization enhance its applicability and utility across diverse clinical BioNER tasks.
Journal of Biomedical Informatics, Volume 170, Article 104896
Citations: 0
Measuring and visualizing healthcare process variability
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2025-10-01; Epub Date: 2025-09-23; DOI: 10.1016/j.jbi.2025.104918
Pengfei Yin, Abel Armas Cervantes, Daniel Capurro

Importance

Understanding factors that contribute to clinical variability in patient care is critical, as unwarranted variability can lead to increased adverse events and prolonged hospital stays. Determining when this variability becomes excessive can be a step in optimizing patient outcomes and healthcare efficiency.

Objective

Explore the association between clinical variation and clinical outcomes. This study aims to identify the point in time when the relationship between clinical variation and length of stay (LOS) becomes significant.

Methods

This cohort study uses MIMIC-IV, a dataset collecting electronic health records of the Beth Israel Deaconess Medical Center in the United States. We focused on adult patients who underwent elective coronary bypass surgery, generating 847 patient observations. Demographic factors such as age, race, insurance type, and the Charlson Comorbidity Index (CCI) were recorded. We performed a variability analysis where patients’ clinical processes are represented as sequences of events. The data was segmented based on the initial day of recorded activity to establish observation windows. Using a regression analysis, we identified the temporal window where variability’s impact on LOS becomes independently significant.
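When clinical processes are represented as event sequences, one simple way to quantify how far a patient's journey departs from another is an edit distance over events. The sketch below uses Levenshtein distance as a stand-in. The abstract does not name the distance measure used in the study, and the event labels in the usage note are hypothetical.

```python
from typing import Sequence


def event_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences (insert, delete, substitute = 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, event_a in enumerate(a, start=1):
        curr = [i]
        for j, event_b in enumerate(b, start=1):
            substitution_cost = 0 if event_a == event_b else 1
            curr.append(
                min(
                    prev[j] + 1,                      # delete event_a
                    curr[j - 1] + 1,                  # insert event_b
                    prev[j - 1] + substitution_cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]
```

For example, comparing `["admit", "surgery", "icu", "discharge"]` with a journey that adds a `"ward"` step before discharge gives a distance of 1; a per-patient variability score could then be the distance to a chosen reference pathway.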

Result

Regression analysis revealed that patients in the top 20% of the variability distance group experienced an 81% increase in LOS (95% CI: 1.72 to 1.91, p < 0.001). Insurance types, such as Medicare and Other, were associated with 18% (95% CI: 0.73 to 0.92, p < 0.001) and 21% (95% CI: 0.71 to 0.88, p < 0.001) decreases in LOS, respectively. Neither age nor race significantly affected LOS, but a higher CCI was associated with a 3.3% increase in LOS (95% CI: 1.02 to 1.05, p < 0.001). These findings indicate that higher variability and CCI significantly influence LOS, with insurance type also playing a crucial role.
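The percentages and the confidence intervals in this paragraph appear to be on different scales: the intervals read as multiplicative ratios, while the effects are quoted as percent changes. Assuming a multiplicative (log-link or log-transformed) model, which the abstract does not state, the conversion is percent change = (ratio − 1) × 100. The helper name below is hypothetical.

```python
def percent_change_from_ratio(ratio: float) -> float:
    """Convert a multiplicative effect (ratio) into a percent change."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (ratio - 1.0) * 100.0
```

Under that assumption, an 81% increase corresponds to a ratio of 1.81, which sits inside the reported interval of 1.72 to 1.91; likewise, the 18% and 21% decreases correspond to ratios of 0.82 and 0.79, and the 3.3% increase to 1.033.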

Conclusion

In the studied cohort, patient journeys with greater variability were associated with longer LOS, following a dose–response relationship: the higher the variability, the longer the LOS. This study presents a standardized way to measure and visualize variability in clinical processes and measure its impact on patient-relevant outcomes.
Journal of Biomedical Informatics, Volume 170, Article 104918
Citations: 0
Comparing clinical decision support systems for improving follow-up of abnormal cervical cancer screening test results
IF 4.5; Zone 2 (Medicine); Q2, Computer Science, Interdisciplinary Applications; Pub Date: 2025-10-01; Epub Date: 2025-09-09; DOI: 10.1016/j.jbi.2025.104908
Steven J. Atlas, Timothy E. Burdick, Adam Wright, Wenyan Zhao, Shoshana Hort, David G. Aman, Mathan Thillaiyapillai, E. John Orav, Amy J. Wint, Rebecca E. Smith, Katherine L. Gallagher, Molly L. Housman, Frank Y. Chang, Courtney J. Diamond, Li Zhou, Jennifer S. Haas, Anna N.A. Tosteson

Background

Many individuals with abnormal cervical cancer screening test results do not receive timely follow-up care. Clinical decision support systems (CDSS) to improve follow-up are challenged by difficulty identifying clinical elements and applying complex guideline recommendations. As part of a multisite trial, two CDSS models were implemented: one used natural language processing to evaluate extracted data outside of the electronic health record (EHR) (System A); the other used commercial EHR functionality using LOINC-defined result fields (System B). This secondary analysis compared the accuracy and trial outcomes among sites using these two CDSS models.

Methods

Primary care clinics (32 in System A and 12 in System B) were randomly assigned to usual care, CDSS alone, or CDSS with patient outreach with or without navigation. CDSS identified individuals with overdue abnormal screening results and specified the recommended follow-up and time interval. CDSS accuracy was assessed by manual chart review. Patient outreach consisted of portal/mailed letters plus a single phone call. Navigation included one or more phone calls to address barriers to care. Completion of recommended follow-up at 120 days after enrollment was the primary outcome. Clinic was the unit of randomization, and the patient was the unit of analysis.

Results

Between October 2020 and December 2021, 2596 patients with abnormal results were identified by the CDSS. CDSS true positives were 61.3 % in System A and 70.4 % in System B. CDSS alone versus usual care did not improve outcomes in either system. CDSS with patient outreach with or without navigation versus usual care significantly increased follow-up rates in System A (38.2 % or 37.2 % vs. 23.5 %, p < 0.001) and System B (25.4 % or 23 % vs. 19.7 %, p = 0.044).

Conclusions

Two CDSS models developed to identify overdue abnormal cervical cancer screening test results had moderate accuracy. Both models with patient outreach with or without navigation – but not CDSS alone – increased recommended follow-up. Future CDSS for cervical cancer screening may be improved with open-source tools developed in public–private partnerships.
Journal of Biomedical Informatics, Volume 170, Article 104908. Publication type: Journal Article.
Citations: 0
PCGMMF: a prediction method for breast cancer prognostic recurrence and metastasis risk based on enhanced multimodal feature fusion
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-09-09 DOI: 10.1016/j.jbi.2025.104907
Wei Du , Liang Gao , Xianhua Xu , Yuhua Yao , Zhong Li

Background

Breast cancer is a highly heterogeneous disease with high morbidity and mortality rates. Despite the availability of various treatments, a significant number of patients still face a high probability of recurrence or metastasis, which severely impacts their survival status. Traditional prognostic methods based on single-modality data and machine learning algorithms often fail to adequately capture the complex biological relationships and heterogeneous characteristics of breast cancer, leading to suboptimal prognostic performance. Therefore, there is an urgent need for a more accurate and effective method to predict the risk of recurrence and metastasis in breast cancer prognosis.

Methods

In this study, we propose a novel method termed PCGMMF for breast cancer prognostic analysis. This method integrates histopathological images, clinical data, gene expression data, and DNA methylation data through multimodal fusion. We leverage a pre-trained Vision-LSTM model based on transfer learning to extract features from histopathological images. Additionally, we design a comprehensive feature selection strategy that includes support vector machine (SVM), Mantel test, and correlation analysis to filter features from gene expression data and DNA methylation data. Furthermore, to address the high heterogeneity of breast cancer and the independence and intersectionality of multimodal features, we propose a bidirectional attention and self-attention based enhanced multimodal feature fusion module called BSAMF.
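The Mantel test named in the feature selection strategy compares two distance matrices by correlating their off-diagonal entries and judging significance with permutations. The abstract does not say how the test is wired into feature selection, so the following is only a minimal pure-Python sketch of the general test, not the authors' pipeline. The function names, the permutation count, the seed, and the toy matrices are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper_triangle(d: Matrix, order: Sequence[int]) -> List[float]:
    """Flatten the strict upper triangle of d after reordering rows and columns."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel_test(d1: Matrix, d2: Matrix, n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed correlation, one-sided permutation p-value).

    Both inputs are assumed to be symmetric distance matrices over the
    same samples in the same order.
    """
    identity = list(range(len(d1)))
    x = _upper_triangle(d1, identity)
    observed = _pearson(x, _upper_triangle(d2, identity))

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the samples of d2 only
        if _pearson(x, _upper_triangle(d2, perm)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: five samples on a line, second matrix is a rescaled copy.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
d_a = [[abs(p - q) for q in positions] for p in positions]
d_b = [[2.0 * abs(p - q) for q in positions] for p in positions]

r, p_value = mantel_test(d_a, d_b)
print(f"Mantel r = {r:.3f}, permutation p = {p_value:.3f}")
```

In a feature selection setting, one plausible use would be to keep features whose sample-to-sample distances correlate with distances derived from another modality or from the outcome, but that reading is an assumption rather than something the abstract states.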

Results

Through a series of experiments, we evaluate the performance of PCGMMF. When predicting the recurrence and metastasis risk of breast cancer prognosis, PCGMMF achieves an accuracy of 0.903 and an AUC value of 0.924, outperforming other state-of-the-art methods. Furthermore, we provide an interpretability analysis of highly significant regions from histopathological images, which can serve as a reference for clinical practice.

Conclusion

PCGMMF offers a robust and innovative solution for breast cancer prognostic analysis by effectively integrating multimodal data and utilizing advanced deep learning techniques. It can effectively conduct breast cancer prognostic analysis and provide significant references for personalized precision treatment and clinical practice.
Journal of Biomedical Informatics, Volume 170, Article 104907. Publication type: Journal Article.
Citations: 0
An optimized code-free AI approach for efficient and accurate literature screening in bone organoid research
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-09-09 DOI: 10.1016/j.jbi.2025.104911
Jiaxu Zheng , Janak Lal Pathak , Shangyan Li , Anqi Li , Jiechun Fang , Zonghua Li , Zhisheng Bi , Yin Xiao , Qing Zhang
The exponential growth of biomedical literature has rendered traditional screening methods inefficient and unsustainable, making knowledge discovery akin to finding a needle in a haystack. While recent advances in artificial intelligence (AI) offer new opportunities for rapid literature retrieval, many clinicians and researchers lack familiarity with these tools. In this study, we optimized LitSuggest, a user-friendly, code-free AI-based literature screening system, and established a standardized operational workflow. Using the field of organoid-based bone tissue engineering as a case study, the optimized system achieved an accuracy of 98.83%, precision of 76.19%, recall of 83.33%, and an F1-score of 79.60%, while reducing manual screening workload by over 90%. Furthermore, we innovatively integrated correlation scoring into literature analysis, revealing that China and the United States are leading contributors to bone organoid regeneration research, and that complex and genetic disease organoid models hold significant research potential. This AI-driven approach enables researchers to focus on high-value literature, improving efficiency while guiding future research in bone organoid regeneration and broader biomedical fields.
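The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name is an assumption, and only the two percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
precision = 0.7619
recall = 0.8333

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.7960"
```

The result agrees with the stated 79.60%. The much higher accuracy of 98.83% is also consistent with these values when relevant papers are a small fraction of the screened set, because accuracy is then dominated by correctly rejected papers.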
Journal of Biomedical Informatics, Volume 170, Article 104911. Publication type: Journal Article.
Citations: 0
CeRTS: certainty retrieval token search in large language model clinical information extraction
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-08-23 DOI: 10.1016/j.jbi.2025.104900
Lars E. Schimmelpfennig , Kriti Bhattarai , Inez Y. Oh , Jake Lever , Obi L. Griffith , Malachi Griffith , Albert M. Lai , Zachary B. Abrams

Objective

Large language models (LLMs) must effectively communicate their uncertainty to be viable in clinical settings. As such, the need for reliable uncertainty estimation grows increasingly urgent with the expanding use of LLMs for information extraction from electronic health records. Previous token-level uncertainty estimators have only used token probabilities within a single output sequence. Here, by leveraging the constraints of JSON output structure, we instead consider all likely sequences and their respective probabilities to obtain a more robust measure of model confidence. We develop Certainty Retrieval Token Search (CeRTS), a new uncertainty estimator for structured information extraction.

Methods

We evaluated CeRTS against a previous gold-standard uncertainty estimator when extracting clinical features from lung cancer discharge summaries across eight open-source LLMs. Calibration (Brier score) and discrimination (AUROC) were used to quantify performance.
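The two evaluation metrics named here can be computed directly from per-item confidence scores and correctness labels. The sketch below shows the standard definitions in plain Python; it is not the CeRTS estimator itself, and the function names and the toy confidences and labels are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Calibration: mean squared gap between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(confidences, correct)) / len(correct)


def auroc(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Discrimination: chance that a correct item outranks an incorrect one (ties count 0.5)."""
    pos = [c for c, y in zip(confidences, correct) if y == 1]
    neg = [c for c, y in zip(confidences, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: confidence per extracted item, and whether it was right.
toy_confidences = [0.9, 0.8, 0.6, 0.3]
toy_correct = [1, 1, 0, 1]

brier = brier_score(toy_confidences, toy_correct)
auc = auroc(toy_confidences, toy_correct)
print(f"Brier = {brier:.3f}, AUROC = {auc:.3f}")  # prints "Brier = 0.225, AUROC = 0.667"
```

A lower Brier score indicates better calibration, while a higher AUROC indicates that confidence separates correct from incorrect extractions more cleanly, which is why the abstract reports the two separately.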

Results

CeRTS surpassed the previous gold-standard estimator in discriminatory power across every model and achieved better calibration in most cases. CeRTS had the strongest agreement between model confidence and accuracy with Qwen-2.5.

Conclusion

CeRTS enhances LLM-based information extraction from unstructured clinical text by assigning well-calibrated confidence scores to each extracted item, providing medical researchers with a quantitative measure of reliability at minimal additional cost. Although its performance was generally robust, CeRTS struggled with DeepSeek-R1, which we attribute to the model’s Chain-of-Thought reasoning steps. Our evaluation focused on clinical data, but CeRTS can be applied to any domain requiring reliable uncertainty estimation.
Journal of Biomedical Informatics, Volume 170, Article 104900. Publication type: Journal Article.
Citations: 0
Secondary use of radiological imaging data: Vanderbilt’s ImageVU approach
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-09-10 DOI: 10.1016/j.jbi.2025.104905
David S. Smith , Karthik Ramadass , Laura Jones , Jennifer Morse , Daniel Fabbri , Joseph R. Coco , Shunxing Bao , Melissa Basford , Peter J. Embi , Reed A. Omary , John C. Gore , Jill M. Pulley , Bennett A. Landman

Objective:

To develop ImageVU, a scalable research imaging infrastructure that integrates clinical imaging data with metadata-driven cohort discovery, enabling secure, efficient, and regulatory-compliant access to imaging for secondary and opportunistic research use. This manuscript presents a detailed description of ImageVU’s key components and lessons learned to assist other institutions in developing similar research imaging services and infrastructure.

Methods:

ImageVU was designed to support the secondary use of radiological imaging data through a dedicated research imaging store. The system comprises four interconnected components: a Research PACS, an Ad Hoc Backfill Host, Cloud Storage System, and a De-Identification System. Imaging metadata are extracted and stored in the Research Derivative (RD), an identified clinical data repository, and the Synthetic Derivative (SD), a de-identified research data repository, with access facilitated through the RD Discover web portal. Researchers interact with the system via structured metadata queries and multiple data delivery options, including web-based viewing, bulk downloads, and dataset preparation for high-performance computing environments.

Results:

The integration of metadata-driven search capabilities has streamlined cohort discovery and improved imaging data accessibility. As of December 2024, ImageVU has processed 12.9 million MRI and CT series from 1.36 million studies across 453,403 patients. The system has supported 75 project requests, delivering over 50 TB of imaging data to 55 investigators, leading to 66 published research papers.

Conclusion:

ImageVU demonstrates a scalable and efficient approach for integrating clinical imaging into research workflows. By combining institutional data infrastructure with cloud-based storage and metadata-driven cohort identification, the platform enables secure and compliant access to imaging for translational research.
Journal of Biomedical Informatics, Volume 170, Article 104905. Publication type: Journal Article.
Citations: 0