Journal of Biomedical Informatics: latest articles

Pre-coding skin cancer from free-text pathology reports using noise-robust neural networks
IF 4.5 Zone 2, Medicine Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-01 Epub Date: 2025-10-24 DOI: 10.1016/j.jbi.2025.104943
Tapio Niemi, Gautier Defossez, Simon Germann, Jean-Luc Bulliard

Objective

Population-based cancer registries receive numerous free-text pathology reports from which cancer cases are manually coded according to international standards. Skin cancer is the most frequent cancer in Caucasian populations, and its incidence is increasing. We developed an AI-based method to identify skin cancer, locate relevant key terms in pathological reports, and suggest coding for the main clinical variables.

Methods

We explored multiple neural network architectures and found that convolutional neural networks with customised noise-robust loss functions offer the best performance for identifying cancer types and pre-coding subsite, morphology, behaviour, grade, laterality, and first line of treatment of skin cancer cases. Previously registered cases were used as training data. We additionally applied an attention mechanism to extract and highlight the key diagnostic terms in each report. These highlights facilitate human review of pre-coding results. We evaluated the method's performance on manually coded cases in a separate test set.
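
To illustrate what "noise-robust" can mean for a loss function, the sketch below compares standard cross-entropy with generalized cross-entropy, one well-known noise-robust loss. The abstract does not say which loss the authors customised, so this choice, the value q = 0.7, and the example logits are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Standard cross-entropy: grows without bound as p[label] approaches 0."""
    return -math.log(probs[label])

def generalized_cross_entropy(probs, label, q=0.7):
    """Generalized cross-entropy: (1 - p[label]**q) / q, bounded above by 1/q.

    As q approaches 0 it approaches cross-entropy; q = 1 gives 1 - p[label].
    """
    return (1.0 - probs[label] ** q) / q

# Hypothetical case: the model is confident in class 0, but the registry
# label (possibly a coding error) says class 2.
probs = softmax([6.0, 1.0, 0.0])
ce = cross_entropy(probs, 2)
gce = generalized_cross_entropy(probs, 2, q=0.7)
print(f"cross-entropy: {ce:.3f}, generalized cross-entropy: {gce:.3f}")
```

Because the generalized loss is bounded, a single mislabelled training case cannot dominate the gradient the way it can under plain cross-entropy, which is the property that matters when training labels come from routine registry coding.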

Results

Accuracy for detecting skin cancer types was 0.98–0.99, with F1 scores of 0.93–0.96. Pre-coding accuracy and weighted F1 score were, respectively: ICD-O subsite (4 digits): 0.89–0.91 and 0.89–0.91; morphology (4 digits): 0.61–0.90 and 0.63–0.89; morphology (3 digits): 0.86–0.98 and 0.89–0.98; tumour behaviour: 0.96–0.98 and 0.96–0.98; laterality: 0.99 and 0.98–0.99. For squamous cell carcinoma (SCC) of the skin, grade was pre-coded with an accuracy of 0.96 and a weighted F1 score of 0.96. Treatment was pre-coded for SCC and melanoma with accuracies of 0.84 and 0.87 and weighted F1 scores of 0.82 and 0.87, respectively. The extracted key words matched ICD-O code descriptions with high precision.
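
Since the results pair accuracy with a weighted F1 score, a small worked example may help show how the two differ. This is a minimal sketch: the laterality labels below are invented, and "weighted" is taken to mean the usual support-weighted mean of per-class F1 scores, which the abstract does not define explicitly.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted code equals the manual code."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= n >= 1
        score += (n / total) * f1
    return score

# Hypothetical manually coded vs. pre-coded laterality values.
y_true = ["left", "left", "left", "right", "right", "unknown"]
y_pred = ["left", "left", "right", "right", "right", "left"]

print(f"accuracy:    {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

In this toy data the rare "unknown" class is never predicted correctly, which pulls the weighted F1 below the accuracy; that is the kind of gap visible in the 4-digit morphology figures.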

Conclusion

We piloted our method in the Vaud Cancer Registry, Switzerland. It identified and pre-coded skin cancer cases efficiently and found the correct key terms in reports. Medical coders found pre-coding useful and time-saving. We intend to integrate the method into the registry document workflow and extend it to other cancer types.
Journal of Biomedical Informatics, Volume 171, Article 104943
Citations: 0
TriMedPrompt: A unified prompting framework for realistic and layout-conformant clinical progress note synthesis
IF 4.5 Zone 2, Medicine Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-01 Epub Date: 2025-10-17 DOI: 10.1016/j.jbi.2025.104927
Garapati Keerthana, Manik Gupta
Clinical progress notes are critical artifacts for modeling patient trajectories, auditing clinical decision-making, and powering downstream applications in clinical natural language processing (NLP). However, public resources such as MIMIC-III provide limited progress notes, constraining the development of robust and generalizable machine learning models. This work proposes a novel hybrid prompting framework — TriMedPrompt — to generate high-quality, structurally and semantically coherent synthetic progress notes using large language models (LLMs). Our approach conditions the LLMs on a triad of complementary biomedical signals: (1) real-world progress notes from MIMIC-III, (2) clinically aligned case reports from the PMC Patients dataset, selected via embedding-based retrieval, and (3) structured disease-centric knowledge from PrimeKG. We design a multi-source, layout-aware prompting pipeline that dynamically integrates structured and unstructured information to produce notes across standard clinical formats (e.g., SOAP, BIRP, PIE, DAP).
Through evaluations of layout adherence, entity extraction, and semantic similarity, together with controlled ablations, we demonstrate that our generated notes achieve a 98.6% semantic entity alignment score with real clinical notes while maintaining high structural fidelity. Ablation studies further confirm the critical role of combining structured biomedical knowledge and unstructured narrative data in improving note quality. In addition, we illustrate the potential of our synthetic notes in privacy-preserving clinical NLP, offering a safe alternative for model development and benchmarking in sensitive healthcare settings. This work establishes a scalable, controllable paradigm for clinical text synthesis, significantly expanding access to realistic, diverse progress notes and laying the foundation for advancing trustworthy clinical NLP research.
Journal of Biomedical Informatics, Volume 171, Article 104927
Citations: 0
Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech
IF 4.5 Zone 2, Medicine Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-08-21 DOI: 10.1016/j.jbi.2025.104899
Weizhe Xu, Serguei Pakhomov, Patrick Heagerty, Eric Horvitz, Ellen R. Bradley, Josh Woolley, Andrew Campbell, Alex Cohen, Dror Ben-Zeev, Trevor Cohen

Objective

Semantic coherence in speech is characterized by a logical, connected flow of ideas. A lack of coherence in speech may reflect disorganized thinking, a core feature of psychosis in schizophrenia spectrum disorders (SSDs). Developing tools that could help with automated assessment of semantic coherence in language could facilitate early detection of SSDs and improved monitoring of symptoms, enabling more timely intervention. Large language models (LLMs) have demonstrated strong capabilities on numerous language-centric tasks and have shown promise for analyzing semantic coherence due to the natural fit between their innate measures of language perplexity and the surprising turns that incoherent narrative often takes. This study aims to develop a novel representation and associated measure of semantic coherence using LLM-based perplexity metrics and to compare this measure with traditional vector distance-based coherence metrics.

Method

We evaluated “bag” and “chain” models based on LLM perplexities as measures of semantic coherence. Regression models were trained using both single and paired combinations of perplexity- and proximity-based features to predict human ratings of semantic coherence using standardized instruments. Performance was evaluated on held-out examples from a training set of speeches from individuals experiencing psychotic symptoms and a test set of clinical interviews with patients diagnosed with SSDs, both with labels from human assessments of disorganized thinking severity.
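
For readers unfamiliar with perplexity, the sketch below shows the standard definition: the exponential of the mean negative log-probability a language model assigns to each token. The abstract does not specify how the "bag" and "chain" models are built, so this is only the underlying quantity, and the token log-probabilities are invented values standing in for real LLM output.

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities for two utterances.
# A coherent continuation is assigned fairly high probabilities;
# an abrupt topic shift produces several very unlikely tokens.
coherent_logprobs = [-1.2, -0.8, -1.0, -0.9]
incoherent_logprobs = [-1.2, -4.5, -6.0, -3.8]

print(f"coherent:   {perplexity(coherent_logprobs):.2f}")
print(f"incoherent: {perplexity(incoherent_logprobs):.2f}")
```

Higher perplexity means the text surprised the model more, which is why it is a plausible feature for predicting ratings of disorganized speech.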

Results

The best performance was achieved using a combination of perplexity and proximity features, yielding a Spearman correlation with human ratings of 0.61 (vs. 0.56 with proximity features alone) on leave-one-out cross-validation in the training set, and 0.54 (vs. 0.52 with proximity features alone) on the test set.
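
The Spearman correlation reported here is the Pearson correlation of rank-transformed values, with tied values sharing their average rank. The following sketch implements it from scratch; the human ratings and model scores are made up for illustration and are not data from the study.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical clinician ratings and model-predicted scores.
human_ratings = [1, 2, 2, 4, 5]
model_scores = [0.1, 0.4, 0.3, 0.8, 0.9]
print(f"Spearman correlation: {spearman(human_ratings, model_scores):.3f}")
```

Because only ranks matter, the measure rewards a model for ordering speakers by severity correctly even if its raw scores are on a different scale from the clinical instrument.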

Conclusion

We developed novel methods for assessing semantic coherence using LLM perplexities and found them complementary to proximity-based methods. Combined, these methods showed improved performance across two datasets, highlighting the potential of LLMs to enhance automated diagnosis and monitoring of SSDs.
Journal of Biomedical Informatics, Volume 170, Article 104899
Citations: 0
A machine learning approach for automating review of a RxNorm medication mapping pipeline output
IF 4.5 Zone 2, Medicine Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-09-11 DOI: 10.1016/j.jbi.2025.104909
Matthias Hüser, John Doole, Vinicius Pinho, Hossein Rouhizadeh, Douglas Teodoro, Ahson Saiyed, Matvey B. Palchuk

Objective:

Medication mapping to standardized terminologies is an important prerequisite for performing analytics on a federated EHR network. TriNetX LLC operates the largest such network in the world.

Methods:

We report a novel pipeline, RxEmbed, that uses LLMs to map and bind local medication descriptions to RxNorm ingredient codes and uses machine learning to automate review of the resulting mappings.

Results:

Performance of RxEmbed was assessed on a public data set from France and on data from 6 healthcare organizations in the TriNetX federated EHR network across the United States and Brazil. On the public data set, RxEmbed outperformed two recently reported LLM-based baselines in both recall and precision of the generated mappings. In TriNetX network data, RxEmbed obtained RxNorm mapping recalls of 84%–93%, at a precision of 99.5%–100%.
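
The recall and precision figures can be read as follows: precision is the share of generated mappings that are correct, and recall is the share of all descriptions with a known correct code that the pipeline mapped correctly. The sketch below assumes that reading, which the abstract does not spell out, and uses placeholder descriptions and codes (ING_A and so on) rather than real RxNorm identifiers.

```python
def mapping_precision_recall(predicted, gold):
    """Score a set of generated mappings against reference mappings.

    predicted: description -> code, or None when no mapping was produced.
    gold:      description -> correct code.
    """
    produced = {d: c for d, c in predicted.items() if c is not None}
    correct = sum(1 for d, c in produced.items() if gold.get(d) == c)
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Placeholder local medication descriptions and ingredient codes.
gold = {
    "desc 1": "ING_A",
    "desc 2": "ING_B",
    "desc 3": "ING_C",
    "desc 4": "ING_D",
    "desc 5": "ING_E",
}
predicted = {
    "desc 1": "ING_A",
    "desc 2": "ING_B",
    "desc 3": "ING_C",
    "desc 4": "ING_D",
    "desc 5": None,  # the pipeline abstained on this description
}

precision, recall = mapping_precision_recall(predicted, gold)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

A pipeline that abstains when unsure trades recall for precision, which is the pattern in the reported numbers: nearly every emitted mapping is right, while some descriptions are left unmapped.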

Conclusion:

We built and evaluated an LLM-based medication mapping pipeline that binds local medication descriptions from EHR systems to RxNorm ingredient codes. The high precision of the pipeline output implies very limited need for human review of the generated mappings.
Journal of Biomedical Informatics, Volume 170, Article 104909
Citations: 0
WoundcareVQA: A multilingual visual question answering benchmark dataset for wound care
IF 4.5 Zone 2, Medicine Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-08-29 DOI: 10.1016/j.jbi.2025.104888
Wen-wai Yim, Asma Ben Abacha, Robert Doerning, Chia-Yu Chen, Jiaying Xu, Anita Subbarao, Zixuan Yu, Fei Xia, M. Kennedy Hall, Meliha Yetisgen

Objective:

Introduce the task of wound care multimodal multilingual visual question answering, provide baseline performances, and identify areas of future study.

Methods:

A dataset of wound care multimodal multilingual visual question answering (VQA) was created using consumer health questions asked online. Practicing US medical doctors were tasked with providing metadata and expert response labels. Several instruct-enabled, multilingual visual question answering models (GPT-4o, Gemini-1.5-Pro, and Qwen-VL) were tested to benchmark performance. Finally, automatic evaluation metrics were tested against domain expert response ratings.

Results:

A multilingual dataset of 477 wound care cases, 768 responses, 748 images, 3k structured data labels, 1362 translation instances, and 10k judgments was constructed (https://osf.io/xsj5u/). Metadata classification accuracy ranged from 0.32 to 0.78 depending on classification type. Response generation scored 0.06 BLEU, 0.66 BERTScore, and 0.45 ROUGE-L in English, and 0.12 BLEU, 0.69 BERTScore, and 0.50 ROUGE-L in Chinese.
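
Of the metrics listed, ROUGE-L is the simplest to show in full: it scores a generated response by the longest common subsequence (LCS) of words it shares with a reference. The sketch below uses the F1 form with lowercase whitespace tokenization, both of which are assumptions; the evaluation in the paper may use a different tokenizer or weighting, and the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L as the harmonic mean of LCS precision and LCS recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical expert response and model response.
reference = "keep the wound clean and dry"
candidate = "keep the wound dry"
print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.2f}")
```

Word-overlap metrics like this penalise correct answers phrased differently from the reference, which helps explain why BLEU and ROUGE-L are low here while the embedding-based BERTScore is higher.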

Conclusion:

We construct and explore the tasks of multimodal, multilingual VQA. We hope the work here can inspire further research in wound care metadata classification, VQA response generation, and open response automatic evaluation.
Journal of Biomedical Informatics, Volume 170, Article 104888
Citations: 0
MPCM-RRG: Multi-modal Prompt Collaboration Mechanism for Radiology Report Generation
IF 4.5 Zone 2, Medicine Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-09-17 DOI: 10.1016/j.jbi.2025.104912
Yumian Yu, Guoheng Huang, Zhe Tan, Jiahui Shi, Ming Li, Chi-Man Pun, Fuchen Zheng, Shiqiang Ma, Shuqiang Wang, Long He
The task of medical report generation involves automatically creating descriptive text reports from medical images, with the aim of alleviating the workload of physicians and enhancing diagnostic efficiency. However, although many existing medical report generation models based on the Transformer framework consider structural information in medical images, they ignore the interference of confounding factors on these structures, which limits the model’s ability to effectively capture rich and critical lesion information. Furthermore, these models often struggle to address the significant imbalance between normal and abnormal content in actual reports, leading to challenges in accurately describing abnormalities. To address these limitations, we propose the Multi-modal Prompt Collaboration Mechanism for Radiology Report Generation Model (MPCM-RRG). This model consists of three key components: the Visual Causal Prompting Module (VCP), the Textual Prompt-Guided Feature Enhancement Module (TPGF), and the Visual–Textual Semantic Consistency Module (VTSC). The VCP module uses chest X-ray masks as visual prompts and incorporates causal inference principles to help the model minimize the influence of irrelevant regions. Through causal intervention, the model can learn the causal relationships between the pathological regions in the image and the corresponding findings described in the report. The TPGF module tackles the imbalance between abnormal and normal text by integrating detailed textual prompts, which also guide the model to focus on lesion areas using a multi-head attention mechanism. The VTSC module promotes alignment between the visual and textual representations through contrastive consistency loss, fostering greater interaction and collaboration between the visual and textual prompts. 
Experimental results demonstrate that MPCM-RRG outperforms other methods on the IU X-ray and MIMIC-CXR datasets, highlighting its effectiveness in generating high-quality medical reports.
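
The abstract does not give the form of the contrastive consistency loss used in the VTSC module. As a rough illustration of how a contrastive objective aligns visual and textual representations, the sketch below implements a symmetric InfoNCE-style loss over paired embeddings. The loss form, the temperature value, and the toy two-dimensional embeddings are all assumptions, not the authors' formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of picking positives[i] for anchors[i] among all positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

def symmetric_contrastive_loss(visual, textual, temperature=0.1):
    """Average of the image-to-text and text-to-image InfoNCE terms."""
    return 0.5 * (info_nce(visual, textual, temperature)
                  + info_nce(textual, visual, temperature))

# Toy embeddings: row i of each list belongs to the same image-report pair.
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
misaligned_text = [[0.0, 1.0], [1.0, 0.0]]

print(f"aligned:    {symmetric_contrastive_loss(visual, aligned_text):.4f}")
print(f"misaligned: {symmetric_contrastive_loss(visual, misaligned_text):.4f}")
```

The loss is small when each image embedding is closest to its own report embedding and large when the pairs are swapped, so minimising it pulls matched visual and textual features together.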
{"title":"MPCM-RRG: Multi-modal Prompt Collaboration Mechanism for Radiology Report Generation","authors":"Yumian Yu ,&nbsp;Guoheng Huang ,&nbsp;Zhe Tan ,&nbsp;Jiahui Shi ,&nbsp;Ming Li ,&nbsp;Chi-Man Pun ,&nbsp;Fuchen Zheng ,&nbsp;Shiqiang Ma ,&nbsp;Shuqiang Wang ,&nbsp;Long He","doi":"10.1016/j.jbi.2025.104912","DOIUrl":"10.1016/j.jbi.2025.104912","url":null,"abstract":"<div><div>The task of medical report generation involves automatically creating descriptive text reports from medical images, with the aim of alleviating the workload of physicians and enhancing diagnostic efficiency. However, although many existing medical report generation models based on the Transformer framework consider structural information in medical images, they ignore the interference of confounding factors on these structures, which limits the model’s ability to effectively capture rich and critical lesion information. Furthermore, these models often struggle to address the significant imbalance between normal and abnormal content in actual reports, leading to challenges in accurately describing abnormalities. To address these limitations, we propose the Multi-modal Prompt Collaboration Mechanism for Radiology Report Generation Model (MPCM-RRG). This model consists of three key components: the Visual Causal Prompting Module (VCP), the Textual Prompt-Guided Feature Enhancement Module (TPGF), and the Visual–Textual Semantic Consistency Module (VTSC). The VCP module uses chest X-ray masks as visual prompts and incorporates causal inference principles to help the model minimize the influence of irrelevant regions. Through causal intervention, the model can learn the causal relationships between the pathological regions in the image and the corresponding findings described in the report. The TPGF module tackles the imbalance between abnormal and normal text by integrating detailed textual prompts, which also guide the model to focus on lesion areas using a multi-head attention mechanism. 
The VTSC module promotes alignment between the visual and textual representations through contrastive consistency loss, fostering greater interaction and collaboration between the visual and textual prompts. Experimental results demonstrate that MPCM-RRG outperforms other methods on the IU X-ray and MIMIC-CXR datasets, highlighting its effectiveness in generating high-quality medical reports.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104912"},"PeriodicalIF":4.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BiFDR: Brain-Inspired Federated Diffusion Transformer with Reinforcement for privacy-preserving molecular generation
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-09-13 DOI: 10.1016/j.jbi.2025.104910
Hongming Hou, Jing Zhang, Meirun Zhang, Xiucai Ye

Objective:

Generative drug discovery is hampered by challenges in data privacy and the immense computational cost of SOTA models. To surmount these barriers, we developed Brain-Inspired Federated Diffusion with Reinforcement (BiFDR), a privacy-preserving and resource-efficient framework.

Methods:

BiFDR integrates three synergistic modules. A Neuro-inspired Federated Coordinator (NeuroFed) orchestrates secure collaboration via synaptic plasticity-inspired principles, combining server-side pruning with client-side Low-Rank Adaptation (LoRA) and sparse asynchronous updates. A Transformer-based diffusion generator (TransFuse) efficiently creates chemically valid molecules in a compressed latent space using attention mechanisms. Finally, a reinforcement learning agent (T-JORM) steers the generative process towards novel 2D and 3D molecular structures, guided by a multi-faceted, Tanimoto-based reward function.
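The abstract does not define the reward function, so the following is only a minimal sketch of what a Tanimoto-based novelty term could look like, assuming molecules are represented as sets of fingerprint on-bit indices. The names `tanimoto` and `novelty_reward` and the "one minus maximum similarity" form are illustrative assumptions, not the T-JORM reward.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def novelty_reward(candidate: set[int], references: list[set[int]]) -> float:
    """Hypothetical reward: 1 minus the highest similarity to any reference."""
    if not references:
        return 1.0  # nothing to compare against, so treat as fully novel
    return 1.0 - max(tanimoto(candidate, ref) for ref in references)


# Toy fingerprints (made-up bit indices, not real molecules).
known = [{2, 3, 4}, {1, 2, 3, 4}]
print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(novelty_reward({1, 2, 3}, known))
```

A candidate that closely resembles any reference molecule scores near 0 under this sketch, while a structurally distinct one scores near 1.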

Results:

Benchmarked against baseline models, BiFDR improved the Quantitative Estimate of Drug-likeness by 13.7%, the Molecular-level Structural Information Score by 5.7%, and the Molecular Interaction Analysis Index by 52.3%. The framework also enhanced synthetic feasibility, reflected by a 9.5% reduction in the Synthetic Accessibility Score. Critically, BiFDR substantially strengthened data privacy, achieving a 43.6% reduction in the mutual information metric.

Conclusion:

BiFDR establishes an effective and efficient paradigm for generative drug discovery. It consistently produces molecules with superior drug-likeness, structural novelty, and interaction potential. By ensuring synthetic accessibility while rigorously preserving privacy and minimizing computational overhead, BiFDR presents a viable and scalable solution for modern, collaborative drug development pipelines.
Citations: 0
MedVidDeID: Protecting privacy in clinical encounter video recordings
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-08-29 DOI: 10.1016/j.jbi.2025.104901
Sriharsha Mopidevi, Kuk Jin Jang, Basam Alasaly, Sydney Pugh, Jean Park, Ashley Batugo, Sy Hwang, Eric Eaton, Danielle Lee Mowery, Kevin B. Johnson

Objective:

The increasing use of audio-video (AV) data in healthcare has improved patient care, clinical training, and medical and ethnographic research. However, it has also introduced major challenges in preserving patient-provider privacy due to Protected Health Information (PHI) in such data. Traditional de-identification methods are inadequate for AV data, which can reveal identifiable information such as faces, voices, and environmental details. Our goal was to create a pipeline for de-identifying AV healthcare data that minimized the human effort required to guarantee successful de-identification.

Methods:

We combined open-source tools with novel methods and infrastructure into a six-stage pipeline: (1) transcript extraction using WhisperX, (2) transcript de-identification with an adapted PHIlter, (3) audio de-identification through scrubbing, (4) video de-identification using YOLOv11 for pose detection and blurring, (5) recombining de-identified audio and video, and (6) validation and correction via manual quality control (QC). We developed two de-identification strategies to support different tolerances for lossy video images. We evaluated this pipeline using 10 h of simulated clinical AV recordings, comprising nearly 1.1 million video frames and approximately 72,000 words.

Results:

In Precision Privacy Preservation (PPP) mode, MedVidDeID achieved a success rate of 50%, while in Greedy Privacy Preservation (GPP) mode, it achieved a 97.5% success rate. Compared to manual methods for a 15 min video segment, the pipeline reduced de-identification time by 26.7% in PPP mode and 64.2% in GPP mode.

Conclusion:

The MedVidDeID pipeline offers a viable, efficient hybrid solution for handling AV healthcare data and privacy preservation. Future work will focus on reducing upstream errors at each stage and minimizing the role of the human in the loop.
Citations: 0
Strategies for detecting and mitigating dataset shift in machine learning for health predictions: A systematic review
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-08-26 DOI: 10.1016/j.jbi.2025.104902
Gabriel Ferreira dos Santos Silva, Fabiano Novaes Barcellos Filho, Roberta Moreira Wichmann, Francisco Costa da Silva Junior, Alexandre Dias Porto Chiavegatto Filho

Objective

This review aims to provide a comprehensive overview of the literature on methods and techniques for identifying and correcting dataset shift in machine learning (ML) applications for health predictions.

Methods

A systematic search was conducted across PubMed, IEEE Xplore, Scopus, and Web of Science, targeting articles published between January 1, 2019, and March 15, 2025. Search strings combined terms related to machine learning, healthcare, and dataset shift. A total of 32 studies were included and evaluated based on the dataset shift types addressed, the detection and correction strategies used, algorithmic choices, and reported impacts on model performance.

Results

The review identified a wide range of dataset shift types, with temporal shift and concept drift being the most commonly addressed. Model-based monitoring and statistical tests were the most frequent detection strategies, while retraining and feature engineering were the predominant correction approaches. Most methods demonstrated moderate interpretability, computational feasibility, and generalizability. However, a lack of standardized performance metrics and external validations limited the comparability of results across studies.
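The review does not name a specific statistical test, so as one possible illustration of test-based shift detection, the sketch below computes the two-sample Kolmogorov-Smirnov statistic between a reference sample of a feature and a more recent one. The function name `ks_statistic` and the toy values are assumptions chosen for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)  # share of reference values <= x
        f_cur = bisect_right(cur, x) / len(cur)  # share of current values <= x
        gap = max(gap, abs(f_ref - f_cur))
    return gap


# Toy feature values (made up): training period vs. a later period.
print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A statistic near 0 suggests the two periods share a similar distribution for that feature, and values approaching 1 indicate a strong shift. Any alert threshold would have to be chosen per application.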

Conclusion

While several promising approaches for managing dataset shift in health-related ML models have been proposed, no single method emerged as broadly generalizable across use cases. The implementation of these techniques in real-world clinical workflows remains limited. Future research should prioritize prospective evaluations, subgroup-specific analyses (e.g., by race, age, or geographic region), and integration into clinical decision-support systems to ensure robust and equitable ML deployment in healthcare settings. A structured summary table and conceptual pipeline diagram are provided to support practical adoption.
Citations: 0
Improving large language models for adverse drug reactions named entity recognition via error correction prompt engineering
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-01 Epub Date: 2025-08-28 DOI: 10.1016/j.jbi.2025.104893
Yunfei Zhang, Wei Liao
The monitoring and analysis of adverse drug reactions (ADRs) are important for ensuring patient safety and improving treatment outcomes. Accurate identification of drug names, drug components, and ADR entities during named entity recognition (NER) is essential for ensuring drug safety and advancing the integration of drug information. Because existing medical named entity recognition technologies rely on large amounts of manually annotated training data, they are often less effective on adverse drug reactions, where data variability is significant and drug names are highly similar. This paper proposes a prompt template for ADR that integrates error correction examples. The prompt template includes: (1) basic prompts with task descriptions, (2) annotated entity explanations, (3) annotation guidelines, (4) annotated samples for few-shot learning, and (5) error correction examples. Additionally, the work integrates complex ADR data from the web and constructs a corpus containing three types of entities (drug name, drug components, and adverse drug reactions) using the Begin, Inside, Outside (BIO) annotation method. Finally, we evaluate the effectiveness of each prompt and compare it with the fine-tuned Large Language Model Meta AI (LLaMA) model and the DeepSeek model. Experimental results show that under this prompt template, the F1 score of GPT-3.5 increased from 0.648 to 0.887, and that of GPT-4 increased from 0.757 to 0.921, significantly better than the fine-tuned LLaMA and DeepSeek models. These results demonstrate the superiority of the proposed method and provide a solid foundation for extracting drug-related entity relationships and building knowledge graphs.
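To make the BIO annotation scheme concrete, here is a minimal sketch of how a BIO-tagged token sequence can be decoded into entity spans. The label names `DRUG` and `ADR`, the function name `bio_to_spans`, and the example sentence are assumptions for illustration; the abstract does not give the corpus's actual label set.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # Start a new entity; a stray I- tag is treated like B- here.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)  # continue the open entity
        else:
            # "O" (outside) closes any open entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["Aspirin", "may", "cause", "stomach", "bleeding"]
tags = ["B-DRUG", "O", "O", "B-ADR", "I-ADR"]
print(bio_to_spans(tokens, tags))
```

Decoding in this way lets entity-level precision, recall, and F1 be computed on whole spans rather than on individual tokens.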
Citations: 0