International Journal of Medical Informatics: Latest Articles

Interpretable machine learning model for predicting in-hospital mortality in elderly acute pancreatitis: Development and validation in a multicenter cohort
IF 4.1; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-04-01; Epub Date: 2026-01-18; DOI: 10.1016/j.ijmedinf.2026.106299
Hao He, Li Luo, Lei Bai, Lei Luo, Kunming Tian, Xiaoyun Fu, Bao Fu

Background

Elderly acute pancreatitis (AP) patients face significantly higher in-hospital all-cause mortality, highlighting the need for effective risk stratification to support timely clinical decision-making.

Methods

We conducted a multicenter retrospective study of 2,728 elderly AP patients and used this cohort to develop and validate a machine learning (ML) model for predicting in-hospital all-cause mortality. We first selected predictors of mortality using LASSO regression and the random forest–based Boruta algorithm. Seven ML models incorporating the selected predictors were then trained and evaluated using the area under the receiver operating characteristic curve (AUC).
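
The AUC used to compare the models can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that definition in plain Python; the labels and scores are invented toy values rather than study data, and the function name `auc` is only illustrative.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = died in hospital, 0 = survived.
toy_labels = [1, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.3, 0.5, 0.1, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently on larger cohorts.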

Results

XGBoost demonstrated the highest predictive performance, achieving an AUC of 0.884 (95% CI: 0.823–0.945) in external validation and outperforming the conventional Ranson score in predicting in-hospital mortality. Shapley additive explanations ranked vasoactive drug use, hospital length of stay, leukocyte count, noninvasive ventilation, and invasive mechanical ventilation as five key predictors. An interactive web-based tool built on the optimal XGBoost model is available at https://appredction.shinyapps.io/acutepancreatitis_xgb/ to generate real-time risk predictions.
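
Shapley additive explanations attribute a prediction to individual predictors by averaging each predictor's marginal contribution over all possible orderings. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature risk function. The function `toy_risk`, the feature names, and every number in it are invented for illustration and are not taken from the study's XGBoost model; practical SHAP tooling uses faster algorithms than this enumeration.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    contrib = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            contrib[f] += after - before
    return {f: total / len(orderings) for f, total in contrib.items()}


def toy_risk(present):
    """Hypothetical risk score: additive effects plus one interaction term."""
    risk = 0.05  # baseline risk when no feature is present
    if "vasoactive_drug" in present:
        risk += 0.20
    if "invasive_ventilation" in present:
        risk += 0.10
    if "high_leukocytes" in present:
        risk += 0.04
    if {"vasoactive_drug", "invasive_ventilation"} <= present:
        risk += 0.06  # interaction between the two features
    return risk


features = ["vasoactive_drug", "invasive_ventilation", "high_leukocytes"]
phi = shapley_values(toy_risk, features)
```

In this toy case the interaction term is shared equally between the two interacting features, and the three values sum to the difference between the full-feature risk and the baseline, which is the additivity property that gives the method its name.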

Conclusions

This study proposed a validated and interpretable ML model to support in-hospital risk stratification for elderly patients with AP, thereby facilitating clinical decision-making and optimizing intensive care unit resource allocation.
Volume 209, Article 106299. Citations: 0
Mapping clinical terms to standard terminology for multi-institutional research platform: Mapping principles and system deployment
Pub Date: 2026-04-01; Epub Date: 2026-01-20; DOI: 10.1016/j.ijmedinf.2026.106294
Hannah Kang, Youngsun Park, Yukyeong Son, Ho-Young Lee, Soo-Yong Shin

Objective

To develop and implement a multi-institutional research platform by standardizing and integrating clinical terms with international terminologies.

Materials and Methods

This study introduces the Health data Research Suite (HRS) platform, designed to standardize and connect electronic medical record (EMR) data across institutions for efficient multi-institutional research. A hybrid mapping process, combining automated and manual methods, ensures semantic equivalency, consistency, and compliance with international standards such as SNOMED CT, LOINC, and RxNorm. Key strategies included domain-specific semantic restrictions, prioritized attributes for post-coordination, and tailored mapping approaches. Cross-validation and expert consultations resolved mapping discrepancies and inactive concept issues, ensuring reliable data alignment.
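
One way to picture a hybrid mapping process is an automated pass that accepts only unambiguous matches and routes everything else to manual review. The sketch below assumes exactly that split. The index entries, the `STD-…` codes, and the function names are placeholders rather than real SNOMED CT, LOINC, or RxNorm content, and the normalization rule is an assumption, not the HRS platform's actual logic.

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(term.lower().split())


def hybrid_map(local_terms, standard_index):
    """Split local terms into automatic mappings and a manual-review queue.

    standard_index maps a normalized description to a list of candidate codes.
    A term is mapped automatically only when exactly one candidate exists.
    """
    auto_mapped, needs_review = {}, []
    for term in local_terms:
        candidates = standard_index.get(normalize(term), [])
        if len(candidates) == 1:
            auto_mapped[term] = candidates[0]
        else:
            needs_review.append((term, candidates))
    return auto_mapped, needs_review


# Placeholder index: the codes are invented, not real terminology identifiers.
index = {
    "serum sodium": ["STD-001"],
    "chest pain": ["STD-010", "STD-011"],  # ambiguous: two candidate concepts
}
mapped, review = hybrid_map(["Serum  Sodium", "Chest pain", "Na (local)"], index)
```

Here only the first term is mapped automatically; the ambiguous term and the unknown local abbreviation both land in the review queue, which is where expert consultation would take over.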

Results

The study enhanced mapping accuracy in SNOMED CT, LOINC, and RxNorm by utilizing semantic tags and attribute prioritization, with expert consultations addressing any discrepancies. The HRS platform, designed with advanced code search capabilities and user-friendly interfaces, improved cohort generation and facilitated multi-institutional research.

Discussion

Challenges in maintaining inter-institutional consistency and addressing SNOMED CT’s hierarchical limitations were mitigated with a detailed mapping manual, systematic validation, and expert consensus-building. However, a national shortage of trained terminology specialists in Korea underscores the need for educational programs to enhance workforce expertise. Future enhancements include advanced search options and attribute-based retrieval to further improve usability and research support.

Conclusion

This study presents a mapping strategy to align institutional clinical terms with international standards, addressing challenges in semantic consistency and system implementation. The approach enhances multi-institutional research efficiency and fosters innovation in integrated healthcare research, with potential to advance global health outcomes.
Volume 209, Article 106294. Citations: 0
Empowering caregivers of individuals with autism spectrum disorder through sensor-based monitoring of emotional dysregulation: A scoping review
Pub Date: 2026-04-01; Epub Date: 2026-01-09; DOI: 10.1016/j.ijmedinf.2026.106262
Moid Sandhu, Siddique Latif, Andrew Bayor, Wei Lu, Mahnoosh Kholghi, Deepa Prabhu, David Silvera-Tawil
Objective: This paper critically reviews existing work in sensor-based emotional dysregulation monitoring to support caregivers of individuals diagnosed with autism spectrum disorder (ASD).
Methods: A systematic literature search was conducted across six databases (Google Scholar, IEEE Xplore, Scopus, ACM Digital Library, Web of Science, and PubMed) covering publications from January 1, 2016, to September 30, 2025.
Results: Thirty-two studies met inclusion criteria, comprising 27 focused on sensor-based emotional dysregulation detection and 5 addressing intervention or support mechanisms. These studies suggest that sensor-based technologies have potential for continuous physiological monitoring, facilitating early detection of, and intervention in, emotional dysregulation episodes. Critical deficiencies were identified in real-time alerting capabilities, autonomous intervention deployment, self-regulation framework integration, system reliability, long-term sustainability, user interface design, and cross-environment scalability.
Conclusion: There is a significant need to develop real-time emotion monitoring systems to empower caregivers in delivering timely, targeted interventions for individuals diagnosed with ASD. Future research should prioritise the development of real-time alert systems, autonomous intervention protocols, and solutions optimised for reliability, sustainability, usability, and adaptability across heterogeneous care settings.
Volume 209, Article 106262. Citations: 0
When artificial intelligence guides and misguides clinicians: A critical appraisal of AI recommendation correctness and diagnostic decision-making
Pub Date: 2026-04-01; Epub Date: 2026-01-14; DOI: 10.1016/j.ijmedinf.2026.106293
Hasan Nawaz Tahir, Anfal Khan, Muhammad Yousaf, Shahnila Javed, Muhammad Kamran Khan, Yousaf Ali
Volume 209, Article 106293. Citations: 0
Large Language Models’ Performances regarding logical observation identifiers names and codes mapping in laboratory medicine: A comparative analysis of ChatGPT-4.0, Gemini, and Perplexity
Pub Date: 2026-04-01; Epub Date: 2026-01-06; DOI: 10.1016/j.ijmedinf.2026.106270
Shinae Yu, Eun-Jung Cho, Sollip Kim, Kuenyoul Park, Min-Sun Kim, YeJin Oh, Hyejin Ryu

Objectives

This study aimed to assess the feasibility and practical utility of using large language models (LLMs) for Logical Observation Identifiers Names and Codes (LOINC) mapping to standardise healthcare data in the field of laboratory medicine. We evaluated the accuracy and applicability of three LLMs—ChatGPT-4.0 (OpenAI), Gemini 1.5 (Google DeepMind), and Perplexity AI (Perplexity.ai)—in mapping laboratory test items, which typically require considerable institutional-level standardisation efforts.

Methods

A total of 75 representative laboratory test items, including 55 clinical chemistry and 20 hematology tests commonly used in clinical practice, were selected. Six board-certified clinical pathologists independently mapped each test item to its appropriate LOINC code. A consensus mapping was established by the experts and used as the gold standard. Each LLM’s output was compared to this consensus, and the results were categorised as complete match (CM), partial match (PM), or mismatch (MM) based on agreement with the reference.
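
The three-way categorisation can be sketched as a comparison over the six axes of a LOINC name (component, property, time, system, scale, method). The abstract does not give the exact criteria for a partial match, so the rule below (every axis agrees except the method) is an assumption made for illustration, as are the function names and the example item and its axis values.

```python
from collections import Counter

AXES = ("component", "property", "time", "system", "scale", "method")


def categorise(llm, reference):
    """Return 'CM', 'PM', or 'MM' for one LLM mapping against the consensus.

    Assumed rule (not from the study): PM means every axis except
    'method' agrees with the reference.
    """
    if all(llm[a] == reference[a] for a in AXES):
        return "CM"
    if all(llm[a] == reference[a] for a in AXES if a != "method"):
        return "PM"
    return "MM"


def match_rates(pairs):
    """Percentage of CM, PM, and MM over a list of (llm, reference) pairs."""
    counts = Counter(categorise(llm, ref) for llm, ref in pairs)
    return {k: 100.0 * counts[k] / len(pairs) for k in ("CM", "PM", "MM")}


# Hypothetical test item described by its six axes (illustrative values).
ref = dict(component="Glucose", property="MCnc", time="Pt",
           system="Ser/Plas", scale="Qn", method="")
same = dict(ref)
method_only = dict(ref, method="Test strip")
wrong_system = dict(ref, system="Urine")

rates = match_rates([(same, ref), (method_only, ref),
                     (wrong_system, ref), (same, ref)])
```

With these four toy pairs the rates come out as 50% CM, 25% PM, and 25% MM; applying the same function per discipline would give separate rates for clinical chemistry and hematology.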

Results

Overall paired ordinal analyses demonstrated a significant difference in LOINC code-mapping performance among the three models, with Gemini performing significantly worse than both ChatGPT-4.0 and Perplexity AI, and no significant difference between ChatGPT-4.0 and Perplexity AI. ChatGPT-4.0 achieved the highest CM rate in clinical chemistry (58.2%), whereas Perplexity AI performed best in hematology (55.0%). Gemini showed the highest MM rates, particularly in hematology (80.0%), while partial matches were largely attributable to method-related discrepancies rather than fully incorrect mappings.

Conclusion

Structured inputs, localisation to domestic laboratory practices, and expert oversight are critical to improving the reliability of LLM-generated LOINC mappings. While LLMs can reduce workload by generating candidate mappings, human validation remains essential to ensure clinical accuracy. Future improvements should focus on algorithmic refinement, error feedback integration, and adaptation to diverse laboratory settings to enhance accuracy and generalisability in real-world use.
Volume 209, Article 106270. Citations: 0
A scoping review: how evaluation methods shape our understanding of ChatGPT’s effectiveness in healthcare
Pub Date: 2026-04-01; Epub Date: 2025-12-31; DOI: 10.1016/j.ijmedinf.2025.106248
Yuanyuan Liu, Yu Zhang, Haoran Mao

Background

The rapid growth in research on ChatGPT’s healthcare applications has led to diverse evaluation methods and substantially heterogeneous findings, undermining evidence reliability and hindering clinical translation.

Objectives

This review aims to examine how different evaluation methods shape our understanding of ChatGPT’s effectiveness in healthcare.

Methods

Following the PRISMA guidelines, this review included peer-reviewed studies published between 2023 and 2024 that assess the use of ChatGPT in medical or healthcare-related contexts, covering its applications across clinical, educational, and diagnostic domains. In total, 131 studies were analyzed.

Results

The results indicate that predominant evaluation approaches—controlled trial studies, expert assessment studies, measurement-based evaluation studies, and prompt generation analysis studies—systematically influence conclusions about ChatGPT’s performance due to their inherent methodological characteristics, such as subjectivity, objectivity, and differences in ecological validity. Further analysis reveals that ChatGPT’s performance is highly context-dependent, shaped by specific application scenarios, model versions, and prompting strategies.

Conclusions

To address methodological heterogeneity and the lack of standardization, this study recommends multi-method cross-validation strategies and a risk-stratified, standardized evaluation framework. These steps are essential to enhance the scientific rigor and reliability of ChatGPT’s assessment in healthcare and to provide a solid foundation for its clinical integration.
Volume 209, Article 106248. Citations: 0
Advantages and challenges of tracking ST-segment elevation myocardial infarction patients with a real-time dashboard: A single-centre experience
Pub Date: 2026-04-01; Epub Date: 2026-01-02; DOI: 10.1016/j.ijmedinf.2026.106261
Egidio de Mattia, Filippo Paoletti, Daniela Pedicino, Giovanna Liuzzo, Carmen Angioletti, Alessia d’Aiello, Alessio Perilli, Andrea Adduci, Giovanni Arcuri, Emilio Meneschincheri, Barbara Ruffo, Melissa D’Agostino, Rita De Donno, Antonio Giulio de Belvis

Background

Timely primary percutaneous coronary intervention (pPCI) is the most important treatment for improving outcomes in ST-segment elevation myocardial infarction (STEMI), and treatment delays are strongly associated with morbidity and mortality. The present study aims to define the main steps for setting up a real-time digital monitoring dashboard to improve the clinical performance of STEMI management, and to evaluate the impact of its implementation on the proportion of patients receiving pPCI within 90 min.

Methods

Setting up the digital monitoring system required detailed algorithms for the diagnosis, treatment, and rehabilitation/follow-up phases. For patients with a diagnosis of STEMI included in the clinical pathway (CP), a multidisciplinary working group identified (i) rules for flagging patients along the CP, based on specific risk scores, and (ii) the critical points of the CP to be monitored, such as door-to-balloon time, intensive care unit length of stay, and total hospital length of stay. An interrupted time series analysis and multivariable logistic regression models were used to assess changes in the outcome (pPCI within 90 min) after the platform was implemented, adjusting for temporal and individual confounders.
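
The monitored outcome rests on a simple calculation: the time from hospital arrival to balloon inflation, and the share of patients for whom that time is within 90 minutes. The sketch below shows this calculation under the assumption that both events are recorded as ISO 8601 timestamps and that "within 90 min" means less than or equal to 90. The function names and the timestamps are hypothetical and are not study data.

```python
from datetime import datetime


def door_to_balloon_minutes(arrival, balloon):
    """Minutes between hospital arrival and balloon inflation (ISO 8601 strings)."""
    delta = datetime.fromisoformat(balloon) - datetime.fromisoformat(arrival)
    return delta.total_seconds() / 60.0


def timely_proportion(cases, threshold_min=90):
    """Share of cases whose door-to-balloon time is within the threshold."""
    times = [door_to_balloon_minutes(a, b) for a, b in cases]
    return sum(t <= threshold_min for t in times) / len(times)


# Hypothetical (arrival, balloon inflation) timestamps.
cases = [
    ("2024-03-01T10:00", "2024-03-01T11:15"),  # 75 min
    ("2024-03-01T22:40", "2024-03-02T00:20"),  # 100 min, crosses midnight
    ("2024-03-02T08:05", "2024-03-02T09:35"),  # 90 min, exactly on the threshold
    ("2024-03-03T14:00", "2024-03-03T16:00"),  # 120 min
]
share = timely_proportion(cases)
```

Working with full timestamps rather than clock times keeps cases that cross midnight correct, which matters for a dashboard that runs continuously.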

Results

After the introduction of the dashboard, the proportion of timely pPCI improved from 40% pre-implementation to 65% post-implementation. Adjusted models indicated a twofold increase in the odds of meeting the 90-minute benchmark (OR = 2.00; 95% CI: 0.99–4.12).
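
The reported figures can be related to each other with a little arithmetic. The sketch below computes the crude (unadjusted) odds ratio implied by the two proportions and checks that the reported confidence limits are roughly symmetric around the adjusted OR on the log scale. It assumes a Wald-type interval, which the abstract does not state, so the derived standard error is only indicative.

```python
import math


def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)


# Crude (unadjusted) odds ratio implied by the reported proportions.
crude_or = odds(0.65) / odds(0.40)

# A Wald-type CI is symmetric on the log scale, so the geometric mean of
# its limits should sit close to the reported adjusted OR of 2.00.
lower, upper = 0.99, 4.12
center = math.sqrt(lower * upper)

# Implied standard error of log(OR) under the Wald assumption.
se_log_or = (math.log(upper) - math.log(lower)) / (2 * 1.96)
```

The crude ratio (about 2.79) is larger than the adjusted OR of 2.00, which is the expected direction when part of the raw improvement is absorbed by the temporal and individual covariates. The geometric mean of the limits is about 2.02, consistent with the reported point estimate, and the lower limit of 0.99 lies just below 1.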

Conclusion

The real-time monitoring system showed a positive impact on the timely management of STEMI, highlighting the potential for improving healthcare efficiency and patient outcomes.
Citations: 0
Enhancing diabetes monitoring systems’ reports: A novel integrated diabetes report (IDR)
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-04-01 Epub Date: 2026-01-17 DOI: 10.1016/j.ijmedinf.2026.106288
Tahmineh Aldaghi , Robert Bem , Jan Muzik

Aim

Individuals with diabetes require continuous self-management. Diabetes monitoring systems generate structured reports that help individuals and healthcare providers interpret data and optimize treatment strategies. This study aimed to design and validate an Integrated Diabetes Report (IDR) that improves the clarity, usability, and clinical relevance of diabetes data visualizations.

Method

A review of 13 diabetes monitoring systems revealed five main report categories: overlay, logbook, device-specific, daily, and overview reports. While the overview report was the most frequently used, it lacked comprehensive visualization and essential clinical metrics. To address these gaps, a multidisciplinary panel of four experts collaborated to design a more integrated reporting framework.

Results

Across systems, glucose statistics were included in all reports, followed by insulin data (in 12 systems), carbohydrate intake (in 6 systems), hypo-hyperglycemic indices (in 2 systems), sleep indices (in 2 systems), and medication details (in 1 system). Key gaps included minimal data on physical activity, limited documentation of carbohydrates, and the absence of consolidated insulin visualization. The IDR introduces a complications section, an integrated graph combining AGP with basal and bolus insulin, and an advanced insulin profile comparing seven calculated indices.

Conclusion

The IDR improves clinical interpretation, supports treatment decisions, and enhances risk assessment for diabetes management.
Citations: 0
A review of evaluation system for Internet hospitals
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-04-01 Epub Date: 2025-12-22 DOI: 10.1016/j.ijmedinf.2025.106234
Zheqing Li , Liyang Tang , Yin Li , Yuanyuan Dang , Lin Yao

Context

Internet hospitals have emerged as a digital innovation in healthcare, optimizing resource allocation and enhancing patient experience. They also support hierarchical diagnosis and treatment and contribute to the Healthy China initiative.

Objectives

To establish a comprehensive evaluation system to promote the sustainable development of Internet hospitals.

Methods

A systematic review of literature related to the evaluation of Internet-based healthcare services was conducted. Using Web of Science and CNKI as data sources, studies published between 2015 and 2024 were screened based on predefined criteria, focusing on high-quality journals and research reports. The selected literature was coded and analyzed across four dimensions: patient services, doctor services, management services, and information security.

Results

The final analysis included 34 papers, with 25 mentioning patient services indicators, 20 mentioning doctor services indicators, 18 mentioning medical services process management indicators, and 9 mentioning information security. This study identifies key evaluation indicators and examines their interrelationships, highlighting potential systemic risks from localized optimizations.

Conclusion

This review analyzed Internet hospital evaluation across patient services, doctor services, services management, and information security. While it highlights potential efficiency gains, it notes the lack of comprehensive indicators, limiting assessment and improvement. For sustainable development, a more comprehensive evaluation system should integrate multi-stakeholder perspectives (patients, doctors, institutions), address systemic risks from localized optimization, and incorporate coordinated policy considerations.
Citations: 0
Automated extraction of fluoropyrimidine treatment and treatment-related toxicities from clinical notes using natural language processing
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-04-01 Epub Date: 2026-01-10 DOI: 10.1016/j.ijmedinf.2026.106276
Xizhi Wu , Madeline S. Kreider , Philip E. Empey , Chenyu Li , Yanshan Wang

Objective

Fluoropyrimidines are widely prescribed for colorectal and breast cancers, but are associated with toxicities such as hand-foot syndrome and cardiotoxicity. Since toxicity documentation is often embedded in clinical notes, we aimed to develop and evaluate natural language processing (NLP) methods to extract treatment and toxicity information.

Materials and methods

We constructed a gold-standard dataset of 236 clinical notes from 204,165 adult oncology patients. Domain experts annotated categories related to treatment regimens and toxicities. We developed rule-based, machine learning-based (Random Forest [RF], Support Vector Machine [SVM], Logistic Regression [LR]), deep learning-based (BERT, ClinicalBERT), and large language model (LLM)-based NLP approaches (zero-shot prompting and error analysis prompting). Five-fold cross-validation was conducted to validate each model.
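For readers unfamiliar with the procedure, the following is a minimal sketch of how 236 notes could be partitioned for 5-fold cross-validation. It is not the authors' code: the helper name `k_fold_indices`, the shuffling step, and the seed are all assumptions made for illustration, and the abstract does not say whether the folds were stratified.

```python
import random


def k_fold_indices(n_items: int, k: int = 5, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # shuffling and seed are assumptions
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 236 is the number of annotated notes reported in the abstract.
folds = list(k_fold_indices(236, k=5))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)
```

Each note lands in exactly one test fold, so every model is scored on all 236 notes while never being evaluated on a note it was trained on within the same fold.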

Results

Error analysis prompting achieved the highest precision, recall, and F1 scores for treatment extraction (F1 = 1.000) and toxicity extraction (F1 = 0.965), whereas zero-shot prompting performed moderately (treatment F1 = 0.889; toxicity extraction F1 = 0.854). The rule-based approach reached F1 = 1.000 for treatment and F1 = 0.904 for toxicity extraction. LR and SVM ranked second and fourth for toxicity extraction (LR F1 = 0.914; SVM F1 = 0.903). Deep learning and RF underperformed: BERT reached F1 = 0.792 for treatment and F1 = 0.837 for toxicity extraction, ClinicalBERT reached F1 = 0.797 for treatment and F1 = 0.884 for toxicity extraction, and RF reached F1 = 0.745 for treatment and F1 = 0.853 for toxicity extraction.
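The F1 scores reported above are the harmonic mean of precision and recall. The sketch below shows the standard calculation from true-positive, false-positive, and false-negative counts; the function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from raw extraction counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only (not from the paper).
p, r, f1 = precision_recall_f1(tp=45, fp=5, fn=15)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

An F1 of 1.000, as reported for treatment extraction, requires both precision and recall to equal 1, that is, no false positives and no false negatives on the evaluated notes.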

Discussion

LLM-based error analysis outperformed all other approaches, followed by machine learning methods. Machine learning and deep learning methods were limited by the small amount of training data and showed limited generalizability, particularly for rare categories.

Conclusion

LLM-based error analysis most effectively extracted fluoropyrimidine treatment and toxicity information from clinical notes, and has strong potential to support oncology research and pharmacovigilance.
Citations: 0