
International Journal of Medical Informatics: Latest Articles

Enhancing diabetes monitoring systems’ reports: A novel integrated diabetes report (IDR)
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-17 | DOI: 10.1016/j.ijmedinf.2026.106288
Tahmineh Aldaghi, Robert Bem, Jan Muzik

Aim

Individuals with diabetes require continuous self-management. Diabetes monitoring systems generate structured reports that help individuals and healthcare providers interpret data and optimize treatment strategies. This study aimed to design and validate an Integrated Diabetes Report (IDR) that improves the clarity, usability, and clinical relevance of diabetes data visualizations.

Method

A review of 13 diabetes monitoring systems revealed five main report categories: overlay, logbook, device-specific, daily, and overview reports. While the overview report was the most frequently used, it lacked comprehensive visualization and essential clinical metrics. To address these gaps, a multidisciplinary panel of four experts collaborated to design a more integrated reporting framework.

Results

Across systems, glucose statistics were included in all reports, followed by insulin data (in 12 systems), carbohydrate intake (in 6 systems), hypo- and hyperglycemic indices (in 2 systems), sleep indices (in 2 systems), and medication details (in 1 system). Key gaps included minimal data on physical activity, limited documentation of carbohydrates, and the absence of consolidated insulin visualization. The IDR introduces a complications section, an integrated graph combining the ambulatory glucose profile (AGP) with basal and bolus insulin, and an advanced insulin profile comparing seven calculated indices.
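The system counts above can be expressed as coverage shares of the 13 reviewed systems. The following is a minimal sketch, not part of the study: the dictionary labels are illustrative, and the glucose count of 13 is an assumption read from "all reports".

```python
# Counts come from the Results paragraph; the labels are illustrative only.
TOTAL_SYSTEMS = 13

systems_reporting = {
    "glucose statistics": 13,  # assumption: "all reports" means all 13 systems
    "insulin data": 12,
    "carbohydrate intake": 6,
    "hypo/hyperglycemic indices": 2,
    "sleep indices": 2,
    "medication details": 1,
}


def coverage_percent(count, total=TOTAL_SYSTEMS):
    """Share of reviewed systems that include a data category, in percent."""
    return round(100 * count / total, 1)


coverage = {name: coverage_percent(n) for name, n in systems_reporting.items()}

for name, pct in coverage.items():
    print(f"{name}: {pct}%")
```

Read this way, fewer than half of the reviewed systems report carbohydrate intake, which matches the gap the abstract describes.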

Conclusion

The IDR improves clinical interpretation, supports treatment decisions, and enhances risk assessment for diabetes management.
Volume 209, Article 106288
Citations: 0
Beyond binary diagnosis: Key questions on AI accuracy, real-world applicability, and safety in clinical decision support
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-17 | DOI: 10.1016/j.ijmedinf.2026.106292
Jin Ye
This comment relates to Kücking et al.’s (2026) study on the bidirectional effects of artificial intelligence recommendations and healthcare provider-related factors on the accuracy of wound impregnation diagnosis. While acknowledging the valuable contributions of this research, including its distinction between correct and incorrect artificial intelligence outputs, its rigorous simulation design, and its emphasis on clinical safety, we raise key questions to enhance the interpretation of the results and their real-world translation. The main focuses include the moderating role of artificial intelligence system accuracy in automation bias, external effectiveness in real clinical environments, potential mechanisms for gender differences in diagnostic performance, the impact of visual cue design on decision-making, and the potential of explainable artificial intelligence (XAI) in risk mitigation. This comment aims to promote further research and facilitate the safe and effective integration of artificial intelligence-based clinical decision support systems (CDSS) into clinical practice.
Volume 209, Article 106292
Citations: 0
Less time Coding, more time Caring: Performance evaluation of ChatGPT-5 for ICD-10 coding of radiology reports
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-17 | DOI: 10.1016/j.ijmedinf.2026.106296
Tristan Ruhwedel, Julian M.M. Rogasch, Paul Martin Dahlke, Seyd Shnayien, Christian Furth, Christoph Wetz, Holger Amthauer, Imke Schatka, Nick Lasse Beetz

Introduction

Radiologists worldwide face a high administrative workload. ICD-10 coding is mandatory for reimbursement in many health systems and a frequent source of billing errors. Large language models have shown promise in supporting coding-related tasks, but previous studies with earlier ChatGPT versions reported mixed results, and evidence specific to radiology reports remains scarce. We therefore aimed to investigate whether ChatGPT-5 can be consulted when assigning ICD-10 codes to radiology reports and whether this leads to a measurable time advantage.

Methods

A total of 2,738 fictitious radiology reports across multiple modalities were derived from the PARROT database. Additionally, 100 fictitious PET/CT reports were created. Each report was assigned a single, most relevant ICD-10 code using ChatGPT-5. For PARROT, ChatGPT-derived codes were compared with predefined database reference labels. For PET/CT, ChatGPT-derived codes were compared with codes assigned by an independent manual coder. Exact and character-level concordance were assessed. In cases of discordance, a blinded adjudicator selected the most accurate ICD-10 code. Coding efficiency was evaluated for PET/CT reports by measuring coding time per report.

Results

For PARROT, exact-code concordance was 1,590/2,738 (58.1%). In a random subset of 200 mismatches, blinded adjudication preferred the ChatGPT-derived code in 123 cases and the reference label in 77 cases (p = 0.0015). Coding non-English reports resulted in significantly lower concordance (first character: p = 0.002; second/third characters: p < 0.001; last characters: p = 0.012) and longer coding times than English reports (p = 0.002). For PET/CT reports, median coding time was 8 s with ChatGPT and 135 s without; the median time saved was 127 s per report.
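The headline figures above can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not name the statistical test behind the adjudication p-value, so the exact two-sided binomial test against a 50/50 split is an assumption, and the function names are hypothetical.

```python
from math import comb


def concordance_percent(matches, total):
    """Exact-code concordance as a percentage, rounded to one decimal place."""
    return round(100 * matches / total, 1)


def two_sided_binomial_p(successes, trials):
    """Exact two-sided binomial p-value under H0: both outcomes equally likely.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the upper-tail probability of the larger count.
    """
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2**trials
    return min(1.0, 2 * upper_tail)


parrot_concordance = concordance_percent(1590, 2738)
adjudication_p = two_sided_binomial_p(123, 200)  # assumed test, see lead-in
seconds_saved = 135 - 8  # difference of the two reported medians

print(parrot_concordance, adjudication_p, seconds_saved)
```

Note that a difference of medians equals the median of per-report differences only in special cases; here the two reported figures happen to agree.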

Conclusion

Applied to daily clinical care, higher code correctness might reduce billing errors, while saved time could be reallocated to patient care. Radiologists should collaborate with developers to create versions of LLMs that operate within data-secure environments.
Volume 210, Article 106296
Citations: 0
Biometric Data in Post-Traumatic Stress Disorder Detection: A Scoping Review of Digital Health Applications.
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-15 | DOI: 10.1016/j.ijmedinf.2026.106289
Phue Thet Khaing, Masaharu Nakayama

Context: Post-traumatic stress disorder (PTSD) is mainly assessed through self-reports and clinician interviews, which can delay recognition and limit reach. Biometric markers captured using digital technologies may enable earlier and more objective detection.

Purpose: To map biometric modalities used for PTSD detection in digital health, identify underused markers, characterise machine learning (ML)/artificial intelligence (AI) approaches, and assess sex-related analyses.

Methods: Following PRISMA-ScR, a protocol was pre-registered on the Open Science Framework, and searches of PubMed, IEEE Xplore, and Google Scholar (2015-2025) were conducted. The full search string was: ("post-traumatic stress disorder" OR "PTSD") AND ("biometric data" OR "biosensor" OR "wearable technology") AND ("detection" OR "screening" OR "diagnosis" OR "monitoring") AND ("digital health" OR "mobile health" OR "AI-based" OR "machine learning"). Peer-reviewed human studies using biometric data with digital tools and/or ML/AI for PTSD detection were eligible. Of 3,312 records, 89 underwent full-text review, and 18 studies met the inclusion criteria.
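The search string has a regular shape: synonyms joined by OR inside each concept group, and the groups joined by AND. A minimal sketch of assembling it programmatically is shown here; the terms are copied from the abstract, while the `build_query` helper and the variable names are hypothetical.

```python
# Concept groups copied from the reported search string.
term_groups = [
    ["post-traumatic stress disorder", "PTSD"],
    ["biometric data", "biosensor", "wearable technology"],
    ["detection", "screening", "diagnosis", "monitoring"],
    ["digital health", "mobile health", "AI-based", "machine learning"],
]


def build_query(groups):
    """Join quoted synonyms with OR inside a group, and groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(term_groups)
print(query)
```

Keeping the groups as data makes it easier to report exactly which synonyms were searched and to adapt the string to each database's syntax.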

Analysis: Data were categorised by biometric modality, digital platform (wearable devices, mobile applications, ML/AI systems), study population, and performance metrics (area under the curve, sensitivity, specificity). Findings were grouped thematically (physiological, neuroimaging, behavioural, genetic, multimodal) and synthesised narratively to identify trends, gaps, and the application of sex-stratified modelling.
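For readers less familiar with the performance metrics named above, the sketch below shows how sensitivity, specificity, and area under the curve are commonly computed. All counts and scores are made-up illustrations, not data from the included studies, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of actual PTSD cases that the detector flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-cases that the detector correctly clears."""
    return true_neg / (true_neg + false_pos)


def area_under_curve(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical example values, for illustration only.
sens = sensitivity(true_pos=8, false_neg=2)
spec = specificity(true_neg=9, false_pos=1)
auc = area_under_curve([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

print(sens, spec, round(auc, 3))
```

Reporting all three together matters because a model can reach a high AUC while still trading sensitivity against specificity at the chosen threshold.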

Results: Most studies focused on physiological (e.g., heart rate, sleep) and neuroimaging (functional magnetic resonance imaging, electroencephalography) signals; behavioural and genetic modalities were underexplored. Data were frequently captured via wearables and mobile platforms, with ML commonly applied. Performance reporting was uneven, sex-stratified analyses were rare, and several promising modalities (e.g., eye-tracking, electrodermal activity) remain underused.

Conclusion: Digital biometric approaches can detect PTSD; however, progress has been slowed by heterogeneous study designs, inconsistent reporting, and limited attention to sex differences. Establishing common reporting standards, evaluating multimodal models in real-world settings, and developing algorithms incorporating sex for more equitable screening are warranted.

Volume 211, Article 106289
Citations: 0
When artificial intelligence guides and misguides clinicians: A critical appraisal of AI recommendation correctness and diagnostic decision-making
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-14 | DOI: 10.1016/j.ijmedinf.2026.106293
Hasan Nawaz Tahir, Anfal Khan, Muhammad Yousaf, Shahnila Javed, Muhammad Kamran Khan, Yousaf Ali
Volume 209, Article 106293
Citations: 0
“Calibration or contamination?” Reassessing the evaluation of large language models for clinical mortality prediction
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-14 | DOI: 10.1016/j.ijmedinf.2026.106291
Zhihao Lei
Volume 209, Article 106291
Citations: 0
Communicable diseases platform (CDP): Real-Time clinical analytics for infections
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.ijmedinf.2026.106277
Manuri De Silva, Alice Voskoboynik, Sailavan Ramesh, Janice Campbell, Saravanan Satkumaran, Daryl R. Cheng

Objective

Communicable diseases, especially seasonal respiratory illnesses, contribute significantly to paediatric hospital presentations and admissions. Existing surveillance systems often require retrospective manual data collation and focus on either demographic or clinical data, not both. The Communicable Diseases Platform (CDP) is a dynamic data platform that aggregates both data types for all communicable disease presentations to The Royal Children’s Hospital Melbourne (RCH).

Methods

In the pilot phase, the CDP extracted de-identified aggregated data from hospital electronic medical records for patients with positive respiratory swabs. A dashboard displayed trends in positivity rate and cumulative hospital admissions from 2016 to 2025, further filterable by pathogen, age, presentation type and interventions.
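The two dashboard measures named above can be illustrated with a small calculation. This is a sketch under stated assumptions, not the CDP's implementation: the weekly figures are invented, the field names are hypothetical, and it assumes the number of swabs tested is available as the denominator for the positivity rate.

```python
from itertools import accumulate

# Invented weekly aggregates; field names are hypothetical.
weekly = [
    {"week": "W1", "swabs_tested": 200, "swabs_positive": 30, "admissions": 5},
    {"week": "W2", "swabs_tested": 250, "swabs_positive": 50, "admissions": 8},
    {"week": "W3", "swabs_tested": 100, "swabs_positive": 25, "admissions": 4},
]


def positivity_rate(positive, tested):
    """Share of tested swabs that were positive; 0.0 when nothing was tested."""
    return positive / tested if tested else 0.0


rates = [positivity_rate(w["swabs_positive"], w["swabs_tested"]) for w in weekly]

# Running total of admissions, one value per week.
cumulative_admissions = list(accumulate(w["admissions"] for w in weekly))

for week, rate, total in zip(weekly, rates, cumulative_admissions):
    print(f'{week["week"]}: positivity {rate:.0%}, cumulative admissions {total}')
```

Filtering by pathogen, age, presentation type, or intervention would amount to selecting a subset of records before running the same two calculations.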

Discussion

The CDP improves understanding of clinical profiles, disease burden and seasonal patterns, supporting better outbreak control, patient flow prediction and clinical surveillance. Future developments include immunisation data integration and machine learning algorithm evaluation for real-time vaccine effectiveness estimations and communicable disease predictive modelling.
Volume 209, Article 106277
Citations: 0
Clinicians’ perspectives on electronic medical records use in diabetes outpatient Care: A qualitative study
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-11 | DOI: 10.1016/j.ijmedinf.2026.106275
Wenyong Wang, Mahnaz Samadbeik, Gaurav Puri, Donald S.A. McLeod, Elton Lobo, Tuan Duong, Titus Kirwa, Clair Sullivan

Background

Electronic Medical Records (EMRs) aim to improve efficiency, safety, and quality of care. However, the impact of EMR implementation, particularly in outpatient diabetes care, remains underexplored. This study explored clinicians’ perspectives on EMR use in diabetes outpatient care.

Methods

This qualitative study, conducted in line with COREQ guidelines, involved four focus groups with 22 clinicians (doctors, nurses, and allied health) at a metropolitan diabetes service in Queensland, Australia. Data were analysed using deductive content analysis, guided by the Quintuple Aim and Technology Acceptance Model/Unified Theory of Acceptance and Use of Technology frameworks.

Results

Clinicians reported mixed outcomes across the Quintuple Aim domains, shaped by technology adoption constructs. Facilitators such as improved efficiency, access to patient information, and prescribing safety reflected perceived usefulness and positive attitudes, contributing to favourable outcomes across multiple Quintuple Aim domains. Barriers such as navigation complexity, technical issues, alert fatigue, and overwhelming training led to negative outcomes in EMR use. Tensions around documentation practices and patient expectations of system use resulted in mixed outcomes. Overall, clinicians viewed EMRs as essential, but sustained adoption required improved usability, tailored training, and better system integration.

Conclusion

This study concludes that while the EMRs improved safety, efficiency, and access to information, their design and implementation also introduced burdens that negatively affected clinician experience. EMRs significantly shape the healthcare workforce, influencing workflow, wellbeing, and professional engagement. In outpatient diabetes care, specific workflow challenges such as glycaemic data integration highlight that existing EMR designs may not fully support the complexity of chronic disease management. To maximise benefits, EMR initiatives should be approached as quality improvement activities, with role-specific training, reliable infrastructure, and clinician involvement in system optimisation. Future research should address usability challenges, enhance integration, and ensure that both clinician and patient perspectives guide digital health transformation.
背景:电子病历(EMRs)旨在提高医疗效率、安全性和质量。然而,EMR实施的影响,特别是在门诊糖尿病护理方面,仍未得到充分探讨。本研究探讨临床医生在糖尿病门诊医疗中使用电子病历的观点。方法:本定性研究按照COREQ指南进行,涉及澳大利亚昆士兰州一家大都市糖尿病服务中心的22名临床医生(医生、护士和专职健康人员)的四个焦点小组。在“五重目标”和“技术接受模型”/“技术接受与使用统一理论”框架的指导下,采用演绎内容分析对数据进行分析。结果:临床医生报告了五项目标领域的混合结果,这些结果受到技术采用结构的影响。提高效率、获取患者信息和处方安全等促进因素反映了感知到的有用性和积极态度,有助于在多个“五大目标”中取得有利结果。导航复杂性、技术问题、警报疲劳和压倒性的培训等障碍导致EMR使用的负面结果。文档实践和患者对系统使用的期望之间的紧张关系导致了不同的结果。总体而言,临床医生认为电子病历是必要的,但持续采用需要改进可用性、量身定制的培训和更好的系统集成。结论:本研究得出结论,虽然电子病历提高了安全性、效率和信息获取,但其设计和实施也带来了负担,对临床医生的体验产生了负面影响。电子病历极大地塑造了医疗保健人力,影响了工作流程、健康和专业参与度。在门诊糖尿病护理中,特定的工作流程挑战,如血糖数据整合,突出表明现有的电子病历设计可能无法完全支持慢性疾病管理的复杂性。为了最大限度地提高效益,电子病历计划应该作为质量改进活动,具有特定角色的培训、可靠的基础设施和临床医生参与系统优化。未来的研究应解决可用性挑战,加强整合,并确保临床医生和患者的观点都能指导数字健康转型。
Citations: 0
Predictive value of machine learning for mortality risk in aortic dissection: a systematic review and meta-analysis
IF 4.1 Zone 2 Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-11 DOI: 10.1016/j.ijmedinf.2026.106271
Zhihong Han, Baixin Li, Jie Liu

Background

Aortic dissection (AD) is a critical cardiovascular disorder with substantial risks of short-term mortality. Some researchers have endeavored to utilize machine learning (ML) approaches to develop predictive models for the risk of mortality in AD. However, systematic evidence about the accuracy of these models remains scarce, which poses challenges to the development and enhancement of risk assessment tools. Therefore, this study seeks to systematically review the reliability of ML in forecasting the risk of mortality in AD.

Methods

A search was implemented through PubMed, Cochrane, Embase, and Web of Science up to September 11, 2025. The prediction model risk of bias (RoB) assessment tool (PROBAST) was leveraged to estimate the RoB of the included studies. Subgroup analyses were implemented based upon types of AD and time of death.

Results

In total, 35 studies were included, covering 19,838 patients with AD. Within the training datasets, ML models demonstrated a sensitivity (SEN) of 0.75 (95% CI: 0.72–0.78) and a specificity (SPE) of 0.77 (95% CI: 0.74–0.80) for predicting mortality in AD. Within the validation set, which mainly focused on TAAD, the SEN was 0.79 (95% CI: 0.74–0.84) and the SPE was 0.78 (95% CI: 0.68–0.85). For in-hospital mortality, the SEN was 0.78 (95% CI: 0.72–0.83) and the SPE was 0.77 (95% CI: 0.65–0.86); for out-of-hospital mortality, the SEN and SPE were 0.81–0.84 and 0.74–0.86, respectively.
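For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity follow from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the review.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: share of patients who died that the model flagged.
    sensitivity = tp / (tp + fn)
    # Specificity: share of survivors that the model correctly cleared.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the review).
sen, spe = sensitivity_specificity(tp=75, fn=25, tn=77, fp=23)
print(f"SEN = {sen:.2f}, SPE = {spe:.2f}")
```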

Conclusion

ML models demonstrate remarkable accuracy in forecasting the risk of mortality in AD and, to some extent, show superior performance relative to existing scoring systems. Future research should incorporate more multi-center, multi-ethnic, and geographically varied cases to develop a more broadly applicable risk prediction tool and offer insights for tailored prevention strategies.
Citations: 0
Automated extraction of fluoropyrimidine treatment and treatment-related toxicities from clinical notes using natural language processing
IF 4.1 Zone 2 Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-10 DOI: 10.1016/j.ijmedinf.2026.106276
Xizhi Wu, Madeline S. Kreider, Philip E. Empey, Chenyu Li, Yanshan Wang

Objective

Fluoropyrimidines are widely prescribed for colorectal and breast cancers, but are associated with toxicities such as hand-foot syndrome and cardiotoxicity. Since toxicity documentation is often embedded in clinical notes, we aimed to develop and evaluate natural language processing (NLP) methods to extract treatment and toxicity information.

Materials and methods

We constructed a gold-standard dataset of 236 clinical notes from 204,165 adult oncology patients. Domain experts annotated categories related to treatment regimens and toxicities. We developed rule-based, machine learning-based (Random Forest [RF], Support Vector Machine [SVM], Logistic Regression [LR]), deep learning-based (BERT, ClinicalBERT), and large language model (LLM)-based NLP approaches (zero-shot and error analysis prompting). Five-fold cross-validation was conducted to validate each model.
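As an illustration of the validation scheme named above, the sketch below splits 236 note indices into five folds using only the standard library. The abstract does not say how the folds were built (for example, whether they were stratified or grouped by patient), so the shuffled round-robin split, the `k_fold_indices` name, and the seed are assumptions made purely for illustration.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: a plain shuffled split; the study's actual procedure
    is not described in the abstract.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 236 matches the number of annotated notes reported in the abstract.
splits = list(k_fold_indices(236, k=5))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} training notes, {len(test)} held-out notes")
```

Each note is held out exactly once, so every model is scored on data it did not see during training for that fold.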

Results

Error analysis prompting achieved the best precision, recall, and F1 scores for treatment (F1 = 1.000) and toxicity extraction (F1 = 0.965), whereas zero-shot prompting performed moderately (treatment F1 = 0.889, toxicity extraction F1 = 0.854). The rule-based approach reached F1 = 1.000 for treatment and F1 = 0.904 for toxicity extraction. LR and SVM ranked second and fourth for toxicity extraction (LR F1 = 0.914, SVM F1 = 0.903). Deep learning and RF underperformed: BERT reached F1 = 0.792 for treatment and F1 = 0.837 for toxicity extraction, ClinicalBERT reached F1 = 0.797 for treatment and F1 = 0.884 for toxicity extraction, and RF reached F1 = 0.745 for treatment and F1 = 0.853 for toxicity extraction.
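The F1 scores above combine precision and recall into a single number. The sketch below shows the standard calculation from extraction counts; the function name and the counts are hypothetical and are not drawn from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only (not from the study).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(f"precision = {p:.3f}, recall = {r:.3f}, F1 = {f1:.3f}")
```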

Discussion

LLM-based error analysis outperformed all other approaches, followed by machine learning methods. Machine learning and deep learning methods were limited by small training data and showed limited generalizability, particularly for rare categories.

Conclusion

LLM-based error analysis most effectively extracted fluoropyrimidine treatment and toxicity information from clinical notes, and has strong potential to support oncology research and pharmacovigilance.
Citations: 0