Applied Clinical Informatics: Latest Articles

Measuring The Accuracy and Reproducibility of DeepSeek R1, Claude 3.5 Sonnet, and GPT‑4.1 on Complex Clinical Scenarios.
IF 2.2 Zone 2 (Medicine) Q4 MEDICAL INFORMATICS Pub Date: 2026-02-09 DOI: 10.1055/a-2807-4256
Robert E Hoyt, Maria Bajwa

Background: The integration of large language models (LLMs) into clinical diagnostics presents significant challenges regarding their accuracy and reliability.

Objective: This study aimed to evaluate the performance of DeepSeek R1, an open-source reasoning model, alongside two other LLMs, GPT-4.1 and Claude 3.5 Sonnet, across multiple-choice clinical cases.

Methods: A dataset of complex medical cases representative of real-world clinical practice was selected. For efficiency, models were accessed via application programming interfaces (APIs) and assessed using standardized prompts and a predefined evaluation protocol.

Results: The models demonstrated an overall accuracy of 77.1%, with GPT-4.1 producing the fewest errors and Claude 3.5 Sonnet the most. The reproducibility analysis indicated that the tests were highly repeatable: DeepSeek R1 (100%), GPT-4.1 (97.5%), and Claude 3.5 Sonnet (92%).
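
As a rough illustration of how the two reported metrics could be computed, the sketch below scores repeated runs of one model against an answer key. The abstract does not define its reproducibility measure; here it is assumed to be the share of cases on which every repeated run returns the same answer, with accuracy scored on the first run. The function names, the answer key, and the sample data are all hypothetical.

```python
from typing import Dict, List

def accuracy(first_run: Dict[str, str], answer_key: Dict[str, str]) -> float:
    """Share of cases where the first-run answer matches the key."""
    correct = sum(1 for case, ans in first_run.items() if answer_key[case] == ans)
    return correct / len(first_run)

def reproducibility(runs: List[Dict[str, str]]) -> float:
    """Share of cases where all repeated runs agree (assumed definition)."""
    cases = runs[0].keys()
    stable = sum(1 for c in cases if len({run[c] for run in runs}) == 1)
    return stable / len(cases)

# Hypothetical data: four multiple-choice cases, three repeated runs.
answer_key = {"case1": "A", "case2": "C", "case3": "B", "case4": "D"}
runs = [
    {"case1": "A", "case2": "C", "case3": "D", "case4": "D"},
    {"case1": "A", "case2": "C", "case3": "B", "case4": "D"},
    {"case1": "A", "case2": "C", "case3": "D", "case4": "D"},
]

print(f"accuracy: {accuracy(runs[0], answer_key):.1%}")
print(f"reproducibility: {reproducibility(runs):.1%}")
```

Under this definition a model can be fully reproducible and still wrong, which is why the two figures are worth reporting separately.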

Conclusions: While LLMs show promise for enhancing diagnostics, ongoing scrutiny is required to address error rates and validate standard medical answers. Given the limited dataset and prompting protocol, findings should not be interpreted as broader equivalence in real‑world clinical reasoning. This study demonstrates the need for robust evaluation standards, attention to error rates, and further research.

Citations: 0
Characterizing Push Notification Volume and Delivery Patterns in Hospital Medicine.
IF 2.2 Zone 2 (Medicine) Q4 MEDICAL INFORMATICS Pub Date: 2026-02-04 DOI: 10.1055/a-2802-2912
Averi Wilson, Andrew Patrick Bain, Suhani Goyal, Abey Thomas, Robert W Turer, Craig Glazer, DuWayne L Willett, Wendy Yin, Samuel McDonald

Background: Push notifications are a common method of clinical communication in inpatient settings, yet their volume and delivery patterns have not been described. Alert fatigue has been well described in healthcare, and push notifications may be a new contributor.

Objective: To characterize the volume, type, and temporal distribution of push notifications received by hospitalists across distinct clinical roles in a large academic health system.

Methods: We conducted a cross-sectional analysis of electronic health record (EHR) audit log data from June 1, 2024, to June 1, 2025, at a large academic health system using Epic (Verona, WI) EHR. All push notifications received by attending hospitalists were extracted, categorized (secure message, results, other), and summarized by hour, hospitalist role, and device type.

Results: Ninety-seven hospitalists received 1,114,657 push notifications over one year, with a median of 11 (3-24) notifications per hour. Rounding hospitalists received 9 (7-12) notifications per patient per working day. Secure message notifications accounted for the majority and result-related notifications comprised only 2.2% of notifications. Notifications peaked midday and were received throughout the day, including outside of scheduled shift times.
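
A per-hour summary of this kind could be derived from audit-log timestamps roughly as follows. This is a minimal sketch, not the authors' pipeline: the row layout and clinician identifiers are hypothetical, the bracketed range in the abstract is assumed to be an interquartile range (the abstract does not say), and only hours with at least one notification are counted.

```python
from collections import Counter
from datetime import datetime
from statistics import median, quantiles

# Hypothetical audit-log rows: (clinician_id, ISO timestamp of a push notification).
log_rows = [
    ("dr_a", "2024-06-01T08:05:00"),
    ("dr_a", "2024-06-01T08:40:00"),
    ("dr_a", "2024-06-01T09:10:00"),
    ("dr_b", "2024-06-01T08:15:00"),
    ("dr_b", "2024-06-01T08:20:00"),
    ("dr_b", "2024-06-01T08:55:00"),
]

def hourly_counts(rows):
    """Count notifications per (clinician, calendar hour) bucket."""
    buckets = Counter()
    for clinician, stamp in rows:
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        buckets[(clinician, hour)] += 1
    return buckets

counts = sorted(hourly_counts(log_rows).values())
q1, _, q3 = quantiles(counts, n=4, method="inclusive")
print(f"median per clinician-hour: {median(counts)} (IQR {q1}-{q3})")
```

Whether empty hours are included changes the median considerably, so any comparison across sites would need that choice stated explicitly.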

Conclusions: Hospitalists are exposed to a high volume of push notifications, which may contribute to alert fatigue and ultimately impact patient safety and clinician wellbeing. System-level efforts to prioritize clinically meaningful notifications, refine notification settings, and enhance secure-messaging infrastructure are needed to protect clinician attention and support patient safety.

Citations: 0
Standardizing Data Elements for Implementation of ICU Liberation Bundle.
IF 2.2 Zone 2 (Medicine) Q4 MEDICAL INFORMATICS Pub Date: 2026-02-03 DOI: 10.1055/a-2802-7458
Md Fantacher Islam, Molly Douglas, Jarrod Mosier, Vignesh Subbian

Background and significance: Getting patients out of intensive care units (ICUs) is a major goal for acute care clinicians, as prolonged stays increase the risk of complications and strain critical resources such as staff, equipment, and beds. The ICU Liberation bundle or the ABCDEF (A-F) care bundle is an evidence-based framework for improving outcomes in critically ill patients by addressing pain, sedation, delirium, mobility, and family engagement. However, variability in documentation and lack of standardized data elements hinder effective implementation and evaluation of adherence to bundle components.

Objectives: This study aims to characterize data elements of the A-F liberation bundle using a large, single-center critical care database and to develop standardized bundle cards that map bundle components to controlled vocabularies.

Methods: We conducted a retrospective analysis of data elements related to the A-F bundle using the MIMIC-IV database. Clinical concepts were mapped to standardized vocabularies and aligned with the OMOP common data model. Bundle cards were developed for each component to provide structured, accessible documentation of assessment tools, adherence criteria, and terminology mappings.

Results: Pain assessments were documented in over 11,000 patients, with a median of 23 assessments per day. Sedation levels for nearly 59,000 patients were evaluated, with 37.7% meeting Society of Critical Care Medicine (SCCM) adherence criteria. Delirium assessments followed standardized protocols incorporating RASS and CAM-ICU scores. Components E and F lacked formal compliance specifications; bundle cards for these components identified key activities and highlighted gaps in standardized vocabularies. Adherence analyses revealed variability likely due to non-standardized documentation practices.

Conclusion: We developed and validated six ICU Liberation Bundle cards that map bundle components to standardized vocabularies and common data models, enabling retrospective adherence evaluation in real-world data. These information resources promote consistent documentation, support interoperability, and provide a foundation for prospective monitoring to enhance bundle implementation in critical care.

Citations: 0
Optimizing HIV Care Engagement: Usability of a mHealth App for Identifying and Retaining Individuals with Nonviral Suppression in Digital Cohort.
IF 2.2 Zone 2 (Medicine) Q4 MEDICAL INFORMATICS Pub Date: 2026-01-01 Epub Date: 2026-01-30 DOI: 10.1055/a-2786-0291
Fabiana Cristina Dos Santos, Sophia Mclnerney, Miya C Tate, Aadia Rana, D Scott Batey, Rebecca Schnall

Drive to Zero is a mobile health application (app) designed to identify and retain people with HIV (PWH) who have experienced challenges with achieving or maintaining viral suppression. The app targets PWH who have lacked documented HIV care in the past months and are experiencing medication adherence barriers. Features include an interactive chat for communicating with the study team and access to educational resources to support care engagement and health management.

This usability study aimed to assess the Drive to Zero app's ease of use and interface design through expert heuristic evaluation and end-user testing.

Usability was evaluated through two approaches: heuristic evaluations conducted by five informatics experts following Nielsen's usability principles, and end-user testing with 20 PWH using the validated Post-Study System Usability Questionnaire and qualitative interviews to collect feedback on app functionality and user experience.

Heuristic experts and end-users expressed satisfaction with the app's appearance, reporting that it has a simple and intuitive interface for identifying and retaining PWH, which will assist them with study engagement and ultimately with reengaging in HIV care. However, participants highlighted areas needing improvement, suggesting better accessibility of "home" and "help" buttons to improve user control and a more detailed explanation of the incentive program to enhance user engagement and retention.

Usability evaluations provided valuable insights into the Drive to Zero app's design. Areas for improvement were enhancing user controls and improving the readability of the incentive program. These findings will guide iterative refinements, ensuring that future versions of the app improve its usability and acceptability for its target audience.

Applied Clinical Informatics, vol. 17, no. 1, pp. 39-45. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12858313/pdf/
Citations: 0
Abandoned Inpatient Orders: An Opportunity for Improving CPOE Safety and Efficiency.
IF 2.2 Zone 2 (Medicine) Q4 MEDICAL INFORMATICS Pub Date: 2026-01-01 Epub Date: 2026-01-30 DOI: 10.1055/a-2786-0551
Anne Grauer, Yuyang Yang, Jo Applebaum, Yelstin Fernandes, David Liebovitz, Jason Adelman, Bruce Lambert, William Galanter

Abandoned medication orders, those initiated but not signed, represent a potential safety risk and an indicator of electronic health record (EHR) inefficiency. This study explores inpatient medication abandonment across two large tertiary healthcare systems using different EHRs.

Silent alerts were deployed to identify abandoned orders at Site 1 (June 2018-May 2019) and Site 2 (July 2020-May 2023). At Site 1, alerts triggered on all inpatient medication orders. At Site 2, alerts were part of a broader study implementing indication alerts; only orders for study medications triggered alerts. An abandoned order was defined as an order initiated but not signed within 24 hours of initiation. We calculated abandonment rates and rates of reorders, and performed regression to examine the association between abandonment and clinician, patient, and order characteristics. Exponential models were fit to characterize the chronology of reordering.

Among 6.8 million medication orders, abandonment rates were 11.2% at Site 1 and 25.0% at Site 2. Due to fundamental differences in alert configuration and order capture, no direct statistical comparison of abandonment rates between the two sites was conducted. Over half of abandoned orders were reordered within 24 hours (65.3% at Site 1; 54.2% at Site 2). The chronology of reordering was similar at both institutions. Attendings, the most senior clinicians, had the lowest rates of abandonment. Abandonment rates decreased as clinicians placed more orders, but rose as clinicians ordered on more unique patients. Abandonments were higher when ordering for children compared with adults.

Order abandonment is common and varies by patient's age, clinician type, and workload. Abandonment rates declined as house staff providers advanced in training, signifying that clinical experience plays a role. Frequent reordering suggests that workflow interruptions or modifications, rather than intentional medication cancellation, may lead to a significant proportion of abandonments. Similarity in the timing of reordering between healthcare systems suggests common reordering processes across sites. Our findings demonstrate significant order abandonment rates, with the potential to use abandonment as a metric to improve computerized provider order entry (CPOE) functionality, clinicians' workflows, and patient safety.
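
The abandonment definition above (initiated but not signed within 24 hours) can be expressed as a simple classification rule. The sketch below is only an illustration: the study identified abandoned orders through silent alerts, whereas this example assumes hypothetical pairs of initiation and signing timestamps, and all names and data are invented.

```python
from datetime import datetime, timedelta
from typing import Optional

SIGN_WINDOW = timedelta(hours=24)  # threshold taken from the abstract's definition

def is_abandoned(initiated: datetime, signed: Optional[datetime]) -> bool:
    """True if the order was never signed, or signed more than 24 hours after initiation."""
    return signed is None or signed - initiated > SIGN_WINDOW

def abandonment_rate(orders) -> float:
    """Fraction of (initiated, signed) pairs classified as abandoned."""
    flags = [is_abandoned(i, s) for i, s in orders]
    return sum(flags) / len(flags)

# Hypothetical orders: (initiation time, signing time or None).
t0 = datetime(2019, 1, 1, 9, 0)
orders = [
    (t0, t0 + timedelta(minutes=2)),   # signed promptly
    (t0, None),                        # never signed
    (t0, t0 + timedelta(hours=30)),    # signed after the window closed
    (t0, t0 + timedelta(hours=1)),     # signed within the window
]
print(f"abandonment rate: {abandonment_rate(orders):.1%}")
```

An order signed exactly 24 hours after initiation is treated here as signed; the abstract does not specify how that boundary was handled.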

Applied Clinical Informatics, vol. 17, no. 1, pp. 28-38. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12858319/pdf/
Citations: 0
EHR Workflows Contribute to Disparities by Language Preference in Parent Patient Portal Access.
IF 2.2 Zone 2 (Medicine) Q4 MEDICAL INFORMATICS Pub Date: 2026-01-01 Epub Date: 2026-01-22 DOI: 10.1055/a-2777-1358
Nymisha Chilukuri, Erin Ballard, Xuan Xu, Tom McPherson, Victor Ritter, Hannah K Bassett, Jennifer Carlson, Natalie M Pageler

Identifying patient portal (PP) activation disparities, especially in electronic health record (EHR) activation workflows, can help facilitate equitable health care access.

Our study aimed to assess whether the parent/guardian's preferred language was associated with being offered, activating, and using the PP and the methods used to offer activation codes.

This retrospective cohort study examined PP offer, activation, and usage rates at a large freestanding children's hospital. Patients <12 years old with ambulatory visits from July 1, 2022, to June 30, 2023, without prior active proxy PP accounts were included. The primary independent variable was the self-reported parent/guardian preferred language (English/Spanish). Outcomes included the probability of being offered, overall and by specific offer methods, activation, and usage. Zou's modified multivariate Poisson regression models examined the association between preferred language and offer/activate/use status.

Among 39,578 patients, 85.1% were patients with English as preferred language (PEPL) and 14.9% had Spanish as preferred language (PSPL). PSPL had a lower probability of being offered (adjusted relative risk ratio [aRR]: 0.65, 95% confidence interval [CI]: 0.63-0.67), activating (aRR: 0.72, 95% CI: 0.70-0.75), and using (aRR: 0.68, 95% CI: 0.65-0.72) a PP compared to PEPL. Specifically, PSPL had a lower probability of activating if ever offered via instant activation (aRR: 0.72, 95% CI: 0.69-0.75), parent/guardian with existing account (aRR: 0.73, 95% CI: 0.69-0.76), and had equal probability of activating if ever offered via letter (aRR: 0.42, 95% CI: 0.19-0.94) and clinician-assisted method (aRR: 0.99, 95% CI: 0.86-1.16), compared to PEPL.

PSPL at a large, free-standing pediatric health system had a lower probability of PP offer, activation, and usage than PEPL. Activation methods were not universally effective across language groups, emphasizing the need for equitable workflow optimization. This study highlights an approach to analyzing health disparities in activation workflows to inform targeted interventions to improve equitable PP access.
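
For readers unfamiliar with relative risk estimates, the sketch below computes an unadjusted relative risk with a log-scale Wald confidence interval from a two-by-two table. This is deliberately a simpler technique than the Zou's modified Poisson regression named in the abstract, which additionally adjusts for covariates; the counts are hypothetical and do not come from the study.

```python
import math

def relative_risk(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Unadjusted relative risk with a log-scale Wald confidence interval."""
    rr = (events_exposed / n_exposed) / (events_ref / n_ref)
    se_log = math.sqrt(1 / events_exposed - 1 / n_exposed + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: portal offered to 300 of 1,000 families in one group
# and to 450 of 1,000 in the reference group.
rr, lower, upper = relative_risk(300, 1000, 450, 1000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that lies entirely below 1 indicates a lower probability in the first group than in the reference group; an interval that spans 1 is consistent with no difference.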

Applied Clinical Informatics, vol. 17, no. 1, pp. 19-27. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826850/pdf/
Citations: 0
Measurement Properties of Instruments Assessing Digital Competence in Nursing: A Systematic Review.
IF 2.2 Zone 2 Medicine Q4 MEDICAL INFORMATICS Pub Date: 2026-01-01 Epub Date: 2026-01-22 DOI: 10.1055/a-2780-7093
Fabio D'Agostino, Ilaria Erba, Elske Ammenwerth, Vered Robinzon, Gad Segal, Nissim Harel, Elisabetta Corvo, Refael Barkan, Hadas Lewy, Noemi Giannetta

The digital transformation of healthcare is reshaping care delivery among healthcare professionals, requiring nurses to develop digital competencies. These competencies are essential but often underdeveloped due to limited training and resources. Global initiatives emphasize integrating these competencies into nursing education, necessitating valid instruments to assess them.

This systematic review aims to identify instruments measuring digital competence in nursing and to assess their measurement properties.

This review was registered in PROSPERO (identifier: CRD42024522349) and conducted according to PRISMA guidelines. A systematic search was performed in CINAHL, PubMed/MEDLINE, and Scopus on instruments assessing digital competencies in nursing and reporting measurement properties. Measurement properties and their methodological quality were assessed using the COSMIN criteria, and the overall quality of the evidence was graded using a modified GRADE approach.

A total of 27 instruments were identified, relating to three interconnected constructs: nursing informatics, digital health, and information and communication technology. Based on their measurement properties, the instruments were categorized into three groups (A, B, C) following the COSMIN methodology to support recommendations for use. Six instruments were classified under category A (recommended for use): the DigiHealthCom and DigiComInf instruments, the Turkish version of TANIC, the short version of ITASH, the Digital Competence Questionnaire, and the 30-item Arabic version of SANICS. Twenty instruments were categorized under category B (potentially recommendable, but further validation is needed). One instrument was placed in category C (not recommended for use).

As digital competence becomes an increasing priority in education and public health, valid and reliable instruments are essential for assessing and monitoring these competencies. Such instruments support the identification of training needs, the evaluation of educational outcomes, and the integration of digital skills into nursing curricula and clinical practice, ultimately strengthening the digital readiness of the nursing workforce.

Applied Clinical Informatics 17(1): 1-18. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826851/pdf/
Citations: 0
Leveraging 10 Days of Alert Malfunction to Improve Mature Organizational Clinical Decision Support Processes. (Special Issue on CDS Failures)
IF 2.2 Zone 2 Medicine Q4 MEDICAL INFORMATICS Pub Date: 2026-01-01 Epub Date: 2026-01-21 DOI: 10.1055/a-2793-0977
Daria F Ferro, Marc Tobias, Leah H Carr, Pamela Wentz, Melissa Rodriguez, Casey Pitts, Emily Kane, Eric Shelov

Background: Interruptive clinical decision support (CDS) alerts are intended to standardize patient care and prevent harm. However, failures can occur even in organizations with mature CDS governance and advanced analytics. These breakdowns, marked by excessive firings, workflow disruption, and clinician dissatisfaction, can provide insights into systemic weaknesses in CDS design, testing, and monitoring processes.

Objective: This study aimed to examine a CDS alert malfunction as a lens for identifying system-level gaps and propose strategies to strengthen resilience in CDS operations.

Methods: A retrospective analysis was conducted on an interruptive alert that was developed through a phased, multistakeholder, committee-driven process, but was removed within 10 days due to poor performance, revealing gaps that persisted despite established governance.

Results: The alert fired 1,866 times in 5 days, with a 91% dismissal rate and reports of workflow disruption. Feedback indicated provider frustration and concern for malfunction. Analysis revealed gaps in end-user engagement, testing rigor, committee reviews, and monitoring practices.

Conclusions: CDS failures can serve as catalysts for system improvement. This case highlights actionable lessons, such as operationalizing user-centered design, clarifying testing expectations, and distributing monitoring responsibilities, to enhance CDS reliability. Even well-established governance structures must be continuously evaluated and adapted to keep pace with evolving CDS technologies, and such investments position organizations to maintain responsive, sustainable systems aligned with high-quality care.
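The firing and dismissal figures reported above can be turned into simple monitoring quantities. The sketch below is only an illustration of that arithmetic: the function names and the 80% review threshold are hypothetical and do not come from the study, and the dismissal count is an estimate because the abstract gives the rate only as a rounded percentage.

```python
def alert_summary(firings, days, dismissal_rate):
    """Return average firings per day and the estimated number of dismissals."""
    per_day = firings / days
    # Estimated from a rounded percentage, so treat it as approximate.
    estimated_dismissals = round(firings * dismissal_rate)
    return per_day, estimated_dismissals


def needs_review(dismissal_rate, threshold=0.80):
    """Flag an alert for review when its dismissal rate meets a threshold.

    The 0.80 default is a hypothetical cut-off, not a value from the abstract.
    """
    return dismissal_rate >= threshold


# Figures from the abstract: 1,866 firings in 5 days, 91% dismissed.
per_day, dismissed = alert_summary(1866, 5, 0.91)
print(f"{per_day:.1f} firings/day, about {dismissed} dismissals")
print("flag for review:", needs_review(0.91))
```

A check of this kind is one way monitoring responsibility could be made routine rather than dependent on clinician complaints; the right threshold would depend on the alert and the organization.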

Applied Clinical Informatics, pp. 46-51. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875732/pdf/
Citations: 0
Baseline Evaluation of Claude Opus 4 for Diabetes Management: A Preliminary Assessment and Lessons for Implementation.
IF 2.2 Zone 2 Medicine Q4 MEDICAL INFORMATICS Pub Date: 2025-10-01 Epub Date: 2025-12-08 DOI: 10.1055/a-2765-6930
Pouyan Esmaeilzadeh

Background: Claude Opus 4 is a large language model (LLM) that features improved reasoning capabilities and broader contextual understanding compared to earlier versions. Despite the growing use of LLM systems for seeking medical information, structured and simulation-based evaluations of Claude Opus 4's capabilities in diabetes management remain limited, particularly across domains such as patient education, clinical reasoning, and emotional support.

Objective: This study aimed to conduct a baseline evaluation of Claude Opus 4's performance across key domains of diabetes care (i.e., patient education, clinical reasoning, and emotional support), and to identify preliminary insights that can inform future, evidence-based integration strategies.

Methods: A three-step evaluation was conducted: (1) 30 diabetes management questions assessed using expert endocrinologist evaluation, (2) five fictional diabetes cases evaluated for clinical decision-making, and (3) emotional support responses assessed for appropriateness and empathy. Three expert endocrinologists graded responses according to American Diabetes Association guidelines.

Results: Claude Opus 4 achieved 80% accuracy in general diabetes knowledge, with high response reproducibility (96.7%), indicating baseline rather than clinically adequate performance. Clinical case evaluations showed moderate utility (mean expert rating = 4.4/7), while emotional-support assessments yielded high scores for empathy (6.2/7) and appropriateness (6.0/7). These findings suggest that although the model demonstrates promising informational and emotional-support capabilities, its current performance remains insufficient for autonomous clinical use and should be viewed as preliminary evidence to guide future, patient-inclusive validation studies.

Conclusions: Although Claude Opus 4 demonstrates preliminary findings suggesting potential applications in diabetes care, education, and emotional support, this baseline assessment using fictional cases underscores the need for real-world validation with clinical data to determine true clinical utility and patient-centered impact. This simulation-based evaluation also offers practical lessons learned for researchers designing future LLM assessments, highlighting the need for mixed expert-patient panels, contextual validation, and person-centered metrics beyond numerical accuracy.

Applied Clinical Informatics, pp. 1881-1891. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714427/pdf/
Citations: 0
Improving Provider Documentation Using a Pediatric Automated Documentation Assistance Tool.
IF 2.2 Zone 2 Medicine Q4 MEDICAL INFORMATICS Pub Date: 2025-10-01 Epub Date: 2025-12-18 DOI: 10.1055/a-2765-7021
Kevin D Smith, Riley Boland, Matthew Cerasale, Cheng-Kai Kao

Clinical documentation improvement is critical for pediatric care, yet leveraging electronic health record (EHR) tools for this population is not well established. We aimed to adapt and implement a real-time, automated documentation assistance tool (AutoDx) to decrease clinical documentation integrity (CDI) coding queries and improve perceived ease of practice for pediatric inpatient providers.

In this quality improvement study at an urban academic pediatric hospital, we adapted and implemented AutoDx for pediatric use by developing and validating pediatric-specific logic rules to alert providers to potential diagnoses based on EHR data. The primary outcome was the rate of CDI queries per 1,000 discharges for targeted diagnoses, aiming for a 50% reduction over a 5-month implementation period compared with a 12-month baseline. Secondary outcomes included provider-surveyed ease of practice, with a goal of a 25% improvement, and tool uptake.

The aggregate rate of targeted CDI queries decreased by 58% postimplementation, from 80.7 to 33.9 per 1,000 discharges (p < 0.001). Moreover, analysis by interrupted time series demonstrated an immediate 45.5% reduction in the rate of coding queries (p = 0.028) following the implementation of the tool. The rate of queries for nontargeted diagnoses remained unchanged. Tool adoption increased steadily throughout the study period. While provider-reported time spent on queries did not significantly decrease, a majority of survey respondents (59%) perceived receiving fewer queries, and 46% agreed the tool made it easier to provide quality care.

Implementation of a real-time, automated documentation support tool in a pediatric inpatient setting significantly reduced CDI coding queries for targeted diagnoses. Despite a "task substitution" effect where perceived workload did not decrease, the tool improved perceived ease of practice, demonstrating that targeted EHR interventions can enhance documentation accuracy and efficiency in pediatrics.
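The headline result above is a relative reduction in a rate expressed per 1,000 discharges. The short sketch below shows that arithmetic and checks it against the study's stated 50% goal. The function names are hypothetical, and the query and discharge counts passed to `rate_per_1000` are invented for illustration; only the two rates (80.7 and 33.9) and the 50% target come from the abstract.

```python
def rate_per_1000(queries, discharges):
    """Express a count of CDI queries as a rate per 1,000 discharges."""
    return queries / discharges * 1000


def percent_reduction(baseline_rate, post_rate):
    """Relative reduction from baseline, as a percentage."""
    return (baseline_rate - post_rate) / baseline_rate * 100


# Rates reported in the abstract (per 1,000 discharges).
reduction = percent_reduction(80.7, 33.9)
print(f"relative reduction: {reduction:.1f}%")
print("met 50% goal:", reduction >= 50)

# Hypothetical counts, only to show how such a rate would be derived.
print(f"example rate: {rate_per_1000(40, 500):.1f} per 1,000 discharges")
```

Note that this before-and-after comparison is separate from the interrupted time series estimate in the abstract, which models the immediate level change at implementation and so yields a different figure (45.5%).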

Applied Clinical Informatics 16(5): 1900-1908. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714432/pdf/
Citations: 0