Background: The integration of large language models (LLMs) into clinical diagnostics presents significant challenges regarding their accuracy and reliability.
Objective: This study aimed to evaluate the performance of DeepSeek R1, an open-source reasoning model, alongside two other LLMs, GPT-4.1 and Claude 3.5 Sonnet, across multiple-choice clinical cases.
Methods: A dataset of complex medical cases representative of real-world clinical practice was selected. For efficiency, models were accessed via application programming interfaces (APIs) and assessed using standardized prompts and a predefined evaluation protocol.
Results: The models demonstrated an overall accuracy of 77.1%, with GPT-4.1 producing the fewest errors and Claude 3.5 Sonnet the most. The reproducibility analysis indicated that results were highly repeatable: DeepSeek R1 (100%), GPT-4.1 (97.5%), and Claude 3.5 Sonnet (92%).
Conclusions: While LLMs show promise for enhancing diagnostics, ongoing scrutiny is required to address error rates and validate standard medical answers. Given the limited dataset and prompting protocol, findings should not be interpreted as broader equivalence in real‑world clinical reasoning. This study demonstrates the need for robust evaluation standards, attention to error rates, and further research.
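The abstract does not spell out how accuracy and reproducibility were scored. As a minimal sketch, assuming accuracy is the share of cases answered correctly against a reference key and reproducibility is the share of cases for which repeated runs returned the same answer, the two metrics could be computed as follows. The function names and data layout are hypothetical and are not taken from the study.

```python
def accuracy(answers, key):
    """Fraction of cases whose answer matches the reference key.

    answers and key are dicts mapping a case id to a chosen option (e.g. "A").
    """
    correct = sum(1 for case_id, ans in answers.items() if key[case_id] == ans)
    return correct / len(answers)


def reproducibility(runs):
    """Fraction of cases for which every repeated run gave the same answer.

    runs is a list of dicts (one per run), all covering the same case ids.
    This is an assumed definition; the study may have scored it differently.
    """
    case_ids = list(runs[0].keys())
    stable = sum(1 for c in case_ids if len({run[c] for run in runs}) == 1)
    return stable / len(case_ids)
```

Under this definition a model can be perfectly reproducible and still wrong, which is why the two figures are reported separately.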
Measuring The Accuracy and Reproducibility of DeepSeek R1, Claude 3.5 Sonnet, and GPT‑4.1 on Complex Clinical Scenarios. Robert E Hoyt, Maria Bajwa. Applied Clinical Informatics. Pub Date: 2026-02-09. DOI: 10.1055/a-2807-4256
Averi Wilson, Andrew Patrick Bain, Suhani Goyal, Abey Thomas, Robert W Turer, Craig Glazer, DuWayne L Willett, Wendy Yin, Samuel McDonald
Background: Push notifications are a common method of clinical communication in inpatient settings, yet their volume and delivery patterns have not been described. Alert fatigue has been well described in healthcare, and push notifications may be a new contributor.
Objective: To characterize the volume, type, and temporal distribution of push notifications received by hospitalists across distinct clinical roles in a large academic health system.
Methods: We conducted a cross-sectional analysis of electronic health record (EHR) audit log data from June 1, 2024, to June 1, 2025, at a large academic health system using Epic (Verona, WI) EHR. All push notifications received by attending hospitalists were extracted, categorized (secure message, results, other), and summarized by hour, hospitalist role, and device type.
Results: Ninety-seven hospitalists received 1,114,657 push notifications over one year, with a median of 11 (3-24) notifications per hour. Rounding hospitalists received 9 (7-12) notifications per patient per working day. Secure message notifications accounted for the majority and result-related notifications comprised only 2.2% of notifications. Notifications peaked midday and were received throughout the day, including outside of scheduled shift times.
Conclusions: Hospitalists are exposed to a high volume of push notifications, which may contribute to alert fatigue and ultimately impact patient safety and clinician wellbeing. System-level efforts to prioritize clinically meaningful notifications, refine notification settings, and enhance secure-messaging infrastructure are needed to protect clinician attention and support patient safety.
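The per-hour figure in the Results can be illustrated with a small sketch. It assumes that the audit log yields one (recipient, timestamp) pair per notification, that the bracketed range is an interquartile range, and that hours with no notifications are left out; none of these details is stated in the abstract, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import median, quantiles


def hourly_counts(events):
    """Count notifications per recipient per clock hour.

    events is an iterable of (recipient_id, datetime) pairs.
    Hours in which a recipient received nothing do not appear in the result.
    """
    buckets = Counter(
        (rid, ts.replace(minute=0, second=0, microsecond=0)) for rid, ts in events
    )
    return sorted(buckets.values())


def median_iqr(counts):
    """Return (median, first quartile, third quartile) of the hourly counts."""
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return median(counts), q1, q3
```

Whether empty hours are counted as zeros changes the median considerably, so a real analysis would need to fix that choice against the shift schedule.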
Characterizing Push Notification Volume and Delivery Patterns in Hospital Medicine. Applied Clinical Informatics. Pub Date: 2026-02-04. DOI: 10.1055/a-2802-2912
Background and significance: Getting patients out of intensive care units (ICUs) is a major goal for acute care clinicians, as prolonged stays increase the risk of complications and strain critical resources such as staff, equipment, and beds. The ICU Liberation bundle or the ABCDEF (A-F) care bundle is an evidence-based framework for improving outcomes in critically ill patients by addressing pain, sedation, delirium, mobility, and family engagement. However, variability in documentation and lack of standardized data elements hinder effective implementation and evaluation of adherence to bundle components.
Objectives: This study aims to characterize data elements of the A-F liberation bundle using a large, single-center critical care database and to develop standardized bundle cards that map bundle components to controlled vocabularies.
Methods: We conducted a retrospective analysis of data elements related to the A-F bundle using the MIMIC-IV database. Clinical concepts were mapped to standardized vocabularies and aligned with the OMOP common data model. Bundle cards were developed for each component to provide structured, accessible documentation of assessment tools, adherence criteria, and terminology mappings.
Results: Pain assessments were documented in over 11,000 patients, with a median of 23 assessments per day. Sedation levels for nearly 59,000 patients were evaluated, with 37.7% meeting Society of Critical Care Medicine (SCCM) adherence criteria. Delirium assessments followed standardized protocols incorporating RASS and CAM-ICU scores. Components E and F lacked formal compliance specifications; bundle cards for these components identified key activities and highlighted gaps in standardized vocabularies. Adherence analyses revealed variability likely due to non-standardized documentation practices.
Conclusion: We developed and validated six ICU Liberation Bundle cards that map bundle components to standardized vocabularies and common data models, enabling retrospective adherence evaluation in real-world data. These information resources promote consistent documentation, support interoperability, and provide a foundation for prospective monitoring to enhance bundle implementation in critical care.
Standardizing Data Elements for Implementation of ICU Liberation Bundle. Md Fantacher Islam, Molly Douglas, Jarrod Mosier, Vignesh Subbian. Applied Clinical Informatics. Pub Date: 2026-02-03. DOI: 10.1055/a-2802-7458
Pub Date: 2026-01-01; Epub Date: 2026-01-30; DOI: 10.1055/a-2786-0291
Fabiana Cristina Dos Santos, Sophia Mclnerney, Miya C Tate, Aadia Rana, D Scott Batey, Rebecca Schnall
Drive to Zero is a mobile health application (app) designed to identify and retain people with HIV (PWH) who have experienced challenges with achieving or maintaining viral suppression. The app targets PWH who have lacked documented HIV care in the past months and are experiencing medication adherence barriers. Features include an interactive chat for communicating with the study team and access to educational resources to support care engagement and health management.
This usability study aimed to assess the Drive to Zero app's ease of use and interface design through expert heuristic evaluation and end-user testing.
Usability was evaluated through two approaches: heuristic evaluations conducted by five informatics experts following Nielsen's usability principles, and end-user testing with 20 PWH using the validated Post-Study System Usability Questionnaire and qualitative interviews to collect feedback on app functionality and user experience.
Heuristic experts and end-users expressed satisfaction with the app's appearance, reporting that it has a simple and intuitive interface for identifying and retaining PWH, which will assist them with study engagement and, ultimately, reengagement with HIV care. However, participants highlighted areas needing improvement, suggesting better accessibility of "home" and "help" buttons to improve user control and a more detailed explanation of the incentive program to enhance user engagement and retention.
Usability evaluations provided valuable insights into the Drive to Zero app's design. Areas for improvement were enhancing user controls and improving the readability of the incentive program. These findings will guide iterative refinements, ensuring that future versions of the app improve usability and acceptability for its target audience.
Optimizing HIV Care Engagement: Usability of a mHealth App for Identifying and Retaining Individuals with Nonviral Suppression in Digital Cohort. Applied Clinical Informatics, vol. 17, no. 1, pp. 39-45. Pub Date: 2026-01-01. DOI: 10.1055/a-2786-0291. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12858313/pdf/
Pub Date: 2026-01-01; Epub Date: 2026-01-30; DOI: 10.1055/a-2786-0551
Anne Grauer, Yuyang Yang, Jo Applebaum, Yelstin Fernandes, David Liebovitz, Jason Adelman, Bruce Lambert, William Galanter
Abandoned medication orders, those initiated but not signed, represent a potential safety risk and an indicator of electronic health record (EHR) inefficiency. This study explores inpatient medication abandonment across two large tertiary healthcare systems using different EHRs.
Silent alerts were deployed to identify abandoned orders at Site 1 (June 2018-May 2019) and Site 2 (July 2020-May 2023). At Site 1, alerts triggered on all inpatient medication orders. At Site 2, alerts were part of a broader study implementing indication alerts; only orders for study medications triggered alerts. An abandoned order was defined as an order initiated but not signed within 24 hours of initiation. We calculated abandonment rates and rates of reorders, and performed regression to examine the association between abandonment and clinician, patient, and order characteristics. Exponential models were fit to characterize the chronology of reordering.
Among 6.8 million medication orders, abandonment rates were 11.2% at Site 1 and 25.0% at Site 2. Due to fundamental differences in alert configuration and order capture, no direct statistical comparison of abandonment rates between the two sites was conducted. Over half of abandoned orders were reordered within 24 hours (65.3% at Site 1; 54.2% at Site 2). The chronology of reordering was similar at both institutions. Attendings, the most senior clinicians, had the lowest rates of abandonment. Abandonment rates decreased as clinicians placed more orders, but rose as clinicians ordered on more unique patients. Abandonments were higher when ordering for children compared with adults.
Order abandonment is common and varies by patient age, clinician type, and workload. Abandonment rates declined as house staff providers advanced in training, suggesting that clinical experience plays a role. Frequent reordering suggests that workflow interruptions or modifications, rather than intentional medication cancellation, may lead to a significant proportion of abandonments. Similarity in the timing of reordering between healthcare systems suggests common reordering processes across sites. Our findings demonstrate significant order abandonment rates, with the potential to use abandonment as a metric to improve computerized provider order entry (CPOE) functionality, clinicians' workflows, and patient safety.
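The abandonment definition used in the study (an order initiated but not signed within 24 hours of initiation) can be sketched in a few lines. The `Order` record, its field names, and the rule for matching a reorder (same patient and medication, initiated within 24 hours after the abandoned order) are assumptions made for illustration; the abstract does not describe how reorders were matched.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional

WINDOW = timedelta(hours=24)


class Order(NamedTuple):
    patient_id: str
    medication: str
    initiated: datetime
    signed: Optional[datetime]  # None if the order was never signed


def is_abandoned(order: Order) -> bool:
    """True if the order was not signed within 24 hours of initiation."""
    return order.signed is None or order.signed - order.initiated > WINDOW


def abandonment_rate(orders) -> float:
    """Share of all orders that meet the abandonment definition."""
    return sum(is_abandoned(o) for o in orders) / len(orders)


def reordered_within_window(abandoned: Order, orders) -> bool:
    """Assumed matching rule: a later order for the same patient and
    medication, initiated no more than 24 hours after the abandoned one."""
    return any(
        o is not abandoned
        and o.patient_id == abandoned.patient_id
        and o.medication == abandoned.medication
        and timedelta(0) < o.initiated - abandoned.initiated <= WINDOW
        for o in orders
    )
```

A stricter matching rule (for example, requiring the same dose or the same ordering clinician) would lower the reorder rate, so the 24-hour reorder percentages depend on how that rule is drawn.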
Abandoned Inpatient Orders: An Opportunity for Improving CPOE Safety and Efficiency. Applied Clinical Informatics, vol. 17, no. 1, pp. 28-38. Pub Date: 2026-01-01. DOI: 10.1055/a-2786-0551. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12858319/pdf/
Pub Date: 2026-01-01; Epub Date: 2026-01-22; DOI: 10.1055/a-2777-1358
Nymisha Chilukuri, Erin Ballard, Xuan Xu, Tom McPherson, Victor Ritter, Hannah K Bassett, Jennifer Carlson, Natalie M Pageler
Identifying patient portal (PP) activation disparities, especially in electronic health record (EHR) activation workflows, can help facilitate equitable health care access.
Our study aimed to assess whether the parent/guardian's preferred language was associated with being offered, activating, and using the PP and with the methods used to offer activation codes.
This retrospective cohort study examined PP offer, activation, and usage rates at a large freestanding children's hospital. Patients <12 years old with ambulatory visits from July 1, 2022, to June 30, 2023, without prior active proxy PP accounts were included. The primary independent variable was the self-reported parent/guardian preferred language (English/Spanish). Outcomes included the probability of being offered a PP (overall and by specific offer method), of activating it, and of using it. Zou's modified multivariate Poisson regression models examined the association between preferred language and offer/activate/use status.
Among 39,578 patients, 85.1% were patients with English as preferred language (PEPL) and 14.9% were patients with Spanish as preferred language (PSPL). PSPL had a lower probability of being offered (adjusted relative risk ratio [aRR]: 0.65, 95% confidence interval [CI]: 0.63-0.67), activating (aRR: 0.72, 95% CI: 0.70-0.75), and using (aRR: 0.68, 95% CI: 0.65-0.72) a PP compared to PEPL. Specifically, PSPL had a lower probability of activating if ever offered via instant activation (aRR: 0.72, 95% CI: 0.69-0.75) or via a parent/guardian with an existing account (aRR: 0.73, 95% CI: 0.69-0.76), and had equal probability of activating if ever offered via letter (aRR: 0.42, 95% CI: 0.19-0.94) or the clinician-assisted method (aRR: 0.99, 95% CI: 0.86-1.16), compared to PEPL.
PSPL at a large, free-standing pediatric health system had a lower probability of PP offer, activation, and usage than PEPL. Activation methods were not universally effective across language groups, emphasizing the need for equitable workflow optimization. This study highlights an approach to analyzing health disparities in activation workflows to inform targeted interventions to improve equitable PP access.
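The adjusted relative risks above come from modified Poisson regression with covariate adjustment, which is not reproduced here. As a simpler illustration of how a relative risk and its 95% confidence interval are read, the following sketch computes an unadjusted relative risk with a Wald interval on the log scale. The function name is hypothetical, and any counts passed to it would be illustrative rather than study data.

```python
import math


def relative_risk(events_exposed, n_exposed, events_reference, n_reference, z=1.96):
    """Unadjusted relative risk with a Wald confidence interval on the log scale.

    Returns (rr, lower, upper). This is not the adjusted regression estimate
    reported in the abstract; it only shows the basic calculation.
    """
    risk_exposed = events_exposed / n_exposed
    risk_reference = events_reference / n_reference
    rr = risk_exposed / risk_reference
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_reference - 1 / n_reference
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

An interval that lies entirely below 1 indicates a lower probability in the comparison group, while an interval that spans 1 is compatible with no difference.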
EHR Workflows Contribute to Disparities by Language Preference in Parent Patient Portal Access. Applied Clinical Informatics, vol. 17, no. 1, pp. 19-27. Pub Date: 2026-01-01. DOI: 10.1055/a-2777-1358. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826850/pdf/
The digital transformation of healthcare is reshaping care delivery among healthcare professionals, requiring nurses to develop digital competencies. These competencies are essential but often underdeveloped due to limited training and resources. Global initiatives emphasize integrating these competencies into nursing education, necessitating valid instruments to assess them.
This systematic review aims to identify instruments measuring digital competence in nursing and to assess their measurement properties.
This review was registered in PROSPERO (identifier: CRD42024522349) and conducted according to PRISMA guidelines. A systematic search was performed in CINAHL, PubMed/MEDLINE, and Scopus on instruments assessing digital competencies in nursing and reporting measurement properties. Measurement properties and their methodological quality were assessed using the COSMIN criteria, and the overall quality of the evidence was graded using a modified GRADE approach.
A total of 27 instruments were identified, relating to three interconnected constructs: nursing informatics, digital health, and information and communication technology. Based on their measurement properties, the instruments were categorized into three groups (A, B, C) following the COSMIN methodology to support recommendations for use. Six instruments were classified under category A (recommended for use): the DigiHealthCom and DigiComInf instruments, the Turkish version of TANIC, the short version of ITASH, the Digital Competence Questionnaire, and the 30-item Arabic version of SANICS. Twenty instruments were categorized under category B (potentially recommendable, but further validation is needed). One instrument was placed in category C (not recommended for use).
As digital competence becomes an increasing priority in education and public health, valid and reliable instruments are essential for assessing and monitoring these competencies. Such instruments support the identification of training needs, the evaluation of educational outcomes, and the integration of digital skills into nursing curricula and clinical practice, ultimately strengthening the digital readiness of the nursing workforce.
{"title":"Measurement Properties of Instruments Assessing Digital Competence in Nursing: A Systematic Review.","authors":"Fabio D'Agostino, Ilaria Erba, Elske Ammenwerth, Vered Robinzon, Gad Segal, Nissim Harel, Elisabetta Corvo, Refael Barkan, Hadas Lewy, Noemi Giannetta","doi":"10.1055/a-2780-7093","DOIUrl":"10.1055/a-2780-7093","url":null,"abstract":"<p><p>The digital transformation of healthcare is reshaping care delivery among healthcare professionals, requiring nurses to develop digital competencies. These competencies are essential but often underdeveloped due to limited training and resources. Global initiatives emphasize integrating these competencies into nursing education, necessitating valid instruments to assess them.This systematic review aims to identify instruments measuring digital competence in nursing and to assess their measurement properties.This review was registered in PROSPERO (identifier: CRD42024522349) and conducted according to PRISMA guidelines. A systematic search was performed in CINAHL, PubMed/MEDLINE, and Scopus on instruments assessing digital competencies in nursing and reporting measurement properties. Measurement properties and their methodological quality were assessed using the COSMIN criteria, and the overall quality of the evidence was graded using a modified GRADE approach.A total of 27 instruments were identified, relating to three interconnected constructs: nursing informatics, digital health, and information and communication technology. Based on their measurement properties, the instruments were categorized into three groups (A, B, C) following the COSMIN methodology to support recommendations for use. Six instruments were classified under category A (recommended for use): the DigiHealthCom and DigiComInf instruments, the Turkish version of TANIC, the short version of ITASH, the Digital Competence Questionnaire, and the 30-item Arabic version of SANICS. 
Twenty instruments were categorized under category B (potentially recommendable, but further validation is needed). One instrument was placed in category C (not recommended for use).As digital competence becomes an increasing priority in education and public health, valid and reliable instruments are essential for assessing and monitoring these competencies. Such instruments support the identification of training needs, the evaluation of educational outcomes, and the integration of digital skills into nursing curricula and clinical practice, ultimately strengthening the digital readiness of the nursing workforce.</p>","PeriodicalId":48956,"journal":{"name":"Applied Clinical Informatics","volume":"17 1","pages":"1-18"},"PeriodicalIF":2.2,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 Epub Date: 2026-01-21 DOI: 10.1055/a-2793-0977
Daria F Ferro, Marc Tobias, Leah H Carr, Pamela Wentz, Melissa Rodriguez, Casey Pitts, Emily Kane, Eric Shelov
Interruptive clinical decision support (CDS) alerts are intended to standardize patient care and prevent harm. However, failures can occur even in organizations with mature CDS governance and advanced analytics. These breakdowns, marked by excessive firings, workflow disruption, and clinician dissatisfaction, can provide insights into systemic weaknesses in CDS design, testing, and monitoring processes. This study aimed to examine a CDS alert malfunction as a lens for identifying system-level gaps and propose strategies to strengthen resilience in CDS operations. A retrospective analysis was conducted on an interruptive alert that was developed through a phased, multistakeholder, committee-driven process, but was removed within 10 days due to poor performance, revealing gaps that persisted despite established governance. The alert fired 1,866 times in 5 days, with a 91% dismissal rate and reports of workflow disruption. Feedback indicated provider frustration and concern about malfunction. Analysis revealed gaps in end-user engagement, testing rigor, committee reviews, and monitoring practices. CDS failures can serve as catalysts for system improvement. This case highlights actionable lessons, such as operationalizing user-centered design, clarifying testing expectations, and distributing monitoring responsibilities, to enhance CDS reliability. Even well-established governance structures must be continuously evaluated and adapted to keep pace with evolving CDS technologies, and such investments position organizations to maintain responsive, sustainable systems aligned with high-quality care.
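To put the reported alert burden in perspective, the figures in the abstract (1,866 firings, 5 days, 91% dismissal rate) can be turned into a daily firing rate and an approximate dismissal count. The following is a minimal sketch, not the authors' analysis: the variable names are illustrative, and the derived counts are approximations because the abstract reports only the rounded percentage.

```python
# Figures taken from the abstract.
total_firings = 1866      # alert firings during the observation window
observation_days = 5      # length of the window in days
dismissal_rate = 0.91     # reported (rounded) share of firings dismissed

# Derived values; these are approximations, not numbers reported by the study.
firings_per_day = total_firings / observation_days
approx_dismissed = round(total_firings * dismissal_rate)
approx_not_dismissed = total_firings - approx_dismissed

print(f"Mean firings per day: {firings_per_day:.1f}")
print(f"Approximate dismissed firings: {approx_dismissed}")
print(f"Approximate firings not dismissed: {approx_not_dismissed}")
```

The same three inputs are the kind of values a routine monitoring check could track after an alert goes live, which is one of the gaps the case study identifies.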
{"title":"Leveraging 10 Days of Alert Malfunction to Improve Mature Organizational Clinical Decision Support Processes.","authors":"Daria F Ferro, Marc Tobias, Leah H Carr, Pamela Wentz, Melissa Rodriguez, Casey Pitts, Emily Kane, Eric Shelov","doi":"10.1055/a-2793-0977","DOIUrl":"10.1055/a-2793-0977","url":null,"abstract":"<p><p>Interruptive clinical decision support (CDS) alerts are intended to standardize patient care and prevent harm. However, failures can occur even in organizations with mature CDS governance and advanced analytics. These breakdowns, marked by excessive firings, workflow disruption, and clinician dissatisfaction, can provide insights into systemic weaknesses in CDS design, testing, and monitoring processes.This study aimed to examine a CDS alert malfunction as a lens for identifying system-level gaps and propose strategies to strengthen resilience in CDS operations.A retrospective analysis was conducted on an interruptive alert that was developed through a phased, multistakeholder, committee-driven process, but was removed within 10 days due to poor performance, revealing gaps that persisted despite established governance.The alert fired 1,866 times in 5 days, with a 91% dismissal rate and reports of workflow disruption. Feedback indicated provider frustration and concern for malfunction. Analysis revealed gaps in end-user engagement, testing rigor, committee reviews, and monitoring practices.CDS failures can serve as catalysts for system improvement. This case highlights actionable lessons, such as operationalizing user-centered design, clarifying testing expectations, and distributing monitoring responsibilities, to enhance CDS reliability. 
Even well-established governance structures must be continuously evaluated and adapted to keep pace with evolving CDS technologies, and such investments position organizations to maintain responsive, sustainable systems aligned with high-quality care.</p>","PeriodicalId":48956,"journal":{"name":"Applied Clinical Informatics","volume":" ","pages":"46-51"},"PeriodicalIF":2.2,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875732/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 Epub Date: 2025-12-08 DOI: 10.1055/a-2765-6930
Pouyan Esmaeilzadeh
Background: Claude Opus 4 is a large language model (LLM) that features improved reasoning capabilities and broader contextual understanding compared to earlier versions. Despite the growing use of LLM systems for seeking medical information, structured and simulation-based evaluations of Claude Opus 4's capabilities in diabetes management remain limited, particularly across domains such as patient education, clinical reasoning, and emotional support.
Objective: This study aimed to conduct a baseline evaluation of Claude Opus 4's performance across key domains of diabetes care (i.e., patient education, clinical reasoning, and emotional support), and to identify preliminary insights that can inform future, evidence-based integration strategies.
Methods: A three-step evaluation was conducted: (1) 30 diabetes management questions assessed using expert endocrinologist evaluation, (2) five fictional diabetes cases evaluated for clinical decision-making, and (3) emotional support responses assessed for appropriateness and empathy. Three expert endocrinologists graded responses according to American Diabetes Association guidelines.
Results: Claude Opus 4 achieved 80% accuracy in general diabetes knowledge, with high response reproducibility (96.7%), indicating baseline rather than clinically adequate performance. Clinical case evaluations showed moderate utility (mean expert rating = 4.4/7), while emotional-support assessments yielded high scores for empathy (6.2/7) and appropriateness (6.0/7). These findings suggest that although the model demonstrates promising informational and emotional-support capabilities, its current performance remains insufficient for autonomous clinical use and should be viewed as preliminary evidence to guide future, patient-inclusive validation studies.
Conclusions: Although these preliminary findings suggest potential applications of Claude Opus 4 in diabetes care, education, and emotional support, this baseline assessment using fictional cases underscores the need for real-world validation with clinical data to determine true clinical utility and patient-centered impact. This simulation-based evaluation also offers practical lessons for researchers designing future LLM assessments, highlighting the need for mixed expert-patient panels, contextual validation, and person-centered metrics beyond numerical accuracy.
{"title":"Baseline Evaluation of Claude Opus 4 for Diabetes Management: A Preliminary Assessment and Lessons for Implementation.","authors":"Pouyan Esmaeilzadeh","doi":"10.1055/a-2765-6930","DOIUrl":"10.1055/a-2765-6930","url":null,"abstract":"<p><p>Claude Opus 4 is a large language model (LLM) that features improved reasoning capabilities and broader contextual understanding compared to earlier versions. Despite the growing use of LLM systems for seeking medical information, structured and simulation-based evaluations of Claude Opus 4's capabilities in diabetes management remain limited, particularly across domains such as patient education, clinical reasoning, and emotional support.This study aimed to conduct a baseline evaluation of Claude Opus 4's performance across key domains of diabetes care (i.e., patient education, clinical reasoning, and emotional support), and to identify preliminary insights that can inform future, evidence-based integration strategies.A three-step evaluation was conducted: (1) 30 diabetes management questions assessed using expert endocrinologist evaluation, (2) five fictional diabetes cases evaluated for clinical decision-making, and (3) emotional support responses assessed for appropriateness and empathy. Three expert endocrinologists graded responses according to American Diabetes Association guidelines.Claude Opus 4 achieved 80% accuracy in general diabetes knowledge, with high response reproducibility (96.7%), indicating baseline rather than clinically adequate performance. Clinical case evaluations showed moderate utility (mean expert rating = 4.4/7), while emotional-support assessments yielded high scores for empathy (6.2/7) and appropriateness (6.0/7). 
These findings suggest that although the model demonstrates promising informational and emotional-support capabilities, its current performance remains insufficient for autonomous clinical use and should be viewed as preliminary evidence to guide future, patient-inclusive validation studies.Although Claude Opus 4 demonstrates preliminary findings suggesting potential applications in diabetes care, education, and emotional support, this baseline assessment using fictional cases underscores the need for real-world validation with clinical data to determine true clinical utility and patient-centered impact. This simulation-based evaluation also offers practical lessons learned for researchers designing future LLM assessments, highlighting the need for mixed expert-patient panels, contextual validation, and person-centered metrics beyond numerical accuracy.</p>","PeriodicalId":48956,"journal":{"name":"Applied Clinical Informatics","volume":" ","pages":"1881-1891"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714427/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145709090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 Epub Date: 2025-12-18 DOI: 10.1055/a-2765-7021
Kevin D Smith, Riley Boland, Matthew Cerasale, Cheng-Kai Kao
Clinical documentation improvement is critical for pediatric care, yet leveraging electronic health record (EHR) tools for this population is not well established. We aimed to adapt and implement a real-time, automated documentation assistance tool (AutoDx) to decrease clinical documentation integrity (CDI) coding queries and improve perceived ease of practice for pediatric inpatient providers. In this quality improvement study at an urban academic pediatric hospital, we adapted and implemented AutoDx for pediatric use by developing and validating pediatric-specific logic rules to alert providers to potential diagnoses based on EHR data. The primary outcome was the rate of CDI queries per 1,000 discharges for targeted diagnoses, aiming for a 50% reduction over a 5-month implementation period compared with a 12-month baseline. Secondary outcomes included provider-surveyed ease of practice, with a goal of a 25% improvement, and tool uptake. The aggregate rate of targeted CDI queries decreased by 58% postimplementation, from 80.7 to 33.9 per 1,000 discharges (p < 0.001). Moreover, analysis by interrupted time series demonstrated an immediate 45.5% reduction in the rate of coding queries (p = 0.028) following the implementation of the tool. The rate of queries for nontargeted diagnoses remained unchanged. Tool adoption increased steadily throughout the study period. While provider-reported time spent on queries did not significantly decrease, a majority of survey respondents (59%) perceived receiving fewer queries, and 46% agreed the tool made it easier to provide quality care. Implementation of a real-time, automated documentation support tool in a pediatric inpatient setting significantly reduced CDI coding queries for targeted diagnoses. Despite a "task substitution" effect where perceived workload did not decrease, the tool improved perceived ease of practice, demonstrating that targeted EHR interventions can enhance documentation accuracy and efficiency in pediatrics.
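The 58% figure follows directly from the two reported rates, and it can be checked against the study's stated 50% reduction target. The sketch below is illustrative only: `percent_reduction` is a hypothetical helper, not part of the authors' analysis, and it reproduces just the aggregate before/after comparison, not the interrupted time series estimate.

```python
def percent_reduction(baseline: float, post: float) -> float:
    """Relative reduction from baseline to post, expressed as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - post) / baseline * 100.0


# Rates reported in the abstract (targeted CDI queries per 1,000 discharges).
baseline_rate = 80.7   # 12-month baseline period
post_rate = 33.9       # 5-month implementation period
target_reduction = 50.0  # prespecified goal, in percent

reduction = percent_reduction(baseline_rate, post_rate)
target_met = reduction >= target_reduction

print(f"Relative reduction: {reduction:.1f}%")
print(f"50% target met: {target_met}")
```

Expressing the outcome as a relative reduction keeps it comparable with the prespecified goal, which was also stated in relative terms.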
{"title":"Improving Provider Documentation Using a Pediatric Automated Documentation Assistance Tool.","authors":"Kevin D Smith, Riley Boland, Matthew Cerasale, Cheng-Kai Kao","doi":"10.1055/a-2765-7021","DOIUrl":"10.1055/a-2765-7021","url":null,"abstract":"<p><p>Clinical documentation improvement is critical for pediatric care, yet leveraging electronic health record (EHR) tools for this population is not well established. We aimed to adapt and implement a real-time, automated documentation assistance tool (AutoDx) to decrease clinical documentation integrity (CDI) coding queries and improve perceived ease of practice for pediatric inpatient providers.In this quality improvement study at an urban academic pediatric hospital, we adapted and implemented AutoDx for pediatric use by developing and validating pediatric-specific logic rules to alert providers to potential diagnoses based on EHR data. The primary outcome was the rate of CDI queries per 1,000 discharges for targeted diagnoses, aiming for a 50% reduction over a 5-month implementation period compared with a 12-month baseline. Secondary outcomes included provider-surveyed ease of practice, with a goal of a 25% improvement, and tool uptake.The aggregate rate of targeted CDI queries decreased by 58% postimplementation, from 80.7 to 33.9 per 1,000 discharges (<i>p</i> < 0.001). Moreover, analysis by interrupted time series demonstrated an immediate 45.5% reduction in the rate of coding queries (<i>p</i> = 0.028) following the implementation of the tool. The rate of queries for nontargeted diagnoses remained unchanged. Tool adoption increased steadily throughout the study period. 
While provider-reported time spent on queries did not significantly decrease, a majority of survey respondents (59%) perceived receiving fewer queries, and 46% agreed the tool made it easier to provide quality care.Implementation of a real-time, automated documentation support tool in a pediatric inpatient setting significantly reduced CDI coding queries for targeted diagnoses. Despite a \"task substitution\" effect where perceived workload did not decrease, the tool improved perceived ease of practice, demonstrating that targeted EHR interventions can enhance documentation accuracy and efficiency in pediatrics.</p>","PeriodicalId":48956,"journal":{"name":"Applied Clinical Informatics","volume":"16 5","pages":"1900-1908"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145783430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}