
JAMIA Open: latest articles

Measles Tracker: a near-real-time data hub for measles surveillance.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-27 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf062
Francesco Branda, Maria Tomasso, Mohamed Mustaf Ahmed, Massimo Ciccozzi, Fabio Scarpa

Objectives: Measles continues to pose a serious threat to global public health, fueled by declining vaccination rates, international travel, and persistent immunization gaps. Early outbreak detection and response remain hampered by fragmented surveillance systems, which often lack interoperability and limit data accessibility.

Materials and methods: To address the major limitations of current measles surveillance systems, including data fragmentation and lack of standardization, we developed Measles Tracker, an integrated near-real-time data hub that centralizes and harmonizes measles surveillance data in the United States using publicly available sources. The system aggregates data from multiple layers, including: (1) official reports from public health agencies, (2) epidemiological surveillance bulletins, and (3) outbreak reports, mainly captured through news websites or via news aggregators. The platform architecture implements (1) geospatial normalization of key epidemiological variables (case counts, vaccination coverage, age-stratified incidence) and (2) dynamic visualization interfaces to support coordination of evidence-based response.

Results: Measles Tracker enhances situational awareness by integrating disparate data streams in near real-time, enabling rapid geospatial detection of outbreak clusters, mapping vaccination gaps, and supporting dynamic risk stratification of vulnerable populations. It is intended exclusively as a complementary tool to official public health systems, providing educational and situational awareness without interfering with contact tracing, vaccination, or outbreak control activities.

Conclusions: As a centralized, scalable tool, Measles Tracker advances measles surveillance by leveraging digital epidemiology principles. Future iterations will incorporate additional data streams (eg, climate variables, genomic surveillance) and advanced analytics (eg, machine learning for risk prediction, network models for transmission dynamics) to further optimize outbreak preparedness and resource allocation. This framework underscores the transformative potential of integrated data systems in global measles elimination efforts.

JAMIA Open 8(3), ooaf062. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12203508/pdf/
Citations: 0
A comparative analysis of machine learning models and human expertise for nursing intervention classification.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-27 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf057
Jerome Niyirora, Lynne Longtin, Cynthia Grabski, David Patrishkoff, Andriana Semko

Objective: This study compares the performance of machine learning (ML) models and human experts in mapping unstructured nursing notes to the standardized Nursing Interventions Classification (NIC) system. The aim is to advance automated nursing documentation classification, facilitating cross-facility benchmarking of patient care and organizational outcomes.

Materials and methods: We developed and compared 4 ML models: TF-IDF text-based vectorization, UMLS semantic mapping, fine-tuned GPT-4o mini, and Bio-Clinical BERT. These models were evaluated against classifications provided by 2 expert nurses using a dataset of de-identified home healthcare nursing notes obtained from a Florida, USA-based medical clearinghouse. Model performance was assessed using agreement statistics, precision, recall, F1 scores, and Cohen's Kappa.
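
The agreement statistics named above can be illustrated with a minimal sketch. It assumes two raters who each assign one category label per note; the labels and the helper names `raw_agreement` and `cohens_kappa` are hypothetical and are not taken from the study.

```python
from collections import Counter


def raw_agreement(labels_a, labels_b):
    """Fraction of items on which the two raters chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = raw_agreement(labels_a, labels_b)
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four notes (illustrative only).
rater_1 = ["drug", "drug", "info", "drug"]
rater_2 = ["drug", "info", "info", "drug"]
print(raw_agreement(rater_1, rater_2), cohens_kappa(rater_1, rater_2))
```

Kappa is lower than raw agreement whenever some of the matches would be expected by chance, which is why the two figures are usually reported together.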

Results: Human raters achieved the highest agreement with consensus labels, scoring 0.75 and 0.62, with corresponding F1 scores of 0.61 and 0.45, respectively. In comparison, ML models showed lower performance, with GPT achieving the best among them (agreement: 0.50, F1 score: 0.31). A distribution analysis of NIC categories revealed that ML models performed well in prevalent and clearly defined categories, such as drug management, but struggled with minority classes and context-dependent interventions, like information management.

Discussion: Current ML approaches show promise in supporting clinical classification tasks, but the performance gap in handling complex, context-dependent interventions highlights the need for improved methods that can better capture the nuanced nature of clinical documentation. Future research should focus on developing methods to process clinical terminology and context-specific documentation with greater precision and adaptability.

Conclusion: Current ML models can aid, but not fully replace, human judgment in classifying nuanced nursing interventions.

JAMIA Open 8(3), ooaf057. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12203540/pdf/
Citations: 0
Application of the International Classification of Health Interventions for coding interventions in adults with sensorineural hearing loss.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-27 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf063
Faheema Mahomed-Asmail, Ilze Oosthuizen, Catherine Sykes, Soraya Maart, Richard Madden, De Wet Swanepoel, Vinaya Manchaiah

Objective: The International Classification of Health Interventions (ICHI), currently being developed, seeks to span all sectors of the health system. Our objective was to determine the coverage of the ICHI for hearing interventions commonly delivered to adults with sensorineural hearing loss (SNHL).

Material and methods: A 3-phase content mapping method was used: (1) source terms were identified with an expert panel in audiology rehabilitation; (2) 3 coders independently applied the classification to the source terms; and (3) the coders reached a consensus for each intervention and identified reasons for initial discrepancies; options not linked to a specific code were also identified.

Results: Nineteen different ICHI Target categories were identified, with 23 different ICHI Action categories and 82% of the means being "Other and unspecified." There was consensus in codes for 54.3% of source terms, with no ICHI code found for 8.5% of source terms. The greatest number of discrepancies arose from the action, followed by the target. Coding discrepancies occurred as a result of misunderstanding of source terms, the clinical use thereof, and difficulty determining the type of Target.

Discussion: Despite its broad scope, ICHI's current framework has gaps in its coverage of audiological interventions, particularly those related to sensorineural hearing loss. Addressing these gaps is crucial for improving global data standardization and facilitating the development of more targeted hearing health policies.

Conclusion: This study makes an important contribution to the further development and refinement of the classification, specifically in the context of hearing healthcare.

JAMIA Open 8(3), ooaf063. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12203548/pdf/
Citations: 0
Establishing data governance for sharing and access to real-world data: a case study.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-23 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf041
Heath A Davis, Diva Kerkman, Asher A Hoberg, Michele Countryman, Wendy Beaver, Kiley Bybee, James M Blum, Boyd M Knosp

Importance: Data governance, the policies and procedures for managing data, is a critical factor in the secondary use of clinical data for research.

Objectives: This paper describes the evolution of an academic health-care organization's data governance for research, development of an external data sharing process, implementation of related processes, continuous improvement, and ongoing observations of data governance maturity.

Materials and methods: The program was designed to improve the access to and sharing of real-world data for research. Using a combination of qualitative and quantitative methods, we evaluated the program's effectiveness.

Results: Our results describe a significant improvement in data accessibility as seen in new data-driven performance indicators and in data understanding indicated by new processes, policies, and strategies.

Discussion: The paper outlines the development of a data governance process at an academic health center to support external data sharing, emphasizing the importance of data literacy, cross-office collaboration, and structured workflows to manage complex review requirements. The formalized process improved data access, identified gaps, and enabled continuous quality improvement, though it introduced new bottlenecks and required navigating multi-office reviews and researcher education.

Conclusion: These findings suggest data governance practices that may apply to other institutions.

JAMIA Open 8(3), ooaf041. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206003/pdf/
Citations: 0
Evaluation of falls detected by natural language processing algorithm and not coded external cause of morbidity.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-20 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf047
Daniel J Hekman, Apoorva P Maru, Hanna J Barton, Douglas Wiegmann, Manish N Shah, Amy L Cochran, Erkin Ötleş, Brian W Patterson

Objective: Falls are a leading cause of morbidity and mortality among older adults. Common methods for identifying fall-related ED visits within both claims and electronic health record datasets rely on diagnosis code-based definitions, which underestimate the true prevalence of falls. This study applies a natural language processing (NLP) algorithm to ED provider notes to identify patients presenting due to falls and compares the characteristics of NLP-identified cases to those identified through diagnosis codes to identify the impact of identification strategy.

Materials and methods: This cross-sectional study analyzed ED encounter data from older adult patients who visited an ED between December 2016 and 2020. The NLP algorithm identified falls based on provider notes, searching for keywords related to falls and excluding negated and spurious matches. We also applied common ICD code methods to identify falls.
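
The keyword approach described above can be sketched roughly as follows. This is not the authors' algorithm: the keyword list, the negation cues, the spurious phrases, and the function name `mentions_fall` are all assumptions chosen for illustration, and a production system would need a far richer rule set.

```python
import re

# Hypothetical patterns; the study's actual keyword and exclusion lists are not given.
FALL_PATTERN = re.compile(r"\b(fall|falls|fell|fallen)\b", re.IGNORECASE)
NEGATION_PATTERN = re.compile(r"\b(no|not|denies|denied|without)\b", re.IGNORECASE)
SPURIOUS_PATTERN = re.compile(
    r"\bfalls? (risk|precautions)\b|\b(fell|fall) asleep\b", re.IGNORECASE
)


def mentions_fall(note):
    """Return True if any sentence contains a non-negated, non-spurious fall keyword."""
    for sentence in re.split(r"[.;\n]+", note):
        # Drop phrases that contain a fall keyword but do not describe a fall event.
        cleaned = SPURIOUS_PATTERN.sub(" ", sentence)
        for match in FALL_PATTERN.finditer(cleaned):
            # Treat the keyword as negated if a negation cue precedes it in the sentence.
            if NEGATION_PATTERN.search(cleaned[:match.start()]):
                continue
            return True
    return False


print(mentions_fall("Patient fell at home yesterday."))
print(mentions_fall("Denies any recent falls."))
```

Sentence-level negation scope is a deliberate simplification: it keeps the rule easy to audit but will miss negations that span sentences and may over-suppress keywords that follow an unrelated negation.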

Results: We processed 50 153 ED encounters and the NLP approach identified 14 604 encounters for patients who fell. Of those, 7086 (49%) were not identified using external cause of morbidity ICD codes. Patients identified by just the NLP algorithm exhibited higher Elixhauser comorbidity scores and increased likelihood of 30-day mortality. Patients identified by NLP algorithm but not ICD codes were more likely to have severe underlying conditions such as sepsis or acute kidney disease rather than traumatic injuries.

Discussion: The NLP algorithm identifies many fall-related visits not identified by traditional methods.

Conclusion: NLP algorithms that do not consider the causal relationships between falls and comorbid conditions can easily identify patients who fell but whose fall was a sequela of underlying medical illness.

JAMIA Open 8(3), ooaf047. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204729/pdf/
Citations: 0
Reproducible generative artificial intelligence evaluation for health care: a clinician-in-the-loop approach.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-16 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf054
Leah Livingston, Amber Featherstone-Uwague, Amanda Barry, Kenneth Barretto, Tara Morey, Drahomira Herrmannova, Venkatesh Avula

Objectives: To develop and apply a reproducible methodology for evaluating generative artificial intelligence (AI) powered systems in health care, addressing the gap between theoretical evaluation frameworks and practical implementation guidance.

Materials and methods: A 5-dimension evaluation framework was developed to assess query comprehension and response helpfulness, correctness, completeness, and potential clinical harm. The framework was applied to evaluate ClinicalKey AI using queries drawn from user logs, a benchmark dataset, and subject matter expert curated queries. Forty-one board-certified physicians and pharmacists were recruited to independently evaluate query-response pairs. An agreement protocol using the mode and modified Delphi method resolved disagreements in evaluation scores.

Results: Of 633 queries, 614 (96.99%) produced evaluable responses, with subject matter experts completing evaluations of 426 query-response pairs. Results demonstrated high rates of response correctness (95.5%) and query comprehension (98.6%), with 94.4% of responses rated as helpful. Two responses (0.47%) received scores indicating potential clinical harm. Pairwise consensus occurred in 60.6% of evaluations, with the remaining cases requiring a third, tie-breaker review.
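
One way to picture a mode-based agreement step with a tie-breaker is the sketch below. The abstract does not spell out the protocol, so the flow here is an assumption: two reviewers score independently, a third score is requested only on disagreement, and a case with three distinct scores is escalated to discussion. The function name `resolve_score` and the status strings are hypothetical.

```python
from collections import Counter


def resolve_score(first, second, tie_breaker=None):
    """Return (final_score, status) for one query-response pair.

    Assumed flow, not the published protocol:
    - two matching scores give pairwise consensus;
    - otherwise a third score is needed and the mode of the three is used;
    - three distinct scores are escalated to group discussion.
    """
    if first == second:
        return first, "pairwise"
    if tie_breaker is None:
        return None, "needs_tie_breaker"
    score, count = Counter([first, second, tie_breaker]).most_common(1)[0]
    if count >= 2:
        return score, "mode"
    return None, "needs_discussion"


print(resolve_score(4, 4))
print(resolve_score(4, 5, tie_breaker=5))
```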

Discussion: The framework demonstrated effectiveness in quantifying performance through comprehensive evaluation dimensions and structured scoring resolution methods. Key strengths included representative query sampling, standardized rating scales, and robust subject matter expert agreement protocols. Challenges emerged in managing subjective assessments of open-ended responses and achieving consensus on potential harm classification.

Conclusion: This framework offers a reproducible methodology for evaluating health-care generative AI systems, establishing foundational processes that can inform future efforts while supporting the implementation of generative AI applications in clinical settings.

JAMIA Open 8(3), ooaf054. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12169418/pdf/
引用次数: 0
Computerized diagnostic decision support systems-Isabel Pro versus ChatGPT-4 part II.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-16 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf048
Joe M Bridges, Xiaoqian Jiang, Michael Ige, Oluwatoniloba Toyobo

Objective: To determine whether a Tree-of-Thought prompt and reconsideration of Isabel Pro's differential improve ChatGPT-4's accuracy, whether increasing expert panel size improves its accuracy, whether ChatGPT-4 produces consistent outputs across sequential requests, and how often it fabricates references.

Materials and methods: The study compared Isabel Pro, a computerized diagnostic decision support system, with ChatGPT-4, a large language model. Using 201 cases from the New England Journal of Medicine, each system produced a differential diagnosis ranked by likelihood. Statistics were Mean Reciprocal Rank (MRR), Recall at Rank, Average Rank, Number of Correct Diagnoses, and Rank Improvement. For reproducibility, the study compared the initial expert panel run to each subsequent run, using the r-squared calculation from a scatter plot of each run.
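
The two ranking statistics named above can be illustrated with a minimal Python sketch. It is not the study's code: the helper names, the toy cases, and the exact-string matching of diagnoses are assumptions made for illustration (the study may have judged matches differently), and a missed diagnosis is assumed to contribute 0 to the reciprocal-rank average.

```python
from typing import Optional, Sequence


def rank_of_correct(differential: Sequence[str], truth: str) -> Optional[int]:
    """Return the 1-based rank of the true diagnosis, or None if it is absent."""
    normalized = [d.strip().lower() for d in differential]
    target = truth.strip().lower()
    return normalized.index(target) + 1 if target in normalized else None


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all cases; a missed diagnosis contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose true diagnosis appears within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Hypothetical toy data, not taken from the study.
cases = [
    (["sarcoidosis", "tuberculosis", "lymphoma"], "tuberculosis"),
    (["lyme disease", "lupus"], "Lyme disease"),
    (["migraine", "stroke"], "giant cell arteritis"),
    (["gout", "septic arthritis", "pseudogout", "cellulitis"], "cellulitis"),
]
ranks = [rank_of_correct(diff, truth) for diff, truth in cases]
mrr = mean_reciprocal_rank(ranks)
recall_at_3 = recall_at_k(ranks, 3)
```

Because MRR weights a rank-1 hit far more than a rank-5 hit, a system can raise its MRR while still listing fewer correct diagnoses overall, which is why the abstract reports both kinds of statistic.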

Results: ChatGPT-4 improved MRR and Recall at 10 to 0.72 but produced fewer correct diagnoses and lower average rank. Reconsideration of the Isabel Pro differential produced an 11% improvement in Recall at 10. The expert panel size of two produced the best result. The reproducibility runs were within 4% on average for Recall at 10, but the scatterplots showed an r-squared ranging from 0.44 to 0.34, suggesting poor reproducibility. Reference accuracy was 34.8% for citations and 37.8% for DOIs.
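
A run-to-run r-squared of the kind reported here can be computed as the squared Pearson correlation between two runs. The sketch below assumes the compared quantity is the per-case rank of the correct diagnosis; that choice, the function name, and the data are hypothetical and not confirmed by the abstract.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when one run has no variance")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-case ranks of the correct diagnosis in two runs.
initial_run = [1, 2, 3, 4, 5]
repeat_run = [2, 1, 4, 3, 5]
r2 = r_squared(initial_run, repeat_run)
```

An aggregate such as Recall at 10 can stay nearly constant between runs while individual cases move up and down the list, so a low r-squared alongside a small change in recall is not contradictory.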

Discussion: ChatGPT-4 performs well with images and electrocardiography and in administrative practice management, but diagnosis has not proven as promising.

Conclusions: The results raise concerns about diagnostic accuracy, reproducibility, and reference citation accuracy. Until these issues are resolved, clinical use for diagnosis will be minimal, if it occurs at all.

JAMIA Open 8(3): ooaf048. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12169417/pdf/
Citations: 0
Complexities and approaches for deriving longitudinal daily morphine milligram equivalents using electronic health record prescription data.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-16 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf053
Samantha H Chang, Shawn C Hirsch, Sonia M Thomas, Mark J Edlund, Rowena J Dolor, Timothy J Ives, Charlene M Dewey, Padma Gulur, Paul R Chelminski, Kristin R Archer, Li-Tzy Wu, Janis Curtis, Adam O Goldstein, Lauren A McCormack

Objective: To describe challenges and solutions for calculating longitudinal daily opioid dose in morphine milligram equivalents from electronic health record prescriptions for a clinical trial of voluntary opioid reduction in patients with chronic non-cancer pain.

Materials and methods: Researchers obtained opioid prescriptions for 525 participants from the National Patient-Centered Clinical Research Network datamart at three health systems. Daily opioid dose was calculated using dose conversions and summing across prescriptions after applying assumptions, reviewing suspect prescribing patterns, and removing spurious prescriptions.
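
The dose conversion and summing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's pipeline: the `Prescription` fields, the even-consumption assumption, the duplicate rule, and the sample records are hypothetical, and the conversion factors are the commonly published CDC values for these three ingredients rather than the study's own table.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List

# Illustrative conversion factors (assumption: commonly published CDC values;
# the study's own conversion table is not shown in the abstract).
MME_FACTOR = {"morphine": 1.0, "hydrocodone": 1.0, "oxycodone": 1.5}


@dataclass
class Prescription:
    drug: str            # opioid ingredient, lower case
    strength_mg: float   # mg per unit (tablet, capsule)
    quantity: float      # units dispensed
    days_supply: int     # days the supply is intended to last
    start: date          # first day covered


def daily_mme(rx: Prescription) -> float:
    """MME per day, assuming the supply is consumed evenly over days_supply."""
    if rx.days_supply <= 0:
        raise ValueError("days_supply must be positive")
    return rx.strength_mg * rx.quantity / rx.days_supply * MME_FACTOR[rx.drug]


def mme_by_day(prescriptions: List[Prescription]) -> Dict[date, float]:
    """Sum daily MME across prescriptions after dropping exact duplicates."""
    totals: Dict[date, float] = defaultdict(float)
    seen = set()
    for rx in prescriptions:
        key = (rx.drug, rx.strength_mg, rx.quantity, rx.days_supply, rx.start)
        if key in seen:  # duplicate record: count it once
            continue
        seen.add(key)
        for offset in range(rx.days_supply):
            totals[rx.start + timedelta(days=offset)] += daily_mme(rx)
    return dict(totals)


# Hypothetical prescriptions for one participant.
rxs = [
    Prescription("oxycodone", 10, 60, 30, date(2020, 1, 1)),    # 30 MME/day
    Prescription("oxycodone", 10, 60, 30, date(2020, 1, 1)),    # duplicate
    Prescription("hydrocodone", 5, 20, 10, date(2020, 1, 25)),  # 10 MME/day
]
series = mme_by_day(rxs)
```

Real prescription data would also need rules for missing days-supply values, early refills, and overlapping prescriptions that are not exact duplicates, which this sketch deliberately leaves out.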

Results: Out of 16 071 extracted prescriptions, 1207 (8%) were unusable, and 14 864 (92%) were analyzed.

Discussion: Numerous challenges were identified related to incomplete data, inaccurate refill dates, and overlapping or duplicate prescriptions.

Conclusion: Using electronic prescription data to calculate daily doses of opioid consumption is challenging and requires significant cleaning prior to use in research. This paper recommends steps to review and clean electronic opioid prescription data.

JAMIA Open 8(3): ooaf053. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12169419/pdf/
Citations: 0
Real-time automated billing for tobacco treatment: developing and validating a scalable machine learning approach.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-12 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf039
Derek J Baughman, Layth Qassem, Lina Sulieman, Michael E Matheny, Daniel Fabbri, Hilary A Tindle, Aubrey Cole Goodman, Scott D Nelson, Adam Wright

Objectives: To develop CigStopper, a real-time, automated medical billing prototype designed to identify eligible tobacco cessation care codes, thereby reducing administrative workload while improving billing accuracy.

Materials and methods: ChatGPT prompt engineering generated a synthetic corpus of physician-style clinical notes categorized for CPT codes 99406/99407. Practicing clinicians annotated the dataset to train multiple machine learning (ML) models focused on accurately predicting billing code eligibility.

Results: Decision tree and random forest models performed best. Mean performance across all models: PRC AUC = 0.857, F1 score = 0.835. Generalizability testing on deidentified notes confirmed that tree-based models performed best.
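
For readers unfamiliar with the F1 score reported above, here is a minimal sketch of how it follows from precision and recall for a binary eligibility label. The function name, the label encoding (1 = billable), and the example data are assumptions for illustration and do not come from the study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary labels (1 = billable, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical clinician annotations vs. model predictions for 8 notes.
annotated = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(annotated, predicted)
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that over-bills (low precision) as much as one that misses eligible encounters (low recall).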

Discussion: CigStopper shows promise for streamlining manual billing inefficiencies that hinder tobacco cessation care. ML methods lay the groundwork for clinical implementation based on good performance using synthetic data. Automating high-volume, low-value tasks simplifies complexities in a multi-payer system and promotes financial sustainability for healthcare practices.

Conclusion: CigStopper validates foundational methods for automating the discernment of appropriate billing codes for eligible smoking cessation counseling care.

JAMIA Open 8(3): ooaf039. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12161450/pdf/
Citations: 0
Comparative analysis of large language models in clinical diagnosis: performance evaluation across common and complex medical cases.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-06-12 eCollection Date: 2025-06-01 DOI: 10.1093/jamiaopen/ooaf055
Mehmed T Dinc, Ali E Bardak, Furkan Bahar, Craig Noronha

Objectives: This study aimed to systematically evaluate and compare the diagnostic performance of leading large language models (LLMs) in common and complex clinical scenarios, assessing their potential for enhancing clinical reasoning and diagnostic accuracy in authentic clinical decision-making processes.

Materials and methods: Diagnostic capabilities of advanced LLMs (Anthropic's Claude, OpenAI's GPT variants, Google's Gemini) were assessed using 60 common cases and 104 complex, real-world cases from Clinical Problem Solvers' morning rounds. Clinical details were disclosed in stages, mirroring authentic clinical decision-making. Models were evaluated on primary and differential diagnosis accuracy at each stage.
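
Scoring under staged disclosure can be pictured as a per-stage accuracy tally. The sketch below is a hypothetical illustration: the tuple layout, the three-stage setup, and the gradings are invented, and the abstract does not state how many stages were used or how correctness was graded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_stage(results: Iterable[Tuple[str, int, bool]]) -> Dict[int, float]:
    """Map each disclosure stage to the fraction of cases diagnosed correctly.

    Each result is (case_id, stage, correct), where stage 1 is the earliest
    disclosure and higher stages reveal more clinical detail.
    """
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for _case_id, stage, is_correct in results:
        total[stage] += 1
        correct[stage] += int(is_correct)
    return {stage: correct[stage] / total[stage] for stage in sorted(total)}


# Hypothetical gradings for three cases across three stages.
graded = [
    ("case-1", 1, False), ("case-1", 2, True), ("case-1", 3, True),
    ("case-2", 1, False), ("case-2", 2, False), ("case-2", 3, True),
    ("case-3", 1, True), ("case-3", 2, True), ("case-3", 3, False),
]
stage_accuracy = accuracy_by_stage(graded)
```

Reporting accuracy per stage rather than only at the end shows how quickly a model converges on the diagnosis as information accumulates.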

Results: Advanced LLMs showed high diagnostic accuracy (>90%) in common scenarios, with Claude 3.7 achieving perfect accuracy (100%) in certain conditions. In complex cases, Claude 3.7 achieved the highest accuracy (83.3%) at the final diagnostic stage, significantly outperforming smaller models. Smaller models notably performed well in common scenarios, matching the performance of larger models.

Discussion: This study evaluated leading LLMs for diagnostic accuracy using staged information disclosure, mirroring real-world practice. Notably, Claude 3.7 Sonnet was the top performer. Employing a novel LLM-based evaluation method for large-scale analysis, the research highlights artificial intelligence's (AI's) potential to enhance diagnostics. It underscores the need for useful frameworks to translate accuracy into clinical impact and integrate AI into medical education.

Conclusion: Leading LLMs show remarkable diagnostic accuracy in diverse clinical cases. To fully realize their potential for improving patient care, we must now focus on creating practical implementation frameworks and translational research to integrate these powerful AI tools into medicine.

JAMIA Open 8(3): ooaf055. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12161448/pdf/
Citations: 0