Accuracy of Smartwatch Pulse Oximetry Measurements in Hospitalized Patients With Coronavirus Disease 2019
Pub Date: 2024-02-26 | DOI: 10.1016/j.mcpdig.2024.02.001
Kevin Rajakariar MBBS; Paul Buntine MBBS; Andrew Ghaly MBBS; Zheng Cheng Zhu MBBS; Vihangi Abeygunawardana MD; Sarah Visakhamoorthy MBBS; Patrick J. Owen PhD; Shaun Tham MD; Liam Hackett MPH; Louise Roberts PhD; Jithin K. Sajeev MBBS, PhD; Nicholas Jones MBBS; Andrew W. Teh MBBS, PhD
Objective
To assess the ability of 2 commercially available smartwatches to accurately detect clinically significant hypoxia in patients hospitalized with coronavirus disease 2019 (COVID-19).
Patients and Methods
A prospective multicenter validation study was performed from November 1, 2021, to August 31, 2022, assessing the built-in pulse oximetry of the Apple Watch Series 7 and Withings ScanWatch against simultaneous ward-based oximetry as the reference standard. Patients hospitalized with active COVID-19 infection who did not require intensive care admission were recruited.
Results
A total of 750 smartwatch pulse oximetry measurements and 400 ward oximetry readings were successfully obtained from 200 patients (54% male; age 66±18 years). For the detection of clinically significant hypoxia, the Apple Watch had a sensitivity and specificity of 34.8% and 97.5%, respectively, with a positive predictive value of 78.1% and a negative predictive value of 85.6%. The Withings ScanWatch had a sensitivity and specificity of 68.5% and 80.8%, respectively, with a positive predictive value of 44.7% and a negative predictive value of 91.9%. Overall accuracy was 84.9% for the Apple Watch and 78.5% for the Withings ScanWatch. Spearman rank correlation coefficients indicated a moderate correlation with ward-based photoplethysmography (Apple: rs=0.61; Withings: rs=0.51; both P<.01).
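For readers less familiar with these diagnostic metrics, the minimal sketch below shows how sensitivity, specificity, positive and negative predictive values, and overall accuracy all derive from a single 2×2 confusion matrix; the counts used are hypothetical and are not the study's data.

```python
# Illustrative only: relating standard diagnostic metrics to 2x2 confusion-matrix counts.
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute diagnostic test metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true-positive rate
        "specificity": tn / (tn + fp),               # true-negative rate
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for a smartwatch-versus-reference comparison (not study data).
print(diagnostic_metrics(tp=25, fp=7, fn=47, tn=271))
```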
Conclusion
Although smartwatches can provide SpO2 readings, their overall accuracy may not be sufficient to replace standard photoplethysmography technology for detecting hypoxia in patients with COVID-19.
{"title":"Accuracy of Smartwatch Pulse Oximetry Measurements in Hospitalized Patients With Coronavirus Disease 2019","authors":"Kevin Rajakariar MBBS , Paul Buntine MBBS , Andrew Ghaly MBBS , Zheng Cheng Zhu MBBS , Vihangi Abeygunawardana MD , Sarah Visakhamoorthy MBBS , Patrick J. Owen PhD , Shaun Tham MD , Liam Hackett MPH , Louise Roberts PhD , Jithin K. Sajeev MBBS, PhD , Nicholas Jones MBBS , Andrew W. Teh MBBS, PhD","doi":"10.1016/j.mcpdig.2024.02.001","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.02.001","url":null,"abstract":"<div><h3>Objective</h3><p>To assess the ability of 2 commercially available smartwatches to accurately detect clinically significant hypoxia in patients hospitalized with coronavirus-19 (COVID-19).</p></div><div><h3>Patients and Methods</h3><p>A prospective multicenter validation study was performed from November 1, 2021, to August 31, 2022, assessing the Apple Watch Series 7 and Withings ScanWatch inbuilt pulse oximetry, against simultaneous ward-based oximetry as the reference standard. Patients hospitalized with active COVID-19 infection not requiring intensive care admission were recruited.</p></div><div><h3>Results</h3><p>A total of 750 smartwatch pulse oximetry measurements and 400 ward oximetry readings were successfully obtained from 200 patients (male 54%, age 66±18 years). For the detection of clinically significant hypoxia, the Apple Watch had a sensitivity and specificity of 34.8% and 97.5%, respectively with a positive predictive value of 78.1% and negative predictive value of 85.6%. The Withings ScanWatch had a sensitivity and specificity of 68.5% and 80.8%, respectively with a positive predictive value of 44.7% and negative predictive value of 91.9%. The overall accuracy was 84.9% for the Apple Watch and 78.5% for the Withings ScanWatch. The Spearman rank correlation coefficients reported a moderate correlation to ward-based photoplethysmography (Apple: r<sub>s</sub>=0.61; Withings: r<sub>s</sub>=0.51, both <em>P</em><.01).</p></div><div><h3>Conclusion</h3><p>Although smartwatches are able to provide SpO<sub>2</sub> readings, their overall accuracy may not be sufficient to replace the standard photoplethysmography technology in detecting hypoxia in patients with COVID-19.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 152-158"},"PeriodicalIF":0.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000105/pdfft?md5=dbf3bf07a6737561ec1ad6f4adb7fdcd&pid=1-s2.0-S2949761224000105-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139985368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging the Metaverse for Enhanced Longevity as a Component of Health 4.0
Pub Date: 2024-02-25 | DOI: 10.1016/j.mcpdig.2024.01.007
Srinivasan S. Pillay MD; Patrick Candela BA; Ivana T. Croghan PhD; Ryan T. Hurt MD, PhD; Sara L. Bonnes MD, MS; Ravindra Ganesh MBBS, MD; Brent A. Bauer MD
In this review, we describe evidence that supports building a metaverse to promote healthy longevity. We propose that the metaverse offers several advantages: physical (architecture, music, and nature), social (accessibility, affordability, community-building, and relief of social anxiety), and therapeutic (immersive experiences, anti-inflammatory effects, and adjunctive use in complementary and integrative medicine). Lifelogging by patients may help clinicians personalize interventions by matching data to therapeutic outcomes. Although the metaverse cannot entirely replace our current model of care, a strategic approach will ensure adequate resource allocation and value assessment. In a collaborative effort between Reulay, Inc, and Mayo Clinic, we are building a platform for the delivery of personalized and idiographic interventions to promote healthy longevity. To this end, we are using specific science-informed art design to reduce stress and anxiety for patients, with the progressive addition of integrated care elements that connect to this framework and link treatment response to biomarkers relevant to healthy longevity. This review is a commentary on the thought process behind this effort.
{"title":"Leveraging the Metaverse for Enhanced Longevity as a Component of Health 4.0","authors":"Srinivasan S. Pillay MD , Patrick Candela BA , Ivana T. Croghan PhD , Ryan T. Hurt MD, PhD , Sara L. Bonnes MD, MS , Ravindra Ganesh MBBS, MD , Brent A. Bauer MD","doi":"10.1016/j.mcpdig.2024.01.007","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.007","url":null,"abstract":"<div><p>In this review, we describe evidence that supports building a metaverse to promote healthy longevity. We propose that the metaverse offers several physical advantages (architecture, music, and nature), social (accessibility, affordability, community-building, and relief of social anxiety), and therapeutic (immersive, anti-inflammatory, and adjunctive use in complementary and integrative medicine). Lifelogging by patients may help clinicians personalize interventions by matching data to therapeutic outcomes. Although the metaverse cannot entirely replace our current model of care, a strategic approach will ensure adequate resource allocation and value assessment. In a collaborative effort between Reulay, Inc and Mayo Clinic, we are building a platform for the delivery of personalized and idiographic interventions to promote healthy longevity. To this end, we are using specific science-informed art design to reduce stress and anxiety for patients, with the progressive addition of integrated care elements that connect to this framework and connect treatment response to biomarkers that are relevant to healthy longevity. This review is a commentary on the thought process behind this effort.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 139-151"},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000087/pdfft?md5=fe65aacfc7a505e1acf93b8a7a7b844e&pid=1-s2.0-S2949761224000087-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139944992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Automated Approach for Diagnosing Allergic Contact Dermatitis Using Deep Learning to Support Democratization of Patch Testing
Pub Date: 2024-02-23 | DOI: 10.1016/j.mcpdig.2024.01.006
Matthew R. Hall MD; Alexander D. Weston PhD; Mikolaj A. Wieczorek BA; Misty M. Hobbs MD; Maria A. Caruso BA; Habeeba Siddiqui BA; Laura M. Pacheco-Spann MS; Johanny L. Lopez-Dominguez MD; Coralle Escoda-Diaz BA; Rickey E. Carter PhD; Charles J. Bruce MB, ChB
Objective
To develop a deep learning algorithm for the analysis of patch testing.
Patients and Methods
A retrospective case series from January 1, 2010, to December 31, 2020, was constructed to develop a deep learning model for the classification of patch test results from photographs. To benchmark model performance, the performance of human expert readers who reviewed the same photographs while blinded to the original clinical physical examination findings was also measured.
Results
On the independent test set (n=5070 test site locations from 37 patients), the model achieved an area under the receiver operating characteristic curve of 0.89 (95% CI, 0.86-0.91) and an F1 score of 37.1. The optimal cutoff had a sensitivity of 70.1% (136/194; 95% CI, 63.1%-76.5%) and a specificity of 91.7% (4472/4876; 95% CI, 90.9%-92.5%).
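As a rough illustration of how such threshold-dependent metrics are derived (not the authors' code), the sketch below computes an AUC on synthetic labels and scores, selects an "optimal" cutoff by maximizing Youden's J (one common convention, assumed here), and reports sensitivity, specificity, and F1 at that cutoff.

```python
# Toy example with synthetic labels/scores; values are stand-ins, not study data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                            # 1 = positive test site
y_score = np.clip(0.4 * y_true + 0.6 * rng.random(500), 0, 1)    # synthetic model probabilities

auc = roc_auc_score(y_true, y_score)

# One common choice of "optimal" cutoff: maximize Youden's J = sensitivity + specificity - 1.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = int(np.argmax(tpr - fpr))
cutoff = thresholds[best]

y_pred = (y_score >= cutoff).astype(int)
print(f"AUC={auc:.2f}  cutoff={cutoff:.2f}  "
      f"sensitivity={tpr[best]:.2f}  specificity={1 - fpr[best]:.2f}  "
      f"F1={f1_score(y_true, y_pred):.2f}")
```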
Conclusion
We demonstrated proof-of-concept utility for detecting allergic contact dermatitis using an automated deep learning approach.
{"title":"An Automated Approach for Diagnosing Allergic Contact Dermatitis Using Deep Learning to Support Democratization of Patch Testing","authors":"Matthew R. Hall MD , Alexander D. Weston PhD , Mikolaj A. Wieczorek BA , Misty M. Hobbs MD , Maria A. Caruso BA , Habeeba Siddiqui BA , Laura M. Pacheco-Spann MS , Johanny L. Lopez-Dominguez MD , Coralle Escoda-Diaz BA , Rickey E. Carter PhD , Charles J. Bruce MB, ChB","doi":"10.1016/j.mcpdig.2024.01.006","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.006","url":null,"abstract":"<div><h3>Objective</h3><p>To develop a deep learning algorithm for the analysis of patch testing.</p></div><div><h3>Patients and Methods</h3><p>A retrospective case series between January 1, 2010, and December 31, 2020, was constructed to develop a deep learning model for the classification of patch test results from photographs. The performance of human expert readers reviewing the same photographs blinded to the original clinical physical examination findings was measured to benchmark model performance.</p></div><div><h3>Results</h3><p>Model performance on the independent test set (n=5070 test site locations from 37 patients) achieved an area under the receiver operating characteristic curve of 0.89 (95% CI, 0.86-0.91) and an F1 score of 37.1. The optimal cutoff had a sensitivity of 70.1% (136/194; 95% CI, 63.1%-76.5%) and a specificity of 91.7% (4472/4876; 95% CI, 90.9%-92.5%).</p></div><div><h3>Conclusion</h3><p>We demonstrated proof-of-concept utility for detecting allergic contact dermatitis using an automated deep learning approach.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 131-138"},"PeriodicalIF":0.0,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000075/pdfft?md5=ec00a6cd5a159970ea4a21e054923ea6&pid=1-s2.0-S2949761224000075-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139942544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Transformative Future for Health Care: On the First Year of Mayo Clinic Proceedings: Digital Health
Pub Date: 2024-02-17 | DOI: 10.1016/j.mcpdig.2024.02.002
Gianrico Farrugia MD
{"title":"A Transformative Future for Health Care: On the First Year of Mayo Clinic Proceedings: Digital Health","authors":"Gianrico Farrugia MD","doi":"10.1016/j.mcpdig.2024.02.002","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.02.002","url":null,"abstract":"","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 129-130"},"PeriodicalIF":0.0,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000117/pdfft?md5=cbd982a82d4ca62307430904aba1007a&pid=1-s2.0-S2949761224000117-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139748980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differences Between Patient and Clinician-Taken Images: Implications for Virtual Care of Skin Conditions
Pub Date: 2024-02-15 | DOI: 10.1016/j.mcpdig.2024.01.005
Rajeev V. Rikhye PhD; Grace Eunhae Hong BA; Preeti Singh MS; Margaret Ann Smith MBA; Aaron Loh MS; Vijaytha Muralidharan MD; Doris Wong BS; Rory Sayres PhD; Michelle Phung MS; Nicolas Betancourt MD; Bradley Fong BS; Rachna Sahasrabudhe BA; Khoban Nasim BS; Alec Eschholz BA; Yossi Matias PhD; Greg S. Corrado PhD; Katherine Chou MS; Dale R. Webster PhD; Peggy Bui MD, MBA; Yuan Liu PhD; Steven Lin MD
Objective
To understand and highlight the differences in clinical, demographic, and image quality characteristics between patient-taken (PAT) and clinic-taken (CLIN) photographs of skin conditions.
Patients and Methods
This retrospective study applied logistic regression to data from 2500 deidentified cases in Stanford Health Care’s eConsult system from November 2015 to January 2021. Cases with undiagnosable or multiple conditions, or cases with both patient and clinician image sources, were excluded, leaving 628 PAT cases and 1719 CLIN cases. Demographic characteristics, such as age and sex, were self-reported, whereas anatomic location, estimated skin type, clinical signs and symptoms, condition duration, and condition frequency were summarized from patient health records. Image quality variables, such as blur, lighting issues, and whether the image contained skin, hair, or nails, were estimated with a deep learning model.
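A hedged sketch of this kind of analysis follows: a logistic regression relating image source (clinic-taken vs patient-taken) to case characteristics, reported as odds ratios. The feature names and the synthetic data are illustrative assumptions, not the study's variables or results.

```python
# Illustrative only: logistic regression of image source on case characteristics.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
# Columns (hypothetical): age_60_plus, blurry, skin_growth (all binary flags).
X = rng.integers(0, 2, size=(n, 3))
logits = 1.2 * X[:, 0] - 1.5 * X[:, 1] + 0.8 * X[:, 2] - 0.3
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)   # 1 = clinic-taken photograph

clf = LogisticRegression().fit(X, y)
# Odds ratios > 1 suggest a factor positively associated with clinic-taken images.
for name, coef in zip(["age_60_plus", "blurry", "skin_growth"], clf.coef_[0]):
    print(f"{name}: odds ratio {np.exp(coef):.2f}")
```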
Results
Factors positively associated with CLIN photographs after 2020 were as follows: age 60 years or older, darker skin types (eFST V/VI), and presence of skin growths. By contrast, factors positively associated with PAT photographs included conditions appearing intermittently, cases with blurry photographs, photographs with substantial nonskin (or nail/hair) regions, and cases with more than 3 photographs. Within the PAT cohort, older age was associated with blurry photographs.
Conclusion
There are various demographic, clinical, and image quality differences between PAT and CLIN photographs of skin concerns. The demographic differences present important considerations for improving digital literacy or access, whereas the image quality differences point to the need for improved patient education and better image capture workflows, particularly among older patients.
{"title":"Differences Between Patient and Clinician-Taken Images: Implications for Virtual Care of Skin Conditions","authors":"Rajeev V. Rikhye PhD , Grace Eunhae Hong BA , Preeti Singh MS , Margaret Ann Smith MBA , Aaron Loh MS , Vijaytha Muralidharan MD , Doris Wong BS , Rory Sayres PhD , Michelle Phung MS , Nicolas Betancourt MD , Bradley Fong BS , Rachna Sahasrabudhe BA , Khoban Nasim BS , Alec Eschholz BA , Yossi Matias PhD , Greg S. Corrado PhD , Katherine Chou MS , Dale R. Webster PhD , Peggy Bui MD, MBA , Yuan Liu PhD , Steven Lin MD","doi":"10.1016/j.mcpdig.2024.01.005","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.005","url":null,"abstract":"<div><h3>Objective</h3><p>To understand and highlight the differences in clinical, demographic, and image quality characteristics between patient-taken (PAT) and clinic-taken (CLIN) photographs of skin conditions.</p></div><div><h3>Patients and Methods</h3><p>This retrospective study applied logistic regression to data from 2500 deidentified cases in Stanford Health Care’s eConsult system, from November 2015 to January 2021. Cases with undiagnosable or multiple conditions or cases with both patient and clinician image sources were excluded, leaving 628 PAT cases and 1719 CLIN cases. Demographic characteristic factors, such as age and sex were self-reported, whereas anatomic location, estimated skin type, clinical signs and symptoms, condition duration, and condition frequency were summarized from patient health records. Image quality variables such as blur, lighting issues and whether the image contained skin, hair, or nails were estimated through a deep learning model.</p></div><div><h3>Results</h3><p>Factors that were positively associated with CLIN photographs, post-2020 were as follows: age 60 years or older, darker skin types (eFST V/VI), and presence of skin growths. By contrast, factors that were positively associated with PAT photographs include conditions appearing intermittently, cases with blurry photographs, photographs with substantial nonskin (or nail/hair) regions and cases with more than 3 photographs. Within the PAT cohort, older age was associated with blurry photographs.</p></div><div><h3>Conclusion</h3><p>There are various demographic, clinical, and image quality characteristic differences between PAT and CLIN photographs of skin concerns. The demographic characteristic differences present important considerations for improving digital literacy or access, whereas the image quality differences point to the need for improved patient education and better image capture workflows, particularly among elderly patients.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 107-118"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000063/pdfft?md5=b6821d4312bb7e3ec9c3c66208aec937&pid=1-s2.0-S2949761224000063-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139738259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Untapped Potential of Artificial Intelligence for Analysis of Epileptic Seizure Videos: A Clinician’s Expectation
Pub Date: 2024-02-15 | DOI: 10.1016/j.mcpdig.2024.01.004
Naotaka Usui MD, PhD
{"title":"Untapped Potential of Artificial Intelligence for Analysis of Epileptic Seizure Videos: A Clinician’s Expectation","authors":"Naotaka Usui MD, PhD","doi":"10.1016/j.mcpdig.2024.01.004","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.004","url":null,"abstract":"","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 104-106"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000051/pdfft?md5=c551603f1e01d547a79eec0bbf642cfc&pid=1-s2.0-S2949761224000051-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139738298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond Atrial Fibrillation: Machine Learning Algorithm Predicts Stroke in Adult Patients With Congenital Heart Disease
Pub Date: 2024-02-15 | DOI: 10.1016/j.mcpdig.2023.12.002
Anca Chiriac MD, PhD; Che Ngufor PhD; Holly K. van Houten BA; Raphael Mwangi MS; Malini Madhavan MBBS; Peter A. Noseworthy MD; Samuel J. Asirvatham MD; Sabrina D. Phillips MD; Christopher J. McLeod MB ChB, PhD
Objective
To develop and validate a robust risk prediction model for stroke and systemic embolism (SSE) in adult patients with congenital heart disease (ACHD), using artificial intelligence.
Patients and Methods
Deidentified insurance claims from the Optum Labs Data Warehouse, including enrollment records and medical and pharmacy claims for commercial and Medicare Advantage enrollees, were used to identify 49,276 patients with ACHD followed between January 1, 2009, and December 31, 2014. The group was randomly divided into development (70%) and validation (30%) cohorts. The development cohort was used to train 2 machine learning (ML) algorithms, regularized Cox regression (RegCox) and extreme gradient boosting (XGBoost), to predict SSE at 1, 2, and 5 years. The Shapley additive explanations (SHAP) method was used to identify the variables driving the SSE risk.
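The sketch below illustrates, under simplifying assumptions, the general pattern of one arm of such a pipeline: fitting a gradient-boosted classifier to predict SSE within a fixed horizon and ranking predictors by mean absolute SHAP value. The synthetic data, feature names, and binary-classification framing are illustrative, not the authors' implementation; it requires the xgboost and shap packages.

```python
# Illustrative only: gradient boosting + SHAP feature ranking on synthetic data.
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(7)
n = 2000
features = ["age", "hypertension", "diabetes", "prior_stroke", "atrial_septal_defect"]

# Synthetic cohort: age plus binary comorbidity flags (hypothetical variables).
X = np.column_stack([rng.normal(60, 18, n), rng.integers(0, 2, size=(n, 4))])
risk = 0.03 * (X[:, 0] - 60) + 0.8 * X[:, 3] + 1.0 * X[:, 4] - 3.0
y = (rng.random(n) < 1 / (1 + np.exp(-risk))).astype(int)   # 1 = SSE within horizon

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Mean absolute SHAP value per feature as a simple importance ranking.
shap_values = shap.TreeExplainer(model).shap_values(X)
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(features, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```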
Results
Within this large and diverse cohort of patients with ACHD (mean age, 59±19 years; 25,390 [51.5%] female; 35,766 [77.6%] White), 1756 (3.6%) patients experienced SSE during follow-up. In the validation cohort, the CHA2DS2-VASc score had an area under the receiver operating characteristic curve (AUC) of 0.66 for predicting SSE at 1, 2, and 5 years. RegCox had the best predictive performance, with AUCs of 0.82, 0.81, and 0.80 at 1, 2, and 5 years; XGBoost had AUCs of 0.81, 0.80, and 0.79, respectively. Atrial septal defect (ASD) emerged as an important predictor of SSE uncovered by the unbiased ML algorithms. A new clinical risk score, the CHA2DS2-VASc-ASD2 score, provided improved SSE prediction in ACHD, although the ML models still outperformed it.
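For reference, the comparator score mentioned above can be written as a simple point sum. The sketch below implements the standard CHA2DS2-VASc components, which are well established; the study's new CHA2DS2-VASc-ASD2 variant adds an atrial septal defect term whose exact weighting is not specified in this abstract, so it is deliberately not modeled.

```python
def cha2ds2_vasc(chf: bool, htn: bool, age: int, diabetes: bool,
                 prior_stroke_tia: bool, vascular_disease: bool, female: bool) -> int:
    """Standard CHA2DS2-VASc score (0-9); the study's ASD2 extension is not included here."""
    score = 0
    score += 1 if chf else 0                               # C: congestive heart failure
    score += 1 if htn else 0                               # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)   # A2: age >=75 (2 pts); A: 65-74 (1 pt)
    score += 1 if diabetes else 0                          # D: diabetes mellitus
    score += 2 if prior_stroke_tia else 0                  # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0                  # V: vascular disease
    score += 1 if female else 0                            # Sc: sex category (female)
    return score

# Example: a 70-year-old woman with hypertension scores 3.
print(cha2ds2_vasc(chf=False, htn=True, age=70, diabetes=False,
                   prior_stroke_tia=False, vascular_disease=False, female=True))
```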
Conclusion
ML models significantly outperformed the clinical risk scores in patients with ACHD.
{"title":"Beyond Atrial Fibrillation: Machine Learning Algorithm Predicts Stroke in Adult Patients With Congenital Heart Disease","authors":"Anca Chiriac MD, PhD , Che Ngufor PhD , Holly K. van Houten BA , Raphael Mwangi MS , Malini Madhavan MBBS , Peter A. Noseworthy MD , Samuel J. Asirvatham MD , Sabrina D. Phillips MD , Christopher J. McLeod MB ChB, PhD","doi":"10.1016/j.mcpdig.2023.12.002","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2023.12.002","url":null,"abstract":"<div><h3>Objective</h3><p>To develop and validate a robust risk prediction model for stroke and systemic embolism (SSE) in adult patients with congenital heart disease (ACHD), using artificial intelligence.</p></div><div><h3>Patients and Methods</h3><p>Deidentified insurance claims from the Optum Labs Data Warehouse, including enrollment records and medical and pharmacy claims for commercial and Medicare Advantage enrollees, were used to identify 49,276 patients with ACHD, followed between January 1, 2009, and December 31, 2014. The group was randomly divided into development (70%) and validation (30%) cohorts. The development cohort was used to train 2 machine learning (ML) algorithms, regularized Cox regression (RegCox), and extreme gradient boosting (XGBoost) to predict SSE at 1, 2, and 5 years. The Shapley additive explanations (SHAP) model was used to identify the variables particularly driving the SSE risk.</p></div><div><h3>Results</h3><p>Within this large and diverse cohort of patients with ACHD (mean age, 59 ± 19 years; 25,390 (51.5%) female, 35,766 [77.6%]) white), 1756 (3.6%) patients experienced SSE during follow-up. In the Validation cohort, CHA<sub>2</sub>DS<sub>2</sub>-VASC had an area under the receiver operating characteristics curve (AUC) of 0.66 for predicting SSE at 1-,2, and 5-years. RegCox had the best predictive performance, with AUCs of 0.82,.81, and.80 at 1-, 2, and 5-years. XGBoost had AUCs of 0.81, 0.80, and 0.79 respectively. Atrial septal defect (ASD) emerged as an important predictor for SSE uncovered by the unbiased ML algorithms. A new clinical risk score, the CHA<sub>2</sub>DS<sub>2</sub>-VASC-ASD<sub>2</sub> score, provides improved SSE prediction in ACHD. Yet, the ML models still outperformed this.</p></div><div><h3>Conclusion</h3><p>ML models significantly outperformed the clinical risk scores in patients with ACHD.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 92-103"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000026/pdfft?md5=c34fed3977be03552486d0740a93fe5f&pid=1-s2.0-S2949761224000026-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139738326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model
Pub Date: 2024-02-15 | DOI: 10.1016/j.mcpdig.2024.01.003
Prashant D. Tailor MD; Timothy T. Xu MD; Blake H. Fortes MD; Raymond Iezzi MD; Timothy W. Olsen MD; Matthew R. Starr MD; Sophie J. Bakri MD; Brittni A. Scruggs MD, PhD; Andrew J. Barkmeier MD; Sanjay V. Patel MD; Keith H. Baratz MD; Ashlie A. Bernhisel MD; Lilly H. Wagner MD; Andrea A. Tooley MD; Gavin W. Roddy MD, PhD; Arthur J. Sit MD; Kristi Y. Wu MD; Erick D. Bothun MD; Sasha A. Mansukhani MBBS; Brian G. Mohney MD; Lauren A. Dalvin MD
Objective
To determine the appropriateness of recommendations provided by an online chat-based artificial intelligence model in response to ophthalmology questions.
Patients and Methods
This cross-sectional qualitative study was conducted from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was the information being presented on a patient information site; the second was the response serving as an LLM-generated draft reply to a patient query sent through the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty.
Results
For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Average appropriateness for patient information site content varied across ophthalmic subspecialties, ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via the EMR, the LLM provided an overall average of 74% appropriate responses, which also varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but nonsignificant variations, with disease and condition often rated highest for appropriateness (72% and 69%) and surgery-related lowest (55% and 51%) in both contexts.
Conclusion
This LLM provided mostly appropriate responses across multiple ophthalmology subspecialties, both in the context of patient information sites and as EMR-related draft responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.
{"title":"Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model","authors":"Prashant D. Tailor MD , Timothy T. Xu MD , Blake H. Fortes MD , Raymond Iezzi MD , Timothy W. Olsen MD , Matthew R. Starr MD , Sophie J. Bakri MD , Brittni A. Scruggs MD, PhD , Andrew J. Barkmeier MD , Sanjay V. Patel MD , Keith H. Baratz MD , Ashlie A. Bernhisel MD , Lilly H. Wagner MD , Andrea A. Tooley MD , Gavin W. Roddy MD, PhD , Arthur J. Sit MD , Kristi Y. Wu MD , Erick D. Bothun MD , Sasha A. Mansukhani MBBS , Brian G. Mohney MD , Lauren A. Dalvin MD","doi":"10.1016/j.mcpdig.2024.01.003","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.003","url":null,"abstract":"<div><h3>Objective</h3><p>To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model to ophthalmology questions.</p></div><div><h3>Patients and Methods</h3><p>Cross-sectional qualitative study from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was if the information was presented on a patient information site. The second was an LLM-generated draft response to patient queries sent by the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. Main outcome measure was percentage of appropriate responses per subspecialty.</p></div><div><h3>Results</h3><p>For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Variable rates of average appropriateness were observed across ophthalmic subspecialties for patient information site information ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via EMR, the LLM provided an overall average of 74% appropriate responses and varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but insignificant variations, with disease and condition often rated highest (72% and 69%) for appropriateness and surgery-related (55% and 51%) lowest, in both contexts.</p></div><div><h3>Conclusion</h3><p>This LLM reported mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-related responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. 
Digital health","volume":"2 1","pages":"Pages 119-128"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S294976122400004X/pdfft?md5=5523855f19c376cfc730f0de31cbe918&pid=1-s2.0-S294976122400004X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139738261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial Intelligence Detection and Segmentation Models: A Systematic Review and Meta-Analysis of Brain Tumors in Magnetic Resonance Imaging
Pub Date: 2024-02-04 | DOI: 10.1016/j.mcpdig.2024.01.002
Ting-Wei Wang MD, PhD; Yu-Chieh Shiao MD; Jia-Sheng Hong PhD; Wei-Kai Lee PhD; Ming-Sheng Hsu MD; Hao-Min Cheng MD, PhD; Huai-Che Yang MD, PhD; Cheng-Chia Lee MD, PhD; Hung-Chuan Pan MD, PhD; Weir Chiang You MD, PhD; Jiing-Feng Lirng MD; Wan-Yuo Guo MD, PhD; Yu-Te Wu PhD
Objective
To thoroughly analyze the factors affecting the generalization ability of deep learning models for brain tumor detection and segmentation.
Patients and Methods
We searched PubMed, Embase, Web of Science, Cochrane Library, and IEEE from inception to July 25, 2023, and 19 studies with 12,000 patients were identified. The criteria required studies to use magnetic resonance imaging (MRI) for brain tumor detection and segmentation, offer clear performance metrics, and use external validation data sets. The study focused on outcomes such as sensitivity and Dice score. Study quality was assessed using QUADAS-2 and CLAIM tools. The meta-analysis evaluated varying algorithms and their performance across different validation data sets.
Results
Variation in MRI hardware across manufacturers may contribute to data set diversity, affecting AI model generalizability. The best algorithms had a pooled lesion-wise Dice score of 84%, with pooled sensitivities of 87% (patient-wise) and 86% (lesion-wise). Post-2022 methodologies highlighted evolving artificial intelligence techniques. Performance differences were evident among tumor types, likely because of size disparities. 3D models outperformed their 2D and ensemble counterparts in detection. Although specific preprocessing techniques improved segmentation outcomes, some hindered detection.
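As a quick reminder of the pooled segmentation metric reported above, the sketch below computes a Dice score for two toy binary masks; the arrays are illustrative, not study data.

```python
# Illustrative only: Dice score for binary segmentation masks.
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; returns 1.0 when both masks are empty."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

pred = np.zeros((64, 64), dtype=bool);  pred[20:40, 20:40] = True
truth = np.zeros((64, 64), dtype=bool); truth[25:45, 25:45] = True
print(f"Dice = {dice_score(pred, truth):.2f}")   # overlap of two offset 20x20 squares
```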
Conclusion
The study underscores the potential of deep learning for improving brain tumor diagnostics and treatment planning. We also identify the need for further research, including developing a comprehensive diversity index, expanding meta-analyses, and using generative adversarial networks for data diversification, paving the way for AI-driven advancements in oncological patient care.
{"title":"Artificial Intelligence Detection and Segmentation Models: A Systematic Review and Meta-Analysis of Brain Tumors in Magnetic Resonance Imaging","authors":"Ting-Wei Wang MD, PhD , Yu-Chieh Shiao MD , Jia-Sheng Hong PhD , Wei-Kai Lee PhD , Ming-Sheng Hsu MD , Hao-Min Cheng MD, PhD , Huai-Che Yang MD, PhD , Cheng-Chia Lee MD, PhD , Hung-Chuan Pan MD, PhD , Weir Chiang You MD, PhD , Jiing-Feng Lirng MD , Wan-Yuo Guo MD, PhD , Yu-Te Wu PhD","doi":"10.1016/j.mcpdig.2024.01.002","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.002","url":null,"abstract":"<div><h3>Objective</h3><p>To thoroughly analyze factors affecting the generalization ability of deep learning algorithms on brain tumor detection and segmentation models.</p></div><div><h3>Patients and Methods</h3><p>We searched PubMed, Embase, Web of Science, Cochrane Library, and IEEE from inception to July 25, 2023, and 19 studies with 12,000 patients were identified. The criteria required studies to use magnetic resonance imaging (MRI) for brain tumor detection and segmentation, offer clear performance metrics, and use external validation data sets. The study focused on outcomes such as sensitivity and Dice score. Study quality was assessed using QUADAS-2 and CLAIM tools. The meta-analysis evaluated varying algorithms and their performance across different validation data sets.</p></div><div><h3>Results</h3><p>MRI hardware as the manufacturer may contribute to data set diversity, impacting AI model generalizability. The study found that the best algorithms had a pooled lesion-wise Dice score of 84%, with pooled sensitivities of 87% (patient-wise) and 86% (lesion-wise). Post-2022 methodologies highlighted evolving artificial intelligence techniques. Performance differences were evident among tumor types, likely due to size disparities. 3D models outperformed their 2D and ensemble counterparts in detection. Although specific preprocessing techniques improved segmentation outcomes, some hindered detection.</p></div><div><h3>Conclusion</h3><p>The study underscores the potential of deep learning in improving brain tumor diagnostics and treatment planning. We also identify the need for further research, including developing a comprehensive diversity index, expanded meta-analyses, and using generative adversarial networks for data diversification, paving the way for AI-driven advancements in oncological patient care.</p></div><div><h3>Trial Registration</h3><p>PROPERO (CRD42023459108).</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 75-91"},"PeriodicalIF":0.0,"publicationDate":"2024-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000038/pdfft?md5=462accb0c195aebed809efe8ef0de1df&pid=1-s2.0-S2949761224000038-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139675998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thyroid Ultrasound Appropriateness Identification Through Natural Language Processing of Electronic Health Records
Pub Date: 2024-02-01 | DOI: 10.1016/j.mcpdig.2024.01.001
Cristian Soto Jacome MD; Danny Segura Torres MD; Jungwei W. Fan PhD; Ricardo Loor-Torres MD; Mayra Duran MD; Misk Al Zahidy MS; Esteban Cabezas MD; Mariana Borras-Osorio MD; David Toro-Tobon MD; Yuqi Wu PhD; Yonghui Wu PhD; Naykky Singh Ospina MD, MS; Juan P. Brito MD, MS
Objective
To address thyroid cancer overdiagnosis, we aimed to develop a natural language processing (NLP) algorithm to determine the appropriateness of thyroid ultrasounds (TUS).
Patients and Methods
Between 2017 and 2021, we identified 18,000 patients who underwent TUS at Mayo Clinic and selected 628 for chart review to create a consensus-based ground truth dataset. We developed a rule-based NLP pipeline to classify each TUS as appropriate (aTUS) or inappropriate (iTUS) using patients’ clinical notes and additional metadata. In addition, we designed an abbreviated NLP pipeline (aNLP) focusing solely on labels from TUS order requisitions to facilitate deployment at other health care systems. Our dataset was split into a training set of 468 (75%) and a test set of 160 (25%), using the former for rule development and the latter for performance evaluation.
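To make the idea concrete, here is a toy sketch of a rule-based check in the spirit of the pipeline described above; the indication keywords and the aTUS/iTUS decision rule are hypothetical illustrations, not the study's actual rule set.

```python
# Illustrative only: a toy rule-based appropriateness classifier for ultrasound orders.
import re

# Hypothetical patterns for indications that would make an order appropriate.
APPROPRIATE_PATTERNS = [
    r"\bpalpable (thyroid )?nodule\b",
    r"\babnormal (tsh|thyroid function)\b",
    r"\bfollow[- ]?up of known nodule\b",
]

def classify_tus_order(requisition_text: str) -> str:
    """Return 'aTUS' if any appropriate-indication rule fires, else 'iTUS'."""
    text = requisition_text.lower()
    if any(re.search(pattern, text) for pattern in APPROPRIATE_PATTERNS):
        return "aTUS"
    return "iTUS"

print(classify_tus_order("Ultrasound requested for palpable thyroid nodule on exam"))  # aTUS
print(classify_tus_order("Ultrasound requested for fatigue"))                          # iTUS
```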
Results
In the training set, 449 (95.9%) patients were identified as aTUS and 19 (4.06%) as iTUS; in the test set, 155 (96.88%) patients were identified as aTUS and 5 (3.12%) as iTUS. In the training set, the pipeline achieved a sensitivity of 0.99, a specificity of 0.95, and a positive predictive value of 1.0 for detecting aTUS. In the testing cohort, sensitivity was 0.96, specificity 0.80, and positive predictive value 0.99. Similar performance metrics were observed in the aNLP pipeline.
Conclusion
The NLP models can accurately identify the appropriateness of a thyroid ultrasound from clinical documentation and order requisition information, a critical initial step toward evaluating the drivers and outcomes of TUS use and subsequent thyroid cancer overdiagnosis.
{"title":"Thyroid Ultrasound Appropriateness Identification Through Natural Language Processing of Electronic Health Records","authors":"Cristian Soto Jacome MD , Danny Segura Torres MD , Jungwei W. Fan PhD , Ricardo Loor-Torres MD , Mayra Duran MD , Misk Al Zahidy MS , Esteban Cabezas MD , Mariana Borras-Osorio MD , David Toro-Tobon MD , Yuqi Wu PhD , Yonghui Wu PhD , Naykky Singh Ospina MD, MS , Juan P. Brito MD, MS","doi":"10.1016/j.mcpdig.2024.01.001","DOIUrl":"https://doi.org/10.1016/j.mcpdig.2024.01.001","url":null,"abstract":"<div><h3>Objective</h3><p>To address thyroid cancer overdiagnosis, we aim to develop a natural language processing (NLP) algorithm to determine the appropriateness of thyroid ultrasounds (TUS).</p></div><div><h3>Patients and Methods</h3><p>Between 2017 and 2021, we identified 18,000 TUS patients at Mayo Clinic and selected 628 for chart review to create a ground truth dataset based on consensus. We developed a rule-based NLP pipeline to identify TUS as appropriate TUS (aTUS) or inappropriate TUS (iTUS) using patients’ clinical notes and additional meta information. In addition, we designed an abbreviated NLP pipeline (aNLP) solely focusing on labels from TUS order requisitions to facilitate deployment at other health care systems. Our dataset was split into a training set of 468 (75%) and a test set of 160 (25%), using the former for rule development and the latter for performance evaluation.</p></div><div><h3>Results</h3><p>There were 449 (95.9%) patients identified as aTUS and 19 (4.06%) as iTUS in the training set; there are 155 (96.88%) patients identified as aTUS and 5 (3.12%) were iTUS in the test set. In the training set, the pipeline achieved a sensitivity of 0.99, specificity of 0.95, and positive predictive value of 1.0 for detecting aTUS. The testing cohort revealed a sensitivity of 0.96, specificity of 0.80, and positive predictive value of 0.99. Similar performance metrics were observed in the aNLP pipeline.</p></div><div><h3>Conclusion</h3><p>The NLP models can accurately identify the appropriateness of a thyroid ultrasound from clinical documentation and order requisition information, a critical initial step toward evaluating the drivers and outcomes of TUS use and subsequent thyroid cancer overdiagnosis.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 1","pages":"Pages 67-74"},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000014/pdfft?md5=b25e9a7547bfbd148935d7e81234eadb&pid=1-s2.0-S2949761224000014-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139674437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}