Large language models can accurately populate Vascular Quality Initiative procedural databases using narrative operative reports.

Journal of Vascular Surgery · Impact Factor 3.9 · JCR Q1 (Peripheral Vascular Disease) · CAS Medicine Tier 2 · Publication date: 2024-12-16 · DOI: 10.1016/j.jvs.2024.12.002
Colleen P Flanagan, Karen Trang, Joyce Nacario, Peter A Schneider, Warren J Gasper, Michael S Conte, Elizabeth C Wick, Allan M Conway
{"title":"Large language models can accurately populate Vascular Quality Initiative procedural databases using narrative operative reports.","authors":"Colleen P Flanagan, Karen Trang, Joyce Nacario, Peter A Schneider, Warren J Gasper, Michael S Conte, Elizabeth C Wick, Allan M Conway","doi":"10.1016/j.jvs.2024.12.002","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Participation in the Vascular Quality Initiative (VQI) provides important resources to surgeons, but the ability to do so is often limited by time and data entry personnel. Large language models (LLMs) such as ChatGPT (OpenAI) are examples of generative artificial intelligence products that may help bridge this gap. Trained on large volumes of data, the models are used for natural language processing and text generation. We evaluated the ability of LLMs to accurately populate VQI procedural databases using operative reports.</p><p><strong>Methods: </strong>A single-center, retrospective study was performed using institutional VQI data from 2021 to 2023. The most recent procedures for carotid endarterectomy (CEA), endovascular aneurysm repair (EVAR), and infrainguinal lower extremity bypass (LEB) were analyzed using Versa, a HIPAA (Health Insurance Portability and Accountability Act)-compliant institutional version of ChatGPT. We created an automated function to analyze operative reports and generate a shareable VQI file using two models: gpt-35-turbo and gpt-4. Application of the LLMs was accomplished with a cloud-based programming interface. The outputs of this model were compared with VQI data for accuracy. We defined a metric as \"unavailable\" to the LLM if it was discussed by surgeons in <20% of operative reports.</p><p><strong>Results: </strong>A total of 150 operative notes were analyzed, including 50 CEA, 50 EVAR, and 50 LEB. These procedural VQI databases included 25, 179, and 51 metrics, respectively. For all fields, gpt-35-turbo had a median accuracy of 84.0% for CEA (interquartile range [IQR]: 80.0%-88.0%), 92.2% for EVAR (IQR: 87.2%-94.0%), and 84.3% for LEB (IQR: 80.2%-88.1%). A total of 3 of 25, 6 of 179, and 7 of 51 VQI variables were unavailable in the operative reports, respectively. Excluding metric information routinely unavailable in operative reports, the median accuracy rate was 95.5% for each CEA procedure (IQR: 90.9%-100.0%), 94.8% for EVAR (IQR: 92.2%-98.5%), and 93.2% for LEB (IQR: 90.2%-96.4%). Across procedures, gpt-4 did not meaningfully improve performance compared with gpt-35 (P = .97, .85, and .95 for CEA, EVAR, and LEB overall performance, respectively). The cost for 150 operative reports analyzed with gpt-35-turbo and gpt-4 was $0.12 and $3.39, respectively.</p><p><strong>Conclusions: </strong>LLMs can accurately populate VQI procedural databases with both structured and unstructured data, while incurring only minor processing costs. Increased workflow efficiency may improve center ability to successfully participate in the VQI. 
Further work examining other VQI databases and methods to increase accuracy is needed.</p>","PeriodicalId":17475,"journal":{"name":"Journal of Vascular Surgery","volume":" ","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Vascular Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jvs.2024.12.002","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PERIPHERAL VASCULAR DISEASE","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: Participation in the Vascular Quality Initiative (VQI) provides important resources to surgeons, but the ability to do so is often limited by time and data entry personnel. Large language models (LLMs) such as ChatGPT (OpenAI) are examples of generative artificial intelligence products that may help bridge this gap. Trained on large volumes of data, the models are used for natural language processing and text generation. We evaluated the ability of LLMs to accurately populate VQI procedural databases using operative reports.

Methods: A single-center, retrospective study was performed using institutional VQI data from 2021 to 2023. The most recent procedures for carotid endarterectomy (CEA), endovascular aneurysm repair (EVAR), and infrainguinal lower extremity bypass (LEB) were analyzed using Versa, a HIPAA (Health Insurance Portability and Accountability Act)-compliant institutional version of ChatGPT. We created an automated function to analyze operative reports and generate a shareable VQI file using two models: gpt-35-turbo and gpt-4. Application of the LLMs was accomplished with a cloud-based programming interface. The outputs of this model were compared with VQI data for accuracy. We defined a metric as "unavailable" to the LLM if it was discussed by surgeons in <20% of operative reports.
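
The abstract does not reproduce the authors' prompt or code. As a rough, hypothetical sketch of the workflow described above (operative report in, shareable VQI file out), the cloud-based programming interface might look like the following, assuming an OpenAI-compatible chat completions client; the field names, prompt wording, model identifier, and output format are illustrative placeholders rather than the authors' Versa configuration.

```python
# Illustrative sketch only: send one operative report to a chat model and collect
# a fixed set of VQI fields as JSON, then write one row per note to a shareable file.
# Field list, prompt, and model name are placeholders, not the study's actual setup.
import csv
import json

from openai import OpenAI  # assumes the standard OpenAI Python client

client = OpenAI()  # the study used Versa, a HIPAA-compliant institutional endpoint

# Placeholder subset of variables; the real VQI CEA form contains 25 fields.
CEA_FIELDS = ["anesthesia_type", "shunt_used", "patch_type", "protamine_given"]


def extract_vqi_fields(operative_report: str, fields: list[str],
                       model: str = "gpt-3.5-turbo") -> dict:
    """Ask the model to read one operative report and fill the requested fields."""
    prompt = (
        "You are populating a Vascular Quality Initiative procedural database.\n"
        f"Return a JSON object with exactly these keys: {fields}.\n"
        "Answer 'unavailable' for any field the report does not mention.\n\n"
        f"Operative report:\n{operative_report}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Simplified: real use would validate that the reply is well-formed JSON.
    return json.loads(response.choices[0].message.content)


def reports_to_vqi_csv(reports: list[str], out_path: str = "vqi_upload.csv") -> None:
    """Write one row per operative note, mimicking a shareable VQI upload file."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=CEA_FIELDS, extrasaction="ignore")
        writer.writeheader()
        for report in reports:
            writer.writerow(extract_vqi_fields(report, CEA_FIELDS))
```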

Results: A total of 150 operative notes were analyzed, including 50 CEA, 50 EVAR, and 50 LEB. These procedural VQI databases included 25, 179, and 51 metrics, respectively. For all fields, gpt-35-turbo had a median accuracy of 84.0% for CEA (interquartile range [IQR]: 80.0%-88.0%), 92.2% for EVAR (IQR: 87.2%-94.0%), and 84.3% for LEB (IQR: 80.2%-88.1%). A total of 3 of 25, 6 of 179, and 7 of 51 VQI variables were unavailable in the operative reports, respectively. Excluding metric information routinely unavailable in operative reports, the median accuracy rate was 95.5% for each CEA procedure (IQR: 90.9%-100.0%), 94.8% for EVAR (IQR: 92.2%-98.5%), and 93.2% for LEB (IQR: 90.2%-96.4%). Across procedures, gpt-4 did not meaningfully improve performance compared with gpt-35 (P = .97, .85, and .95 for CEA, EVAR, and LEB overall performance, respectively). The cost for 150 operative reports analyzed with gpt-35-turbo and gpt-4 was $0.12 and $3.39, respectively.
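
For context, the per-procedure accuracy figures above amount to scoring each note by how many of its fields match the reference VQI entry, then summarizing across notes with the median and interquartile range. The sketch below assumes a simple exact-match comparison and the Methods' <20% rule for "unavailable" metrics; the authors' actual adjudication may have differed.

```python
# Sketch of the accuracy summary implied by the Results: score each note as the
# fraction of fields matching the reference VQI entry, then report the median and
# IQR across notes. Exact string matching is an assumption made for illustration.
import statistics


def is_unavailable(reports_mentioning: int, total_reports: int) -> bool:
    """Per the Methods, a metric is 'unavailable' if it appears in <20% of reports."""
    return reports_mentioning / total_reports < 0.20


def note_accuracy(llm_row: dict, vqi_row: dict, excluded: set[str] = frozenset()) -> float:
    """Fraction of non-excluded fields where the LLM output matches the VQI entry."""
    fields = [k for k in vqi_row if k not in excluded]
    correct = sum(
        str(llm_row.get(k, "")).strip().lower() == str(vqi_row[k]).strip().lower()
        for k in fields
    )
    return correct / len(fields)


def summarize(accuracies: list[float]) -> tuple[float, float, float]:
    """Median and interquartile range (25th and 75th percentiles) across notes."""
    q1, median, q3 = statistics.quantiles(accuracies, n=4)
    return median, q1, q3
```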

Conclusions: LLMs can accurately populate VQI procedural databases with both structured and unstructured data, while incurring only minor processing costs. Increased workflow efficiency may improve a center's ability to participate successfully in the VQI. Further work examining other VQI databases and methods to increase accuracy is needed.

Source journal: Journal of Vascular Surgery
CiteScore: 7.70
Self-citation rate: 18.60%
Annual articles: 1469
Review time: 54 days
Journal description: Journal of Vascular Surgery® aims to be the premier international journal of medical, endovascular and surgical care of vascular diseases. It is dedicated to the science and art of vascular surgery and aims to improve the management of patients with vascular diseases by publishing relevant papers that report important medical advances, test new hypotheses, and address current controversies. To achieve this goal, the Journal will publish original clinical and laboratory studies, and reports and papers that comment on the social, economic, ethical, legal, and political factors that relate to these aims. As the official publication of The Society for Vascular Surgery, the Journal will publish, after peer review, selected papers presented at the annual meeting of this organization and affiliated vascular societies, as well as original articles from members and non-members.