Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature

B.L. Fabre, M.A.F. Magalhaes Filho, P.N. Aguiar Jr, F.M. da Costa, B. Gutierres, W.N. William Jr, A. Del Giglio
{"title":"评估作为临床医生学术支持工具的 GPT-4:文献中病例记录的比较分析","authors":"B.L. Fabre ,&nbsp;M.A.F. Magalhaes Filho ,&nbsp;P.N. Aguiar Jr ,&nbsp;F.M. da Costa ,&nbsp;B. Gutierres ,&nbsp;W.N. William Jr ,&nbsp;A. Del Giglio","doi":"10.1016/j.esmorw.2024.100042","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>Artificial intelligence (AI) and natural language processing (NLP) advancements have led to sophisticated tools like GPT-4.0, allowing clinicians to explore its utility as a health care management support tool. Our study aimed to assess the capability of GPT-4 in suggesting definitive diagnoses and appropriate work-ups to minimize unnecessary procedures.</p></div><div><h3>Materials and methods</h3><p>We conducted a retrospective comparative analysis, extracting clinical data from 10 cases published in the <em>New England Journal of Medicine</em> after 2022 and inputting this data into GPT-4 to generate diagnostic and work-up recommendations. Primary endpoint: the ability to correctly identify the final diagnosis. Secondary endpoints: its ability to list the definitive diagnosis as the first of the five most likely differential diagnoses and determine an adequate work-up.</p></div><div><h3>Results</h3><p>The AI could not identify the definitive diagnosis in 2 out of 10 cases (20% inaccuracy). Among the eight cases correctly identified by the AI, five (63%) listed the definitive diagnosis at the top of the differential diagnosis list. In terms of diagnostic tests and exams, the AI suggested unnecessary procedures in two cases, representing 40% of the cases where it failed to correctly identify the final diagnosis. Moreover, the AI could not suggest adequate treatment for seven cases (70%). Among them, the AI suggested inappropriate management for two cases, and the remaining five received incomplete or non-specific advice, such as chemotherapy, without specifying the best regimen.</p></div><div><h3>Conclusions</h3><p>Our study demonstrated GPT-4’s potential as an academic support tool, although it cannot correctly identify the final diagnosis in 20% of the cases and the AI requested unnecessary additional diagnostic tests for 40% of the patients. Future research should focus on evaluating the performance of GPT-4 using a more extensive and diverse sample, incorporating prospective assessments, and investigating its ability to improve diagnostic and therapeutic accuracy.</p></div>","PeriodicalId":100491,"journal":{"name":"ESMO Real World Data and Digital Oncology","volume":"4 ","pages":"Article 100042"},"PeriodicalIF":0.0000,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949820124000201/pdfft?md5=f32f81c8fc771b7987718bc67be461a8&pid=1-s2.0-S2949820124000201-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature\",\"authors\":\"B.L. Fabre ,&nbsp;M.A.F. Magalhaes Filho ,&nbsp;P.N. Aguiar Jr ,&nbsp;F.M. da Costa ,&nbsp;B. Gutierres ,&nbsp;W.N. William Jr ,&nbsp;A. Del Giglio\",\"doi\":\"10.1016/j.esmorw.2024.100042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><p>Artificial intelligence (AI) and natural language processing (NLP) advancements have led to sophisticated tools like GPT-4.0, allowing clinicians to explore its utility as a health care management support tool. 
Our study aimed to assess the capability of GPT-4 in suggesting definitive diagnoses and appropriate work-ups to minimize unnecessary procedures.</p></div><div><h3>Materials and methods</h3><p>We conducted a retrospective comparative analysis, extracting clinical data from 10 cases published in the <em>New England Journal of Medicine</em> after 2022 and inputting this data into GPT-4 to generate diagnostic and work-up recommendations. Primary endpoint: the ability to correctly identify the final diagnosis. Secondary endpoints: its ability to list the definitive diagnosis as the first of the five most likely differential diagnoses and determine an adequate work-up.</p></div><div><h3>Results</h3><p>The AI could not identify the definitive diagnosis in 2 out of 10 cases (20% inaccuracy). Among the eight cases correctly identified by the AI, five (63%) listed the definitive diagnosis at the top of the differential diagnosis list. In terms of diagnostic tests and exams, the AI suggested unnecessary procedures in two cases, representing 40% of the cases where it failed to correctly identify the final diagnosis. Moreover, the AI could not suggest adequate treatment for seven cases (70%). Among them, the AI suggested inappropriate management for two cases, and the remaining five received incomplete or non-specific advice, such as chemotherapy, without specifying the best regimen.</p></div><div><h3>Conclusions</h3><p>Our study demonstrated GPT-4’s potential as an academic support tool, although it cannot correctly identify the final diagnosis in 20% of the cases and the AI requested unnecessary additional diagnostic tests for 40% of the patients. Future research should focus on evaluating the performance of GPT-4 using a more extensive and diverse sample, incorporating prospective assessments, and investigating its ability to improve diagnostic and therapeutic accuracy.</p></div>\",\"PeriodicalId\":100491,\"journal\":{\"name\":\"ESMO Real World Data and Digital Oncology\",\"volume\":\"4 \",\"pages\":\"Article 100042\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2949820124000201/pdfft?md5=f32f81c8fc771b7987718bc67be461a8&pid=1-s2.0-S2949820124000201-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ESMO Real World Data and Digital Oncology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949820124000201\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ESMO Real World Data and Digital Oncology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949820124000201","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract


Background

Artificial intelligence (AI) and natural language processing (NLP) advancements have led to sophisticated tools like GPT-4.0, allowing clinicians to explore its utility as a health care management support tool. Our study aimed to assess the capability of GPT-4 in suggesting definitive diagnoses and appropriate work-ups to minimize unnecessary procedures.

Materials and methods

We conducted a retrospective comparative analysis, extracting clinical data from 10 cases published in the New England Journal of Medicine after 2022 and inputting this data into GPT-4 to generate diagnostic and work-up recommendations. Primary endpoint: the ability to correctly identify the final diagnosis. Secondary endpoints: its ability to list the definitive diagnosis as the first of the five most likely differential diagnoses and determine an adequate work-up.
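The authors do not publish their prompts or tooling. Purely as an illustration of the kind of query described here, a case vignette could be submitted to GPT-4 through the OpenAI Python SDK roughly as sketched below; the model identifier, prompt wording, and the `case_vignette` placeholder are assumptions, not the study's actual protocol.

```python
# Illustrative sketch only -- the study's actual prompts and settings are not published.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Placeholder for the de-identified clinical summary extracted from a published case record.
case_vignette = "..."

prompt = (
    "You are assisting with a diagnostic exercise based on a published case record.\n"
    "Case summary:\n"
    f"{case_vignette}\n\n"
    "1. List the five most likely differential diagnoses, ranked from most to least likely.\n"
    "2. State the single most likely definitive diagnosis.\n"
    "3. Recommend the diagnostic work-up needed to confirm it, avoiding unnecessary tests.\n"
    "4. Suggest an appropriate management plan."
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed model identifier; the paper reports using GPT-4
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output simplifies retrospective scoring
)

print(response.choices[0].message.content)
```

The model's ranked differential, proposed work-up, and management plan would then be compared against the diagnosis, investigations, and treatment reported in the published case record.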

Results

The AI could not identify the definitive diagnosis in 2 out of 10 cases (20% inaccuracy). Among the eight cases correctly identified by the AI, five (63%) listed the definitive diagnosis at the top of the differential diagnosis list. In terms of diagnostic tests and exams, the AI suggested unnecessary procedures in two cases, representing 40% of the cases where it failed to correctly identify the final diagnosis. Moreover, the AI could not suggest adequate treatment for seven cases (70%). Among them, the AI suggested inappropriate management for two cases, and the remaining five received incomplete or non-specific advice, such as chemotherapy, without specifying the best regimen.
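As a quick arithmetic check, the headline proportions follow directly from the counts reported above; a minimal tally is sketched below (per-case details are not reproduced here, and the 40% work-up figure is quoted as reported).

```python
# Recomputing the headline proportions from the counts stated in the abstract.
total_cases = 10
missed_diagnosis = 2                  # definitive diagnosis not identified
correct_cases = total_cases - missed_diagnosis
ranked_first = 5                      # definitive diagnosis listed first among the correct cases
inadequate_treatment = 7              # cases without an adequate treatment suggestion

print(f"Diagnostic inaccuracy: {missed_diagnosis / total_cases:.0%}")            # 20%
print(f"Ranked first when correct: {ranked_first / correct_cases:.0%}")          # 62% (reported as 63%)
print(f"Inadequate treatment advice: {inadequate_treatment / total_cases:.0%}")  # 70%
```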

Conclusions

Our study demonstrated GPT-4’s potential as an academic support tool, although it could not correctly identify the final diagnosis in 20% of the cases and requested unnecessary additional diagnostic tests for 40% of the patients. Future research should focus on evaluating the performance of GPT-4 using a more extensive and diverse sample, incorporating prospective assessments, and investigating its ability to improve diagnostic and therapeutic accuracy.
