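Faster and better than a physician?: Assessing diagnostic proficiency of ChatGPT in misdiagnosed individuals with neuromyelitis optica spectrum disorder.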

IF 3.6; CAS Zone 3 (Medicine); JCR Q1, Clinical Neurology; Journal of the Neurological Sciences, vol. 468, article 123360; Pub Date: 2025-01-15; Epub Date: 2024-12-19; DOI: 10.1016/j.jns.2024.123360

Kevin Shan, Mahi A Patel, Morgan McCreary, Tom G Punnen, Francisco Villalobos, Lauren M Tardo, Lindsay A Horton, Peter V Sguigna, Kyle M Blackburn, Shanan B Munoz, Katy W Burgess, Tatum M Moog, Alexander D Smith, Darin T Okuda


Citations: 0

Abstract




Background: Neuromyelitis optica spectrum disorder (NMOSD) is a commonly misdiagnosed condition. Driven by cost-consciousness and technological fluency, different generations may gravitate toward healthcare alternatives, including artificial intelligence (AI) models such as ChatGPT (Generative Pre-trained Transformer). Our objective was to evaluate the speed and accuracy of ChatGPT-3.5 (GPT-3.5) in diagnosing people with NMOSD (PwNMOSD) who were initially misdiagnosed.

Methods: Misdiagnosed PwNMOSD were identified retrospectively, and each person's clinical symptoms and timeline of medically related events were processed through GPT-3.5. For each subject, seven digital derivatives representing different races, ethnicities, and sexes were created and processed identically to evaluate the impact of these variables on accuracy. Scoresheets were used to track diagnostic success and time to diagnosis. The diagnostic speed of GPT-3.5 was compared with that of physicians using a Cox proportional hazards model clustered by subject. Logistic regression was used to estimate the diagnostic accuracy of GPT-3.5, which was compared with the estimated accuracy of physicians.
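The Methods describe, for each subject, one original timeline plus seven demographic derivatives, all processed identically, with a scoresheet recording diagnostic success and time to diagnosis. The following is a minimal Python sketch of how such a case set could be organized; it is not the authors' tooling. The abstract does not list the seven derivatives, so the code assumes a hypothetical grid (the four race/ethnicity groups reported in the cohort crossed with two sexes, minus the subject's own combination), which happens to yield exactly seven. The names `Case`, `make_derivatives`, `ScoresheetEntry`, and `build_conversation_set` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from itertools import product

# ASSUMPTION: hypothetical demographic grid. The four race/ethnicity groups are the
# ones reported for the cohort; the abstract does not say which seven variants were used.
RACE_ETHNICITY = ("Black/African American", "White", "Hispanic", "Asian")
SEXES = ("female", "male")


@dataclass(frozen=True)
class Case:
    """One conversation input: a clinical timeline plus its demographic framing."""
    subject_id: int
    race_ethnicity: str
    sex: str
    timeline: tuple[str, ...]  # ordered medically related events, never altered
    is_derivative: bool = False


def make_derivatives(original: Case) -> list[Case]:
    """Copy the case into every grid combination except the subject's own."""
    return [
        replace(original, race_ethnicity=r, sex=s, is_derivative=True)
        for r, s in product(RACE_ETHNICITY, SEXES)
        if (r, s) != (original.race_ethnicity, original.sex)
    ]


@dataclass
class ScoresheetEntry:
    """Hypothetical scoresheet row for one conversation."""
    case: Case
    correct: bool                    # did the model arrive at NMOSD?
    days_to_diagnosis: float | None  # None if NMOSD was never suggested


def build_conversation_set(originals: list[Case]) -> list[Case]:
    """Originals followed by their derivatives, to be processed identically."""
    cases: list[Case] = []
    for case in originals:
        cases.append(case)
        cases.extend(make_derivatives(case))
    return cases


if __name__ == "__main__":
    demo = Case(1, "Black/African American", "female",
                ("hypothetical event A", "hypothetical event B"))
    print(len(build_conversation_set([demo])))  # 8: the original plus seven derivatives
```

Under this assumed scheme, 68 originals each yield 7 derivatives (68 × 7 = 476), and originals plus derivatives give 68 + 476 = 544 conversations, which agrees with the counts reported in the Results. Keeping `Case` frozen and deriving variants with `dataclasses.replace` guarantees that only the demographic fields change while the timeline stays identical.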

Results: Clinical timelines for 68 individuals (59 female; 42 Black/African American, 13 White, 11 Hispanic, 2 Asian; mean age at first symptoms 34.4 years, standard deviation 15.5 years) were analyzed, and 476 digital simulations were created, yielding 544 conversations for analysis. Within 240 days of symptom onset, the instantaneous probability (hazard) of a correct diagnosis was 70.65% lower for physicians than for GPT-3.5 (p < 0.0001). The estimated probability of a correct diagnosis for GPT-3.5 was 80.88% [95% CI = (76.35%, 99.81%)].
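As a worked example of how the 70.65% figure relates to the Cox proportional hazards model named in the Methods, assume it is one minus the hazard ratio for physicians relative to GPT-3.5 (the abstract does not report the ratio itself). The indicator $x$ is introduced here for illustration only, with $x = 1$ for a physician and $x = 0$ for GPT-3.5:

```latex
h(t \mid x) = h_0(t)\, e^{\beta x}, \qquad x \in \{0, 1\}

\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta} = 1 - 0.7065 = 0.2935

\beta = \ln(0.2935) \approx -1.226, \qquad \frac{1}{\mathrm{HR}} = \frac{1}{0.2935} \approx 3.41
```

Read this way, GPT-3.5's instantaneous rate of reaching the correct diagnosis would be roughly 3.4 times the physicians' rate at any point in the 240-day window, a reading that holds only under the model's proportional-hazards assumption.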

Conclusion: GPT-3.5 may be of value in recognizing NMOSD. However, the manner in which medical information is conveyed, combined with the potential for inaccuracies, may result in unnecessary psychological stress.


Source journal

Journal of the Neurological Sciences (Medicine: Clinical Neurology)

CiteScore: 7.60
Self-citation rate: 2.30%
Articles published: 313
Review time: 22 days

Journal description: The Journal of the Neurological Sciences provides a medium for the prompt publication of original articles in neurology and neuroscience from around the world. JNS places special emphasis on articles that: 1) provide guidance to clinicians around the world (Best Practices, Global Neurology); 2) report cutting-edge science related to neurology (Basic and Translational Sciences); 3) educate readers about relevant and practical clinical outcomes in neurology (Outcomes Research); and 4) summarize or editorialize the current state of the literature (Reviews, Commentaries, and Editorials). JNS accepts most types of manuscripts for consideration, including original research papers, short communications, reviews, book reviews, letters to the Editor, opinions, and editorials. Topics considered will be from neurology-related fields that are of interest to practicing physicians around the world. Examples include neuromuscular diseases, demyelination, atrophies, dementia, neoplasms, infections, epilepsies, disturbances of consciousness, stroke and cerebral circulation, growth and development, plasticity, and intermediary metabolism.