Evaluating ChatGPT for neurocognitive disorder diagnosis: a multicenter study
A Andrew Dimmick, Charlie C Su, Hanan S Rafiuddin, David C Cicero
Clinical Neuropsychologist, published online 2025-03-16, pp. 1-16
DOI: 10.1080/13854046.2025.2475567
Abstract
Objective: To evaluate the accuracy and reliability of ChatGPT 4 Omni in diagnosing neurocognitive disorders using comprehensive clinical data and to compare its performance with previous versions of ChatGPT. Method: This project used a two-part design: Study 1 examined diagnostic agreement between ChatGPT 4 Omni and clinicians using a few-shot prompt approach, and Study 2 compared the diagnostic performance of ChatGPT models using a zero-shot prompt approach, with data from the National Alzheimer's Coordinating Center (NACC) Uniform Data Set 3. Study 1 included 12,922 older adults (mean age = 69.13, SD = 9.87), predominantly female (57%) and White (80%). Study 2 involved 537 older adults (mean age = 67.88, SD = 9.52), majority female (57%) and White (81%). Diagnoses included no cognitive impairment, amnestic mild cognitive impairment (MCI), nonamnestic MCI, and dementia. Results: In Study 1, ChatGPT 4 Omni showed fair agreement with clinician diagnoses (χ²(9) = 6021.96, p < .001; κ = .33). Notable predictors of agreement included the MoCA and memory recall tests. ChatGPT 4 Omni demonstrated high internal reliability (α = .96). In Study 2, no significant diagnostic agreement was found between ChatGPT versions and clinicians. Conclusions: Although ChatGPT 4 Omni shows potential to align with clinician diagnoses, its diagnostic accuracy is insufficient for clinical application without human oversight. Continued refinement and comprehensive training of AI models are essential to enhance their utility in neuropsychological assessment. With rapidly developing technological innovations, integrating AI tools into clinical practice could soon improve diagnostic efficiency and accessibility of neuropsychological services.
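The abstract reports model-clinician agreement using a chi-square test of association and Cohen's kappa (κ = .33, conventionally interpreted as "fair" agreement). The sketch below is purely illustrative and is not the authors' analysis code: it shows how such statistics could be computed for paired clinician and model diagnoses across the four diagnostic categories, using hypothetical labels and standard scipy/scikit-learn routines.

```python
# Illustrative sketch (not from the paper): chi-square test of association and
# Cohen's kappa between clinician and model diagnoses. All labels and data are
# hypothetical placeholders.
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired diagnoses; in the study these would come from the NACC
# Uniform Data Set and the model's output for each participant.
clinician = ["dementia", "dementia", "amnestic MCI", "amnestic MCI",
             "nonamnestic MCI", "nonamnestic MCI", "no impairment", "no impairment"]
model = ["dementia", "amnestic MCI", "amnestic MCI", "no impairment",
         "nonamnestic MCI", "dementia", "no impairment", "nonamnestic MCI"]

# Contingency table of clinician vs. model labels (4x4 with the full category set).
table = pd.crosstab(pd.Series(clinician, name="clinician"),
                    pd.Series(model, name="model"))

chi2, p, dof, _ = chi2_contingency(table)    # association between the two raters
kappa = cohen_kappa_score(clinician, model)  # chance-corrected agreement

print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}, kappa = {kappa:.2f}")
```

In the study itself, statistics of this kind would be computed over the full samples (12,922 participants in Study 1 and 537 in Study 2) rather than the toy labels shown here.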
Journal Introduction:
The Clinical Neuropsychologist (TCN) serves as the premier forum for (1) state-of-the-art clinically-relevant scientific research, (2) in-depth professional discussions of matters germane to evidence-based practice, and (3) clinical case studies in neuropsychology. Of particular interest are papers that can make definitive statements about a given topic (thereby having implications for the standards of clinical practice) and those with the potential to expand today’s clinical frontiers. Research on all age groups, and on both clinical and normal populations, is considered.