Alejandro García-Rudolph, David Sanchez-Pinsach, Eloy Opisso
Title: Evaluating AI Models: Performance Validation Using Formal Multiple-Choice Questions in Neuropsychology
Journal: Archives of Clinical Neuropsychology (Impact Factor 2.1, JCR Q2, Psychology)
Published: 2024-09-04
DOI: 10.1093/arclin/acae068 (https://doi.org/10.1093/arclin/acae068)
Citations: 0
Abstract
High-quality and accessible education is crucial for advancing neuropsychology. A recent study identified key barriers to board certification in clinical neuropsychology, such as time constraints and insufficient specialized knowledge. To address these challenges, this study explored the capabilities of two advanced Artificial Intelligence (AI) language models, GPT-3.5 (free version) and GPT-4.0 (subscription version), by evaluating their performance on 300 questions written in the style of the American Board of Professional Psychology in Clinical Neuropsychology. GPT-4.0 achieved a higher overall accuracy than GPT-3.5 (80.0% vs. 65.7%). In the "Assessment" category, GPT-4.0 also outperformed GPT-3.5, with an accuracy of 73.4% compared to 58.6% (p = 0.012). Because the "Assessment" category, which comprised 128 questions, exhibited the highest error rate for both AI models, it was analyzed further. A thematic analysis of the 26 incorrectly answered questions revealed 8 main themes and 17 specific codes, highlighting significant gaps in areas such as "Neurodegenerative Diseases" and "Neuropsychological Testing and Interpretation."
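The reported percentages can be turned back into approximate counts, which makes the size of the "Assessment" gap easier to check. The sketch below is illustrative only: the abstract does not say which statistical test produced p = 0.012, so the unpaired Pearson chi-square test used here (without continuity correction) is an assumption, and the function names `correct_count` and `chi_square_2x2` are hypothetical helpers, not part of the study.

```python
import math


def correct_count(accuracy_pct: float, n_questions: int) -> int:
    """Recover the number of correct answers from a rounded accuracy percentage."""
    return round(accuracy_pct / 100 * n_questions)


def chi_square_2x2(correct_a: int, correct_b: int, n: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) comparing two
    models that each answered the same number of questions, n.

    ASSUMPTION: the abstract does not name its test; this one is chosen
    only to illustrate the arithmetic.
    """
    wrong_a, wrong_b = n - correct_a, n - correct_b
    total_correct = correct_a + correct_b
    total_wrong = wrong_a + wrong_b
    # Shortcut formula for a 2x2 table: N * (ad - bc)^2 / (product of the margins)
    chi2 = (2 * n) * (correct_a * wrong_b - wrong_a * correct_b) ** 2 / (
        n * n * total_correct * total_wrong
    )
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Figures taken from the abstract
ASSESSMENT_QUESTIONS = 128
gpt4_correct = correct_count(73.4, ASSESSMENT_QUESTIONS)
gpt35_correct = correct_count(58.6, ASSESSMENT_QUESTIONS)

chi2, p_value = chi_square_2x2(gpt4_correct, gpt35_correct, ASSESSMENT_QUESTIONS)
print(f"GPT-4.0 correct: {gpt4_correct}, GPT-3.5 correct: {gpt35_correct}")
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Under these assumptions the reconstructed counts give a p-value close to the reported 0.012. Since both models answered the same questions, a paired test such as McNemar's would also be a natural choice, but it needs per-question results that the abstract does not provide.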
About the journal:
The journal publishes original contributions dealing with psychological aspects of the etiology, diagnosis, and treatment of disorders arising out of dysfunction of the central nervous system. Archives of Clinical Neuropsychology will also consider manuscripts involving the established principles of the profession of neuropsychology: (a) delivery and evaluation of services, (b) ethical and legal issues, and (c) approaches to education and training. Preference will be given to empirical reports and key reviews. Brief research reports, case studies, and commentaries on published articles (not exceeding two printed pages) will also be considered. At the discretion of the editor, rebuttals to commentaries may be invited. Occasional papers of a theoretical nature will be considered.