{"title":"Does generative artificial intelligence pose a risk to performance validity test security?","authors":"Shannon Lavigne, Anthony Rios, Jeremy J Davis","doi":"10.1080/13854046.2024.2379023","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>We examined the performance validity test (PVT) security risk presented by artificial intelligence (AI) chatbots asking questions about neuropsychological evaluation and PVTs on two popular generative AI sites.</p><p><strong>Method: </strong>In 2023 and 2024, multiple questions were posed to ChatGPT-3 and Bard (now Gemini). One set started generally and refined follow-up questions based on AI responses. A second set asked how to feign, fake, or cheat. Responses were aggregated and independently rated for inaccuracy and threat. Responses not identified as inaccurate were assigned a four-level threat rating (no, mild, moderate, or high threat). Combined inaccuracy and threat ratings were examined cross-sectionally and longitudinally.</p><p><strong>Results: </strong>Combined inaccuracy rating percentages were 35 to 42% in 2023 and 16 to 28% in 2024. Combined moderate/high threat ratings were observed in 24 to 41% of responses in 2023 and in 17 to 31% of responses in 2024. More ChatGPT-3 responses were rated moderate or high threat compared to Bard/Gemini responses. Over time, ChatGPT-3 responses became more accurate with a similar threat level, but Bard/Gemini responses did not change in accuracy or threat. Responses to how to feign queries demonstrated ethical opposition to feigning. Responses to similar queries in 2024 showed even stronger ethical opposition.</p><p><strong>Conclusions: </strong>AI chatbots are a threat to PVT test security. A proportion of responses were rated as moderate or high threat. Although ethical opposition to feigning guidance increased over time, the natural language interface and the volume of AI chatbot responses represent a potentially greater threat than traditional search engines.</p>","PeriodicalId":55250,"journal":{"name":"Clinical Neuropsychologist","volume":" ","pages":"1-14"},"PeriodicalIF":3.0000,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Neuropsychologist","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1080/13854046.2024.2379023","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Objective: We examined the performance validity test (PVT) security risk presented by artificial intelligence (AI) chatbots by posing questions about neuropsychological evaluation and PVTs to two popular generative AI sites.
Method: In 2023 and 2024, multiple questions were posed to ChatGPT-3 and Bard (now Gemini). One set began with general questions and refined follow-up questions based on the AI responses. A second set asked how to feign, fake, or cheat. Responses were aggregated and independently rated for inaccuracy and threat. Responses not identified as inaccurate were assigned a four-level threat rating (no, mild, moderate, or high threat). Combined inaccuracy and threat ratings were examined cross-sectionally and longitudinally.
Results: Combined inaccuracy rating percentages were 35% to 42% in 2023 and 16% to 28% in 2024. Combined moderate/high threat ratings were observed in 24% to 41% of responses in 2023 and in 17% to 31% of responses in 2024. More ChatGPT-3 responses were rated moderate or high threat compared to Bard/Gemini responses. Over time, ChatGPT-3 responses became more accurate with a similar threat level, whereas Bard/Gemini responses did not change in accuracy or threat. Responses to the how-to-feign queries demonstrated ethical opposition to feigning, and responses to similar queries in 2024 showed even stronger ethical opposition.
Conclusions: AI chatbots are a threat to PVT security: a proportion of responses were rated as moderate or high threat. Although ethical opposition to providing feigning guidance increased over time, the natural language interface and the volume of AI chatbot responses represent a potentially greater threat than traditional search engines.
Journal description:
The Clinical Neuropsychologist (TCN) serves as the premier forum for (1) state-of-the-art clinically-relevant scientific research, (2) in-depth professional discussions of matters germane to evidence-based practice, and (3) clinical case studies in neuropsychology. Of particular interest are papers that can make definitive statements about a given topic (thereby having implications for the standards of clinical practice) and those with the potential to expand today’s clinical frontiers. Research on all age groups, and on both clinical and normal populations, is considered.