ChatGPT Yields a Passing Score on a Pediatric Board Preparatory Exam but Raises Red Flags.

IF 1.4 Q3 PEDIATRICS Global Pediatric Health Pub Date : 2024-03-24 eCollection Date: 2024-01-01 DOI:10.1177/2333794X241240327
Mindy Le, Michael Davis
Global Pediatric Health, vol. 11, article 2333794X241240327. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10962030/pdf/

Abstract

Objectives: We aimed to evaluate the performance of a publicly available online artificial intelligence program (OpenAI's ChatGPT-3.5 and ChatGPT-4.0, August 3 versions) on a pediatric board preparatory examination, the American Academy of Pediatrics (AAP) 2021 and 2022 PREP® Self-Assessments.

Methods: In September 2023, we entered 245 questions and answer choices from the Pediatrics 2021 PREP® Self-Assessment and 247 questions and answer choices from the Pediatrics 2022 PREP® Self-Assessment into OpenAI's ChatGPT-3.5 and ChatGPT-4.0 (August 3 versions). The ChatGPT-3.5 and ChatGPT-4.0 scores were compared with the advertised passing score (70%+) for the PREP® exams and with the average scores of 74.09% and 75.71% for all 10 715 and 6825 first-time human test takers, respectively.

Results: For the AAP 2021 and 2022 PREP® Self-Assessments, ChatGPT-3.5 answered 143 of 243 (58.85%) and 137 of 247 (55.46%) questions correctly on a single attempt. ChatGPT-4.0 answered 193 of 243 (79.84%) and 208 of 247 (84.21%) questions correctly.

Conclusion: Using a publicly available online chatbot to answer pediatric board preparatory examination questions yielded a passing score but revealed significant limitations in the chatbot's ability to assess some complex medical situations in children, posing a potential risk to this vulnerable population.
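
The comparison described in the Methods and Results can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis: the names `RESULTS`, `PASSING`, `HUMAN_MEAN`, `percent`, and `summarize` are hypothetical, and it assumes the two human averages are listed in the same order as the exams (2021, then 2022). Recomputing from the stated counts reproduces 58.85% and 84.21%, but gives 55.47% for 137 of 247 and 79.42% for 193 of 243, whereas the abstract reports 55.46% and 79.84%. The value 79.84% would match 194 of 243; the abstract does not say which figure is intended.

```python
# Counts exactly as stated in the Results: (correct, total) per model and exam year.
RESULTS = {
    ("ChatGPT-3.5", 2021): (143, 243),
    ("ChatGPT-3.5", 2022): (137, 247),
    ("ChatGPT-4.0", 2021): (193, 243),
    ("ChatGPT-4.0", 2022): (208, 247),
}

# Advertised passing score (70%+) from the Methods.
PASSING = 70.0

# Average first-time human scores; the year assignment is an assumption
# based on the order in which the abstract lists them.
HUMAN_MEAN = {2021: 74.09, 2022: 75.71}


def percent(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage, rounded to 2 decimals."""
    return round(100 * correct / total, 2)


def summarize(results=RESULTS):
    """Build one summary row per (model, year) pair."""
    rows = []
    for (model, year), (correct, total) in results.items():
        pct = percent(correct, total)
        rows.append(
            {
                "model": model,
                "year": year,
                "percent": pct,
                "passed": pct >= PASSING,
                "above_human_mean": pct > HUMAN_MEAN[year],
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['model']} {row['year']}: {row['percent']:.2f}% "
            f"passed={row['passed']} above_human_mean={row['above_human_mean']}"
        )
```

Under these assumptions, only the ChatGPT-4.0 rows clear the 70% threshold and the human averages; both ChatGPT-3.5 rows fall below the passing score.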

Source journal
Global Pediatric Health (Nursing-Pediatrics)
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 105
Review time: 12 weeks