Performance of artificial intelligence on a simulated Canadian urology board exam

N. Touma, Jessica E. Caterini, Kiera Liblk
{"title":"Performance of artificial intelligence on a simulated Canadian urology board exam","authors":"N. Touma, Jessica E. Caterini, Kiera Liblk","doi":"10.5489/cuaj.8800","DOIUrl":null,"url":null,"abstract":"Introduction: Generative artificial intelligence (AI) has proven to be a powerful tool with increasing applications in clinical care and medical education. CHATGPT has performed adequately on many specialty certification and knowledge assessment exams. The objective of this study was to assess the performance of CHATGPT 4 on a multiple-choice exam meant to simulate the Canadian urology board exam.\nMethods: Graduating urology residents representing all Canadian training programs gather yearly for a mock exam that simulates their upcoming board-certifying exam. The exam consists of written multiple-choice questions (MCQs) and an oral objective structured clinical examination (OSCE). The 2022 exam was taken by 29 graduating residents and was administered to CHATGPT 4.\nResults: CHATGPT 4 scored 46% on the MCQ exam, whereas the mean and median scores of graduating urology residents were 62.6%, and 62.7%, respectively. This would place CHATGPT's score 1.8 standard deviations from the median. The percentile rank of CHATGPT would be in the sixth percentile. CHATGPT scores on different topics of the exam were as follows: oncology 35%, andrology/benign prostatic hyperplasia 62%, physiology/anatomy 67%, incontinence/female urology 23%, infections 71%, urolithiasis 57%, and trauma/reconstruction 17%, with ChatGPT 4’s oncology performance being significantly below that of postgraduate year 5 residents.\nConclusions: CHATGPT 4 underperforms on an MCQ exam meant to simulate the Canadian board exam. Ongoing assessments of the capability of generative AI is needed as these models evolve and are trained on additional urology content.","PeriodicalId":38001,"journal":{"name":"Canadian Urological Association Journal","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Canadian Urological Association Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5489/cuaj.8800","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}

Abstract

Introduction: Generative artificial intelligence (AI) has proven to be a powerful tool with increasing applications in clinical care and medical education. ChatGPT has performed adequately on many specialty certification and knowledge assessment exams. The objective of this study was to assess the performance of ChatGPT-4 on a multiple-choice exam meant to simulate the Canadian urology board exam.

Methods: Graduating urology residents representing all Canadian training programs gather yearly for a mock exam that simulates their upcoming board-certifying exam. The exam consists of written multiple-choice questions (MCQs) and an oral objective structured clinical examination (OSCE). The 2022 exam, taken by 29 graduating residents, was administered to ChatGPT-4.

Results: ChatGPT-4 scored 46% on the MCQ exam, whereas the mean and median scores of graduating urology residents were 62.6% and 62.7%, respectively. This places ChatGPT-4's score 1.8 standard deviations below the median, corresponding to the sixth percentile. ChatGPT-4's scores by exam topic were as follows: oncology 35%, andrology/benign prostatic hyperplasia 62%, physiology/anatomy 67%, incontinence/female urology 23%, infections 71%, urolithiasis 57%, and trauma/reconstruction 17%; its oncology performance was significantly below that of postgraduate year 5 residents.

Conclusions: ChatGPT-4 underperforms on an MCQ exam meant to simulate the Canadian board exam. Ongoing assessment of the capabilities of generative AI is needed as these models evolve and are trained on additional urology content.
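For readers who want to see how the reported summary statistics fit together, the sketch below (Python, illustrative only, not code from the study) backs out the standard deviation of resident scores implied by the reported 1.8-standard-deviation gap, and shows how a percentile rank would be computed. The 29 individual resident scores are not published, so that part is shown as a formula rather than reproduced.

```python
# Illustrative sketch only -- not code from the study. It shows how the
# abstract's "1.8 standard deviations from the median" and percentile-rank
# figures relate to the reported summary statistics.

chatgpt_score = 46.0      # ChatGPT-4 MCQ score (%)
resident_median = 62.7    # median graduating-resident score (%)
reported_sd_gap = 1.8     # reported distance in standard deviations

# Back out the standard deviation of resident scores implied by the abstract.
implied_sd = (resident_median - chatgpt_score) / reported_sd_gap
print(f"Implied SD of resident scores: ~{implied_sd:.1f} percentage points")

def percentile_rank(candidate: float, cohort: list[float]) -> float:
    """Percent of cohort scores strictly below the candidate's score."""
    below = sum(1 for s in cohort if s < candidate)
    return 100.0 * below / len(cohort)

# The 29 individual resident scores are not published, so the sixth-percentile
# figure cannot be reproduced here; with that list it would be computed as
#   percentile_rank(chatgpt_score, resident_scores)
```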
Source journal
CiteScore: 2.10
Self-citation rate: 0.00%
Articles published: 167
Journal description: Published by the Canadian Urological Association, the Canadian Urological Association Journal (CUAJ) released its first issue in March 2007, and was published four times that year under the guidance of founding editor (Editor Emeritus as of 2012), Dr. Laurence H. Klotz. In 2008, CUAJ became a bimonthly publication. As of 2013, articles have been published monthly, alternating between print and online-only versions (print issues are available in February, April, June, August, October, and December; online-only issues are produced in January, March, May, July, September, and November). In 2017, the journal launched an ahead-of-print publishing strategy, in which accepted manuscripts are published electronically on our website and cited on PubMed ahead of their official issue-based publication date. By significantly shortening the time to article availability, we offer our readers more flexibility in the way they engage with our content: as a continuous stream, or in a monthly "package," or both. CUAJ covers a broad range of urological topics, including oncology, pediatrics, transplantation, endourology, female urology, and infertility. We take pride in showcasing the work of some of Canada's top investigators, in providing our readers with the latest relevant evidence-based research, and in being the primary repository for major guidelines and other important practice recommendations. Our long-term vision is to become an essential destination for urology-based research, education, and advocacy for both physicians and patients, and to act as a springboard for discussions within the urologic community.
Latest articles from this journal
- Safety and efficacy of ultrasound-assisted bedside ureteric stent placement
- 2024 Canadian Urological Association endorsement of an expert report: Kidney involvement in tuberous sclerosis complex
- Sacral neuromodulation in pediatric refractory bladder and bowel dysfunction
- Fostering the continued growth of our association
- On vacation