Performance of a Large Language Model on Japanese Emergency Medicine Board Certification Examinations.

IF 1.2; Zone 4 (Medicine); Q2; MEDICINE, GENERAL & INTERNAL; Journal of Nippon Medical School; Pub Date: 2024-05-21; Epub Date: 2024-03-02; DOI: 10.1272/jnms.JNMS.2024_91-205
Yutaka Igarashi, Kyoichi Nakahara, Tatsuya Norii, Nodoka Miyake, Takashi Tagami, Shoji Yokobori
Citations: 0

Abstract

Background: Emergency physicians need a broad range of knowledge and skills to address critical medical, traumatic, and environmental conditions. Artificial intelligence (AI), including large language models (LLMs), has potential applications in healthcare settings; however, the performance of LLMs in emergency medicine remains unclear.

Methods: To evaluate the reliability of information provided by ChatGPT, the LLM was given the questions set by the Japanese Association of Acute Medicine in its board certification examinations over a period of 5 years (2018-2022) and programmed to answer them twice. Statistical analysis was used to assess agreement between the two responses.

Results: The LLM successfully answered 465 of the 475 text-based questions, achieving an overall correct response rate of 62.3%. For questions without images, the rate of correct answers was 65.9%. For questions with images that were not explained to the LLM, the rate of correct answers was only 52.0%. The annual rates of correct answers to questions without images ranged from 56.3% to 78.8%. Accuracy was better for scenario-based questions (69.1%) than for stand-alone questions (62.1%). Agreement between the two responses was substantial (kappa = 0.70). Factual error accounted for 82% of the incorrectly answered questions.

Conclusion: An LLM performed satisfactorily on an emergency medicine board certification examination in Japanese and without images. However, factual errors in the responses highlight the need for physician oversight when using LLMs.
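
The agreement statistic reported in the Results (kappa = 0.70) can be illustrated with a minimal sketch. The abstract does not say which kappa variant was used or how responses were coded, so this assumes Cohen's kappa computed over correct/incorrect labels from the two answering runs. The function name `cohen_kappa` and the sample data are hypothetical and are not taken from the study. Under the commonly used Landis and Koch bands, values from 0.61 to 0.80 are described as "substantial".

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(first) != len(second) or not first:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: fraction of items labelled identically in both runs.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement expected from each run's marginal label frequencies.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # Both runs used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data (NOT from the study): 1 = correct, 0 = incorrect.
run_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
run_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
kappa = cohen_kappa(run_1, run_2)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the agreement that two runs would reach by chance alone, which is why it is preferred over raw percent agreement when one label (here, "correct") dominates.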

Source journal
Journal of Nippon Medical School (MEDICINE, GENERAL & INTERNAL)
CiteScore: 1.80
Self-citation rate: 10.00%
Articles published: 118
Journal description: The international effort to understand, treat, and control disease involves clinicians and researchers from many medical and biological science disciplines. The Journal of Nippon Medical School (JNMS) is the official journal of the Medical Association of Nippon Medical School and is dedicated to furthering international exchange of medical science experience and opinion. It provides an international forum for researchers in the fields of basic and clinical medicine to introduce, discuss, and exchange their novel achievements in biomedical science, and a platform for the worldwide dissemination and steering of biomedical knowledge for the benefit of human health and welfare. Properly reasoned discussion, disciplined by appropriate references to existing bodies of knowledge or aimed at motivating the creation of such knowledge, is the aim of the journal.
Latest articles from this journal
Use of a rigid curved laryngoscope for observation and debridement of degenerated cricoid cartilage in nasogastric tube syndrome: A case report.
24-Hour Intraocular Pressure Fluctuation Suppressed by Microhook Trabeculotomy in Ocular Hypertension: A Case Report.
Acute focal bacterial nephritis in an infant referred with apnea caused by mixed infection with Enterococcus raffinosus and Escherichia coli.
Early Laparoscopic Colostomy in Advanced Cancer Patients with Rectovaginal Fistula: Results of Seven Patients.
Early and Post-Treatment Imaging Findings in Perineural Spread: A Pathway to Diffuse Muscle Metastasis in Recurrent Bladder Carcinoma.