Performance of Language Models on the Family Medicine In-Training Exam.

Journal: Family Medicine. Impact factor: 1.8. Zone 4 (Medicine). JCR Q2 (Medicine, General & Internal). Pub Date: 2024-10-01. Epub Date: 2024-08-12. DOI: 10.22454/FamMed.2024.233738
Rana E Hanna, Logan R Smith, Rahul Mhaskar, Karim Hanna
Family Medicine, pages 555-560. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493131/pdf/

Abstract

Background and objectives: Artificial intelligence (AI), such as ChatGPT and Bard, has gained popularity as a tool in medical education. The use of AI in family medicine has not yet been assessed. The objective of this study is to compare the performance of three large language models (LLMs; ChatGPT 3.5, ChatGPT 4.0, and Google Bard) on the family medicine in-training exam (ITE).

Methods: The 193 multiple-choice questions of the 2022 ITE, written by the American Board of Family Medicine, were inputted in ChatGPT 3.5, ChatGPT 4.0, and Bard. The LLMs' performance was then scored and scaled.

Results: ChatGPT 4.0 scored 167/193 (86.5%) with a scaled score of 730 out of 800. According to the Bayesian score predictor, ChatGPT 4.0 has a 100% chance of passing the family medicine board exam. ChatGPT 3.5 scored 66.3%, translating to a scaled score of 400 and an 88% chance of passing the family medicine board exam. Bard scored 64.2%, with a scaled score of 380 and an 85% chance of passing the boards. Compared to the national average of postgraduate year 3 residents, only ChatGPT 4.0 surpassed the residents' mean of 68.4%.

Conclusions: ChatGPT 4.0 was the only LLM that outperformed the family medicine postgraduate year 3 residents' national averages on the 2022 ITE, providing robust explanations and demonstrating its potential use in delivering background information on common medical concepts that appear on board exams.
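
The percent-correct figures in the Results can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it uses only numbers stated in the abstract (the 167/193 raw score, the three reported percentages, and the 68.4% resident mean). It does not attempt to reproduce the scaled scores or the Bayesian score predictor, whose formulas the abstract does not give.

```python
# Figures reported in the abstract (2022 ITE, 193 multiple-choice questions).
TOTAL_QUESTIONS = 193
PGY3_MEAN_PCT = 68.4  # national mean for postgraduate year 3 residents

# Percent-correct scores as reported; a raw count is given only for ChatGPT 4.0.
reported_pct = {"ChatGPT 4.0": 86.5, "ChatGPT 3.5": 66.3, "Bard": 64.2}


def percent_correct(correct: int, total: int) -> float:
    """Return the percentage of questions answered correctly, to one decimal."""
    return round(100 * correct / total, 1)


def above_resident_mean(scores: dict[str, float], mean_pct: float) -> list[str]:
    """Return the models whose percent-correct exceeds the resident mean."""
    return [name for name, pct in scores.items() if pct > mean_pct]


gpt4_pct = percent_correct(167, TOTAL_QUESTIONS)
print(gpt4_pct)
print(above_resident_mean(reported_pct, PGY3_MEAN_PCT))
```

Under these assumptions, the computed percentage for 167/193 agrees with the reported 86.5%, and ChatGPT 4.0 is the only model above the resident mean, matching the Conclusions.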

Source journal
Family Medicine (category: Medicine, Internal Medicine)
CiteScore: 2.40
Self-citation rate: 21.10%
Articles published: 0
Review time: 6-12 weeks
Journal description: Family Medicine, the official journal of the Society of Teachers of Family Medicine, publishes original research, systematic reviews, narrative essays, and policy analyses relevant to the discipline of family medicine, particularly focusing on primary care medical education, health workforce policy, and health services research. Journal content is not limited to educational research from family medicine educators; the journal welcomes innovative, high-quality contributions from authors in a variety of specialties and academic fields.
Latest articles from this journal
Authors' Response to "The Growing Divide Between Teaching Empathy and Being Empathetic".
The Growing Divide Between Teaching Empathy and Being Empathetic.
An Exploratory Study of Published Case Reports Using a Systematic Typology.
Supporting International Medical Graduate Workforce Integration in the 2024 US Election.
Three Types of Uncertainty: A Qualitative Study of Family Medicine Residents.