Performance Evaluation of the Generative Pre-trained Transformer (GPT-4) on the Family Medicine In-Training Examination.

Journal of the American Board of Family Medicine (IF 2.4; Zone 3, Medicine; Q1, MEDICINE, GENERAL & INTERNAL) Pub Date: 2024-10-25 DOI: 10.3122/jabfm.2023.230433R1
Ting Wang, Arch G Mainous, Keith Stelter, Thomas R O'Neill, Warren P Newton
Citations: 0

Abstract

Performance Evaluation of the Generative Pre-trained Transformer (GPT-4) on the Family Medicine In-Training Examination.

Objective: In this study, we sought to comprehensively evaluate the performance of GPT-4 (Generative Pre-trained Transformer) on the 2022 American Board of Family Medicine (ABFM) In-Training Examination (ITE), compared with that of its predecessor, GPT-3.5, and that of family medicine residents nationwide on the same examination.

Methods: We used both quantitative and qualitative analyses. First, a quantitative analysis evaluated the model's performance metrics using zero-shot prompts (where only the examination questions were provided, without any additional information). A qualitative analysis then examined the nature of the model's responses, the depth of its medical knowledge, and its ability to comprehend contextual or new information through chain-of-thought prompts (interactive conversation with the model).
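
The zero-shot setup described in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the study's actual prompt wording or scoring procedure, so the function names (`build_zero_shot_prompt`, `percent_correct`), the prompt layout, and the placeholder question below are all hypothetical assumptions, not the authors' method.

```python
# Hypothetical sketch: names and prompt layout are assumptions, not taken from the study.

def build_zero_shot_prompt(stem: str, options: dict[str, str]) -> str:
    """Format one multiple-choice item with no examples or extra context (zero-shot)."""
    lines = [stem.strip(), ""]
    for label in sorted(options):
        lines.append(f"{label}. {options[label]}")
    return "\n".join(lines)


def percent_correct(responses: dict[int, str], answer_key: dict[int, str]) -> float:
    """Percentage of items whose chosen option matches the answer key."""
    correct = sum(1 for item, key in answer_key.items() if responses.get(item) == key)
    return 100 * correct / len(answer_key)


# Placeholder item (not an actual examination question).
prompt = build_zero_shot_prompt(
    "Placeholder question stem?",
    {"B": "second option", "A": "first option"},
)
print(prompt)

# Two placeholder responses scored against a placeholder key.
print(percent_correct({1: "A", 2: "B"}, {1: "A", 2: "C"}))
```

Only the question and its answer options go into the prompt, which is what distinguishes the zero-shot condition from the interactive chain-of-thought conversations used in the qualitative analysis.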

Results: GPT-4 showed a significant improvement in accuracy over GPT-3.5, achieved in the 4-month interval between their respective release dates. With zero-shot prompts, the percentage of correct answers rose from 56% to 84%, which translates to a rise in scaled score from 280 to 690, a 410-point increase. Most notably, further chain-of-thought investigation revealed GPT-4's ability to integrate new information and correct itself when needed.
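
The arithmetic behind the reported gains can be checked with a short Python sketch. It uses only the four figures stated in the Results; the variable names are illustrative, and the conversion from percent correct to the scaled score is not given in the abstract, so it is not reproduced here.

```python
# Figures as reported in the abstract (zero-shot prompts).
gpt35_pct_correct = 56
gpt4_pct_correct = 84
gpt35_scaled = 280
gpt4_scaled = 690

# Absolute gain in percentage points, and gain relative to the GPT-3.5 baseline.
pct_point_gain = gpt4_pct_correct - gpt35_pct_correct
relative_gain = pct_point_gain / gpt35_pct_correct

# Gain on the examination's scaled score.
scaled_gain = gpt4_scaled - gpt35_scaled

print(f"Percentage-point gain: {pct_point_gain}")
print(f"Relative gain over GPT-3.5: {relative_gain:.0%}")
print(f"Scaled-score gain: {scaled_gain}")
```

The 28-percentage-point difference corresponds to a 50% relative increase in correct answers, and the scaled-score difference matches the 410 points stated in the Results.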

Conclusions: In this study, GPT-4 demonstrated notably high accuracy, as well as rapid reading and learning capabilities. These results are consistent with previous research indicating GPT-4's significant potential to assist in clinical decision making. Furthermore, the study highlights the essential role of physicians' critical thinking and lifelong learning skills, which is particularly evident in the analysis of GPT-4's incorrect responses. This underscores the indispensable human element in effectively implementing and using AI technologies in medical settings.

Source journal
CiteScore: 4.90
Self-citation rate: 6.90%
Articles published: 168
Review time: 4-8 weeks
About the journal: Published since 1988, the Journal of the American Board of Family Medicine (JABFM) is the official peer-reviewed journal of the American Board of Family Medicine (ABFM). Believing that the public and scientific communities are best served by open access to information, JABFM makes its articles available free of charge and without registration at www.jabfm.org. JABFM is indexed by Medline, Index Medicus, and other services.