Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.

Clinical and Experimental Nephrology | IF 2.2 | JCR Q2 (Urology & Nephrology) | CAS Tier 4 (Medicine) | Published: 2024-05-01 (Epub: 2024-02-14) | DOI: 10.1007/s10157-023-02451-w
Ryunosuke Noda, Yuto Izaki, Fumiya Kitano, Jun Komatsu, Daisuke Ichikawa, Yugo Shibagaki
{"title":"ChatGPT 和 Bard 在肾脏病委员会换届自我评估问题中的表现。","authors":"Ryunosuke Noda, Yuto Izaki, Fumiya Kitano, Jun Komatsu, Daisuke Ichikawa, Yugo Shibagaki","doi":"10.1007/s10157-023-02451-w","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) have impacted advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications.</p><p><strong>Methods: </strong>Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and Bard. We calculated the correct answer rates for the five years, each year, and question categories and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of the nephrology residents.</p><p><strong>Results: </strong>The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively, thus GPT-4 significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 passed in three years, barely meeting the minimum threshold in two. GPT-4 demonstrated significantly higher performance in problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between third- and fourth-year nephrology residents.</p><p><strong>Conclusions: </strong>GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance for future applications.</p>","PeriodicalId":10349,"journal":{"name":"Clinical and Experimental Nephrology","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.\",\"authors\":\"Ryunosuke Noda, Yuto Izaki, Fumiya Kitano, Jun Komatsu, Daisuke Ichikawa, Yugo Shibagaki\",\"doi\":\"10.1007/s10157-023-02451-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large language models (LLMs) have impacted advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications.</p><p><strong>Methods: </strong>Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and Bard. We calculated the correct answer rates for the five years, each year, and question categories and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of the nephrology residents.</p><p><strong>Results: </strong>The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively, thus GPT-4 significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 passed in three years, barely meeting the minimum threshold in two. 
GPT-4 demonstrated significantly higher performance in problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between third- and fourth-year nephrology residents.</p><p><strong>Conclusions: </strong>GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance for future applications.</p>\",\"PeriodicalId\":10349,\"journal\":{\"name\":\"Clinical and Experimental Nephrology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical and Experimental Nephrology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s10157-023-02451-w\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/2/14 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical and Experimental Nephrology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10157-023-02451-w","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/2/14 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Background: Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate the potential nephrology applications of ChatGPT and Bard.

Methods: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates overall across the five years, for each year, and for each question category, and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of nephrology residents.
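As an illustration of the scoring described above, a minimal sketch (not the authors' code) of tallying per-model and per-year correct answer rates follows. The record layout and the 60% pass threshold are hypothetical placeholders, since the abstract does not state the exact pass criterion.

```python
# Minimal sketch of the scoring described in the Methods (not the authors' code).
# Each record is (year, model, correct); the 0.6 pass threshold is a
# hypothetical placeholder -- the actual criterion is set by the board.
from collections import defaultdict

records = [
    # (year, model, correct) -- illustrative entries only
    (2018, "GPT-4", True),
    (2018, "GPT-3.5", False),
    (2018, "Bard", False),
    # one tuple per (question, model) pair in a real run
]

PASS_THRESHOLD = 0.6  # hypothetical

def answer_rates(records):
    """Return correct-answer counts per (model, year) and overall per model."""
    per_year = defaultdict(lambda: [0, 0])   # (model, year) -> [correct, total]
    overall = defaultdict(lambda: [0, 0])    # model -> [correct, total]
    for year, model, correct in records:
        per_year[(model, year)][0] += int(correct)
        per_year[(model, year)][1] += 1
        overall[model][0] += int(correct)
        overall[model][1] += 1
    return per_year, overall

per_year, overall = answer_rates(records)
for (model, year), (c, n) in sorted(per_year.items()):
    rate = c / n
    print(f"{model} {year}: {c}/{n} = {rate:.1%}, pass: {rate >= PASS_THRESHOLD}")
```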

Results: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 thus significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the pass criterion in three of the five years, only barely clearing the minimum threshold in two of them. GPT-4 performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents.
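The abstract does not name the statistical test behind the p-values above. The sketch below shows one common choice for paired correct/incorrect outcomes on the same question set, McNemar's test, using illustrative random data rather than the study's answer sheets.

```python
# Sketch of a paired comparison of two models on the same 99 questions.
# McNemar's test is used here purely for illustration; the study's actual
# test is not stated in the abstract.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# correct/incorrect per question for two models (illustrative random data)
rng = np.random.default_rng(0)
gpt4_correct = rng.random(99) < 0.545
gpt35_correct = rng.random(99) < 0.313

# 2x2 table of concordant/discordant outcomes between the paired models
table = np.array([
    [np.sum(gpt4_correct & gpt35_correct), np.sum(gpt4_correct & ~gpt35_correct)],
    [np.sum(~gpt4_correct & gpt35_correct), np.sum(~gpt4_correct & ~gpt35_correct)],
])

result = mcnemar(table, exact=True)
print(f"McNemar p-value: {result.pvalue:.4f}")
```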

Conclusions: GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight the potential and limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance to guide future applications.

Source journal: Clinical and Experimental Nephrology (Urology & Nephrology)
CiteScore: 4.10
Self-citation rate: 4.30%
Annual article count: 135
Review turnaround: 4-8 weeks
Journal description: Clinical and Experimental Nephrology is a peer-reviewed monthly journal, officially published by the Japanese Society of Nephrology (JSN) to provide an international forum for the discussion of research and issues relating to the study of nephrology. Out of respect for the founders of the JSN, the title of this journal uses the term “nephrology,” a word created and brought into use with the establishment of the JSN (Japanese Journal of Nephrology, Vol. 2, No. 1, 1960). The journal publishes articles on all aspects of nephrology, including basic, experimental, and clinical research, so as to share the latest research findings and ideas not only with members of the JSN, but with all researchers who wish to contribute to a better understanding of recent advances in nephrology. The journal is unique in that it introduces to an international readership original reports from Japan and also the clinical standards discussed and agreed by JSN.
Latest articles in this journal:
- Clinical outcomes in peritoneal dialysis with refractory peritonitis: significance of the day 5 cell count.
- Estimating the prevalence of chronic kidney disease in the older population using health screening data in Japan.
- Predictors of encapsulated peritoneal sclerosis in patients undergoing peritoneal dialysis using neutral-pH dialysate.
- Utility of ultrasound in measuring quadriceps muscle thickness in patients receiving maintenance hemodialysis: comprehensive systematic review and meta-analysis.
- Annual change in eGFR in renal hypouricemia: a retrospective pilot study.