Performance of Large Language Models on the Korean Dental Licensing Examination: A Comparative Study

IF 3.2 · JCR Q1 (Dentistry, Oral Surgery & Medicine) · CAS Tier 3 (Medicine) · International Dental Journal · Published: 2025-02-01 · DOI: 10.1016/j.identj.2024.09.002
Woojun Kim, Bong Chul Kim, Han-Gyeol Yeom
International Dental Journal, Volume 75, Issue 1, Pages 176–184. Available at: https://www.sciencedirect.com/science/article/pii/S0020653924014928

Purpose

This study investigated the potential application of large language models (LLMs) in dental education and practice, with a focus on ChatGPT and Claude3-Opus. Using the Korean Dental Licensing Examination (KDLE) as a benchmark, we aimed to assess the capabilities of these models in the dental field.

Methods

This study evaluated three LLMs: GPT-3.5, GPT-4 (version: March 2024), and Claude3-Opus (version: March 2024). We used the KDLE questions from 2019 to 2023 as inputs to the LLMs and took the models' outputs as their answers. Total scores for individual subjects were obtained and compared. We also compared the performance of the LLMs with that of the examinees who took the exams.
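The evaluation procedure described above can be sketched as a simple grading loop; the model interface (`ask_model`), the scoring scheme, and the toy data below are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an exam-grading loop for an LLM; the model interface
# (ask_model) and the toy data are assumptions, not taken from the study.

def grade_exam(questions, answer_key, ask_model):
    """Ask the model each multiple-choice question and score against the key."""
    correct = sum(
        1 for q, key in zip(questions, answer_key) if ask_model(q) == key
    )
    return 100.0 * correct / len(questions)  # percentage score

# Toy run with a stand-in "model" that always answers "B".
questions = ["Q1", "Q2", "Q3", "Q4"]
answer_key = ["B", "C", "B", "A"]
score = grade_exam(questions, answer_key, lambda q: "B")  # 2 of 4 correct
```

In the study itself, each year's exam would be graded this way per subject, with the per-subject scores summed and compared against the official cut-off.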

Results

Claude3-Opus performed best among the LLMs considered in every year except 2019, when GPT-4 performed best. Claude3-Opus and GPT-4 surpassed the cut-off scores in all years considered, indicating that both passed the KDLE, whereas GPT-3.5 did not. However, all LLMs considered performed worse than humans, represented here by dental students in Korea. On average, the best-performing LLM each year achieved 85.4% of human performance.
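The 85.4% figure is a yearly average of the ratio between the best-performing LLM's score and the human examinees' mean score; a minimal illustration of that arithmetic with made-up scores (not the study's data):

```python
# Illustrative computation of "percent of human performance"; the scores are
# placeholders, not the study's actual results.
llm_best = {2019: 70.0, 2020: 72.0}     # best LLM total score per year
human_mean = {2019: 80.0, 2020: 85.0}   # mean human score per year

ratios = [llm_best[y] / human_mean[y] for y in llm_best]
avg_relative = 100.0 * sum(ratios) / len(ratios)  # percent of human performance
```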

Conclusion

Using the KDLE as a benchmark, our study demonstrates that although LLMs have not yet reached human-level overall performance, both Claude3-Opus and GPT-4 exceed the cut-off scores and perform exceptionally well in specific subjects.

Clinical Relevance

Our findings will aid in evaluating the feasibility of integrating LLMs into dentistry to improve the quality and availability of dental services by offering patients information that meets the basic competency standards of a dentist.