Performance of Large Language Models on the Korean Dental Licensing Examination: A Comparative Study
Purpose
This study investigated the potential application of large language models (LLMs) in dental education and practice, with a focus on ChatGPT and Claude3-Opus. Using the Korean Dental Licensing Examination (KDLE) as a benchmark, we aimed to assess the capabilities of these models in the dental field.
Methods
This study evaluated three LLMs: GPT-3.5, GPT-4 (version: March 2024), and Claude3-Opus (version: March 2024). We used the KDLE questions from 2019 to 2023 as inputs to the LLMs and treated the models' outputs as their answers. Total scores for individual subjects were obtained and compared. We also compared the performance of the LLMs with that of the candidates who took the exams.
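As a minimal illustration of the evaluation step described above (not the authors' actual pipeline), comparing a model's chosen answers against an answer key and checking the total against a pass cut-off could be sketched as follows; the function name and the 60% cut-off are illustrative assumptions, not values from the study:

```python
# Hypothetical sketch of exam scoring: count answers that match the key
# and test the total against a cut-off ratio. The 0.6 default is an
# assumed placeholder, not the actual KDLE cut-off.

def score_exam(model_answers, answer_key, cutoff_ratio=0.6):
    """Return (correct_count, total_questions, passed) for one exam sitting."""
    if len(model_answers) != len(answer_key):
        raise ValueError("answer list and key must have the same length")
    correct = sum(m == k for m, k in zip(model_answers, answer_key))
    passed = correct / len(answer_key) >= cutoff_ratio
    return correct, len(answer_key), passed

# Example: 4 of 5 answers correct passes a 60% cut-off
print(score_exam(["A", "C", "B", "D", "A"], ["A", "C", "B", "D", "B"]))
```

Per-subject totals and year-over-year comparisons would then simply aggregate the per-question matches in the same way.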
Results
Claude3-Opus performed best among the LLMs considered, except in 2019, when ChatGPT-4 performed best. Claude3-Opus and ChatGPT-4 surpassed the cut-off scores in all years considered, indicating that both passed the KDLE, whereas ChatGPT-3.5 did not. However, all LLMs considered performed worse than humans, represented here by dental students in Korea: on average, the best-performing LLM in each year achieved 85.4% of human performance.
Conclusion
Using the KDLE as a benchmark, our study demonstrates that although LLMs have not yet reached human-level performance in overall scores, both Claude3-Opus and ChatGPT-4 exceed the cut-off scores and perform exceptionally well in specific subjects.
Clinical Relevance
Our findings will aid in evaluating the feasibility of integrating LLMs into dentistry to improve the quality and availability of dental services by offering patient information that meets the basic competency standards of a dentist.
About the journal:
The International Dental Journal features peer-reviewed, scientific articles relevant to international oral health issues, as well as practical, informative articles aimed at clinicians.