Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis.

International Dental Journal (IF 3.2; Zone 3, Medicine; JCR Q1, Dentistry, Oral Surgery & Medicine). Pub Date: 2024-11-11. DOI: 10.1016/j.identj.2024.10.014
Mingxin Liu, Tsuyoshi Okuhara, Wenbo Huang, Atsushi Ogihara, Hikari Sophia Nagao, Hiroko Okada, Takahiro Kiuchi

Abstract

Introduction and aims: This systematic review and meta-analysis evaluates the performance of various large language models (LLMs) in dental licensing examinations worldwide. The aim is to assess the accuracy of these models across different linguistic and geographical contexts, in order to inform their potential application in dental education and diagnostics.

Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted a comprehensive search across PubMed, Web of Science, and Scopus for studies published from 1 January 2022 to 1 May 2024. Two authors independently screened the literature against the inclusion and exclusion criteria, extracted data, and evaluated study quality using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. We conducted qualitative and quantitative analyses to evaluate the performance of LLMs.

Results: Eleven studies met the inclusion criteria, encompassing dental licensing examinations from eight countries. GPT-3.5, GPT-4, and Bard achieved integrated accuracy rates of 54%, 72%, and 56%, respectively. GPT-4 outperformed GPT-3.5 and Bard, passing more than half of the dental licensing examinations. Subgroup analyses and meta-regression showed that GPT-3.5 performed significantly better in English-speaking countries. GPT-4's performance, however, remained consistent across different regions.
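
To make the idea of an "integrated" accuracy rate concrete, the following is a minimal sketch of how per-study accuracies can be pooled. It assumes a DerSimonian-Laird random-effects model on logit-transformed proportions; the abstract does not state which pooling model the authors used, so this is an illustration rather than their method. The function name `pool_accuracy` and the study counts are hypothetical and are not taken from the review.

```python
import math


def pool_accuracy(studies):
    """Pool (correct, total) pairs into one accuracy estimate.

    Assumption: DerSimonian-Laird random-effects pooling on the logit scale.
    Returns (pooled, ci_low, ci_high, i_squared_percent).
    """
    effects, variances = [], []
    for correct, total in studies:
        # A 0.5 continuity correction keeps the logit finite at 0% or 100%.
        hits = correct + 0.5
        misses = total - correct + 0.5
        effects.append(math.log(hits / misses))
        variances.append(1.0 / hits + 1.0 / misses)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(studies) - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate and 95% confidence interval on the logit scale.
    re_weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), i2


if __name__ == "__main__":
    # Hypothetical (correct answers, questions asked) for three exams.
    example = [(120, 200), (150, 200), (90, 200)]
    pooled, low, high, i2 = pool_accuracy(example)
    print(f"pooled accuracy {pooled:.3f} (95% CI {low:.3f} to {high:.3f}), I^2 = {i2:.1f}%")
```

Running the same function separately on, for example, English-language and non-English-language exams would give a rough version of the subgroup comparison described above; a formal meta-regression would need a dedicated statistics package.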

Conclusion: LLMs, particularly GPT-4, show potential in dental education and diagnostics, yet their accuracy remains below the threshold required for clinical application. The lack of sufficient training data in dentistry has limited LLMs' accuracy, and the reliance of dental examinations on image-based diagnostics presents a further challenge. As a result, their accuracy in dental licensing exams is lower than in medical licensing exams. Additionally, LLMs sometimes provide more detailed explanations for incorrect answers than for correct ones. Overall, current LLMs are not yet suitable for use in dental education and clinical diagnosis.

Source journal: International Dental Journal (Medicine: Dentistry and Oral Surgery)
CiteScore: 4.80
Self-citation rate: 6.10%
Articles published: 159
Review time: 63 days
Journal description: The International Dental Journal features peer-reviewed, scientific articles relevant to international oral health issues, as well as practical, informative articles aimed at clinicians.