Title: Performance of large language models in the National Dental Licensing Examination in China: a comparative analysis of ChatGPT, GPT-4, and New Bing
Authors: Ziyang Hu, Zhe Xu, Ping Shi, Dandan Zhang, Qu Yue, Jiexia Zhang, Xin Lei, Zitong Lin
Journal: International Journal of Computerized Dentistry, volume 27, issue 4, pages 401-411
Publication date: 2024-12-09
Publication type: Journal Article
DOI: 10.3290/j.ijcd.b5870240 (https://doi.org/10.3290/j.ijcd.b5870240)
Impact factor: 1.8; JCR quartile: Q2 (Dentistry, Oral Surgery & Medicine)
Citation count: 0
Abstract
Aim: The objective of the present study was to investigate the clinical understanding and reasoning abilities of three large language models (LLMs), namely ChatGPT, GPT-4, and New Bing, by evaluating their performance in the National Dental Licensing Examination (NDLE) in China.
Materials and methods: Questions from the 2020 to 2022 NDLE were selected based on subject weightings. Standardized prompts were used to constrain the output of the LLMs and obtain more precise answers. The performance of each model, both within each subject category and across all subjects, was analyzed using McNemar's test.
Results: The percentage scores obtained by ChatGPT, GPT-4, and New Bing were 42.6% (138/324), 63.0% (204/324), and 72.5% (235/324), respectively. A significant difference was observed between the performance of New Bing and that of ChatGPT and GPT-4. GPT-4 and New Bing outperformed ChatGPT across all subjects, with New Bing surpassing GPT-4 in most subjects.
Conclusion: GPT-4 and New Bing exhibited promising capabilities in the NDLE. However, their performance in specific subjects such as prosthodontics and oral and maxillofacial surgery requires improvement. This performance gap can be attributed to limited dental training data and the inherent complexity of these subjects.
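As a rough illustration of the analysis described in the abstract, the sketch below recomputes the reported percentage scores from the correct-answer counts and shows one common form of McNemar's test, the two-sided exact (binomial) version. The abstract does not say which variant of the test the authors used, and it does not report the paired per-question outcomes, so the function names and the discordant counts in the usage section are hypothetical and serve only to show the mechanics.

```python
from math import comb

# Correct-answer counts reported in the abstract (out of 324 questions).
TOTAL = 324
CORRECT = {"ChatGPT": 138, "GPT-4": 204, "New Bing": 235}


def percent_score(correct: int, total: int = TOTAL) -> float:
    """Return the percentage score rounded to one decimal place."""
    return round(100 * correct / total, 1)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: questions model A answered correctly and model B answered incorrectly
    c: questions model A answered incorrectly and model B answered correctly
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Assumption: the exact variant is used here for simplicity; the paper
    may have used a chi-squared approximation instead.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a result at least as extreme in the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for model, correct in CORRECT.items():
        print(f"{model}: {percent_score(correct)}% ({correct}/{TOTAL})")

    # HYPOTHETICAL discordant counts for GPT-4 vs. New Bing, not taken from
    # the paper; chosen only so that c - b equals 235 - 204 = 31.
    b, c = 40, 71
    print(f"Illustrative exact McNemar p-value: {mcnemar_exact(b, c):.4f}")
```

McNemar's test suits this design because every model answers the same 324 questions, so the comparison depends only on the questions where two models disagree, not on the overall totals.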
About the journal:
This journal explores the myriad innovations in the emerging field of computerized dentistry and how to integrate them into clinical practice. The bulk of the journal is devoted to the science of computer-assisted dentistry, with research articles and clinical reports on all aspects of computer-based diagnostic and therapeutic applications, with special emphasis placed on CAD/CAM and image-processing systems. Articles also address the use of computer-based communication to support patient care, assess the quality of care, and enhance clinical decision making. The journal is presented in a bilingual format, with each issue offering three types of articles: science-based, application-based, and national society reports.