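Comparative Analysis of the Response Accuracies of Large Language Models in the Korean National Dental Hygienist Examination Across Korean and English Questions.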

International journal of dental hygiene · Pub Date: 2024-10-16 · DOI: 10.1111/idh.12848
IF 1.6 · Zone 4 (Medicine) · JCR Q3, DENTISTRY, ORAL SURGERY & MEDICINE
Eun Sun Song, Seung-Pyo Lee
Citations: 0

Abstract

Introduction: Large language models such as Gemini, GPT-3.5, and GPT-4 have demonstrated significant potential in the medical field. Their performance in medical licensing examinations globally has highlighted their capabilities in understanding and processing specialized medical knowledge. This study aimed to evaluate and compare the performance of Gemini, GPT-3.5, and GPT-4 in the Korean National Dental Hygienist Examination. The accuracy of answering the examination questions in both Korean and English was assessed.

Methods: This study used a dataset comprising questions from the Korean National Dental Hygienist Examination over 5 years (2019-2023). A two-way analysis of variance (ANOVA) test was employed to investigate the impacts of model type and language on the accuracy of the responses. Questions were input into each model under standardized conditions, and responses were classified as correct or incorrect based on predefined criteria.
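The abstract names a two-way ANOVA with model type and language as the two factors, but it does not say what the replicate observations in each cell were. The sketch below is a minimal pure-Python illustration of a balanced two-way ANOVA with replication. It assumes, hypothetically, that each model x language cell holds one accuracy score per examination year. The function names, the `model_x`/`model_y` labels, and all numbers are invented for illustration and are not the study's data or results.

```python
from itertools import product
from statistics import mean


def accuracy(graded):
    """Percentage of responses graded correct; `graded` is a list of booleans."""
    return 100 * sum(graded) / len(graded)


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication (fixed effects).

    `cells` maps (model, language) -> list of replicate scores, e.g. one
    accuracy per exam year (a hypothetical choice of replicate). Every
    model x language cell must be present with the same n >= 2 replicates.
    Returns {source: (sum_of_squares, degrees_of_freedom, F or None)}.
    """
    models = sorted({m for m, _ in cells})
    langs = sorted({lang for _, lang in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or len(cells) != len(models) * len(langs) or any(
        len(v) != n for v in cells.values()
    ):
        raise ValueError("need a complete, balanced design with n >= 2")

    grand = mean(x for v in cells.values() for x in v)
    cell = {k: mean(v) for k, v in cells.items()}
    # Marginal means equal the means of cell means because the design is balanced.
    m_mean = {m: mean(cell[(m, lang)] for lang in langs) for m in models}
    l_mean = {lang: mean(cell[(m, lang)] for m in models) for lang in langs}

    a, b = len(models), len(langs)
    ss = {
        "model": b * n * sum((v - grand) ** 2 for v in m_mean.values()),
        "language": a * n * sum((v - grand) ** 2 for v in l_mean.values()),
        # Interaction: what remains of each cell mean after both main effects.
        "model x language": n * sum(
            (cell[(m, lang)] - m_mean[m] - l_mean[lang] + grand) ** 2
            for m, lang in product(models, langs)
        ),
    }
    df = {"model": a - 1, "language": b - 1, "model x language": (a - 1) * (b - 1)}
    ss_err = sum((x - cell[k]) ** 2 for k, v in cells.items() for x in v)
    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err  # residual mean square, the denominator of every F

    table = {src: (ss[src], df[src], (ss[src] / df[src]) / ms_err) for src in ss}
    table["error"] = (ss_err, df_err, None)
    return table


# Made-up accuracies (%) for two hypothetical models, two replicates per cell.
cells = {
    ("model_x", "en"): [80, 82],
    ("model_x", "ko"): [70, 72],
    ("model_y", "en"): [60, 62],
    ("model_y", "ko"): [58, 60],
}
for source, (s, d, f) in two_way_anova(cells).items():
    print(source, s, d, f)
```

To obtain p-values, each F statistic is compared with an F distribution that has (effect df, error df) degrees of freedom. If SciPy is installed, `scipy.stats.f.sf(F, df_effect, df_error)` does this. statsmodels produces the same table from a fitted `ols` formula model through `statsmodels.stats.anova.anova_lm`. Treating per-year percentages as replicates is only one possible design. Modelling each question's correct/incorrect outcome directly, for example with logistic regression, is a common alternative.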

Results: GPT-4 consistently outperformed the other models and achieved the highest accuracy on both language versions in every examination year. Its performance was notably better in English, suggesting advances in its training for language processing. However, all models showed variable accuracy in subjects with localized characteristics, such as health and medical law.

Conclusions: These findings indicate that GPT-4 holds significant promise for application in medical education and standardized testing, especially in English. However, the variability in performance across different subjects and languages underscores the need for ongoing improvements and the inclusion of more diverse and localized training datasets to enhance the models' effectiveness in multilingual and multicultural contexts.

Source journal
International journal of dental hygiene (DENTISTRY, ORAL SURGERY & MEDICINE)
CiteScore: 4.00
Self-citation rate: 8.30%
Articles published: 78
Review time: >12 weeks
Journal description: International Journal of Dental Hygiene is the official scientific peer-reviewed journal of the International Federation of Dental Hygienists (IFDH). The journal brings the latest scientific news and high-quality commissioned reviews, as well as clinical, professional, educational and legislative news, to the profession worldwide. It thus acts as a forum for the exchange of relevant information and the enhancement of the profession, with the purpose of promoting oral health for patients and communities. The aim of the International Journal of Dental Hygiene is to provide a forum for the exchange of scientific knowledge in the field of oral health and dental hygiene. A further aim is to support and facilitate the application of new knowledge in clinical practice. The journal welcomes original research, reviews and case reports, as well as clinical, professional, educational and legislative news, from the profession worldwide.
Latest articles in this journal
Construct Validity of the Orientation to Life Questionnaire in a General Adult Population in Norway and Its Association with Self-Reported General and Oral Health.
Dental Hygienists' Readiness to Perform Resin Infiltrations: A Qualitative Study From Finland.
Gingivitis Control in Children, Adolescents and Young Adults With Chronic Kidney Disease by a Need-Related Programme: A Randomised Clinical Trial.
Sex Differences in Health-Related Quality of Life in Patients With Head and Neck Cancer-A Prospective Study.
The Experiences of Implementing a Near-Peer Teaching Scheme Into an Undergraduate Dental Hygiene and Dental Therapy Programme at the University of Sheffield in the United Kingdom.