Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT.

BDJ Open (IF 2.5, Q2, Dentistry, Oral Surgery & Medicine). Pub Date: 2024-06-12. DOI: 10.1038/s41405-024-00226-3

Itrat Batool, Nighat Naved, Syed Murtaza Raza Kazmi, Fahad Umer

BDJ Open 10(1):48. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169374/pdf/


Abstract

Objective: This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. The aim is to evaluate the performance of an embedded GPT model and compare it with ChatGPT-3.5 turbo. The assessment covers response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.

Material and methods: An embedded GPT model, employing GPT-3.5-16k, was built via GPT-trainer to answer postoperative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were validated by thirty-six dental experts, nine from each specialty, using a Likert scale, so that the embedded GPT model's performance could be compared with that of ChatGPT-3.5 turbo. For content validation, a quantitative Content Validity Index (CVI) was used. The CVI was calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust I-CVI for chance agreement, a modified kappa statistic (K*) was computed.
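The three statistics named above can be illustrated with a minimal sketch. It is not the authors' code. It assumes the commonly used definitions: I-CVI is the proportion of experts rating an item as acceptable, S-CVI/Ave is the mean of the I-CVIs, and K* = (I-CVI - Pc) / (1 - Pc), where Pc is the binomial probability of the observed number of agreements arising by chance with p = 0.5. The abstract does not state the width of the Likert scale or the cutoff for agreement, so the 4-point scale with ratings of 3 or 4 counted as agreement is an assumption, and all function names and ratings are hypothetical.

```python
from math import comb


def count_agreements(ratings, relevant_min=3):
    """Number of experts whose rating meets the assumed agreement cutoff."""
    return sum(1 for r in ratings if r >= relevant_min)


def item_cvi(ratings, relevant_min=3):
    """I-CVI: proportion of experts rating the item at or above the cutoff."""
    return count_agreements(ratings, relevant_min) / len(ratings)


def modified_kappa(ratings, relevant_min=3):
    """K*: I-CVI adjusted for chance agreement, (I-CVI - Pc) / (1 - Pc)."""
    n = len(ratings)
    agree = count_agreements(ratings, relevant_min)
    # Probability that exactly `agree` of n experts agree by chance (p = 0.5).
    p_chance = comb(n, agree) * 0.5 ** n
    return (agree / n - p_chance) / (1 - p_chance)


def scale_cvi_ave(items, relevant_min=3):
    """S-CVI/Ave: mean of the item-level CVIs across all items."""
    return sum(item_cvi(r, relevant_min) for r in items) / len(items)


# Hypothetical ratings from nine experts for three responses (not study data).
example_items = [
    [4, 4, 3, 4, 3, 4, 4, 3, 4],
    [4, 3, 2, 4, 3, 4, 1, 3, 4],
    [2, 3, 1, 4, 2, 3, 2, 4, 1],
]

for ratings in example_items:
    print(f"I-CVI={item_cvi(ratings):.3f}  K*={modified_kappa(ratings):.3f}")
print(f"S-CVI/Ave={scale_cvi_ave(example_items):.3f}")
```

With nine raters, the chance term Pc is small when agreement is near-unanimous and grows as ratings approach an even split, so K* penalises middling items more heavily than I-CVI alone does.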

Results: The overall content validity of responses generated via the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT, with an accuracy of 62.5% and clarity of 72.5%. The responses generated via ChatGPT achieved lower scores, with an accuracy of 52.5% and clarity of 67.5%. However, both models performed equally well in terms of relevance and up-to-date knowledge.

Conclusion: The embedded GPT model performed better than ChatGPT in providing post-operative dental care, highlighting the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.
