Evaluating the validity and consistency of artificial intelligence chatbots in responding to patients’ frequently asked questions in prosthodontics

Journal of Prosthetic Dentistry · IF 4.8 · JCR Q1, Dentistry, Oral Surgery & Medicine · CAS Zone 2 (Medicine) · Pub Date: 2025-04-07 · DOI: 10.1016/j.prosdent.2025.03.009
Maryam Gheisarifar DDS, MS , Marwa Shembesh BDS, MSc, DScD , Merve Koseoglu DDS , Qiao Fang DDS, MSD , Fatemeh Solmaz Afshari DMD, MS , Judy Chia-Chun Yuan DDS, MS MAS , Cortino Sukotjo DDS, PhD, MMSc, MHPE
Journal of Prosthetic Dentistry, Volume 134, Issue 1, Pages 199-206.
Citations: 0

Abstract

Statement of problem

Healthcare-related information provided by artificial intelligence (AI) chatbots may pose challenges such as inaccuracies, lack of empathy, biases, over-reliance, limited scope, and ethical concerns.

Purpose

The purpose of this study was to evaluate and compare the validity and consistency of responses to prosthodontics-related frequently asked questions (FAQ) generated by 4 different chatbot systems.

Material and methods

Four prosthodontic domains were evaluated: implant, fixed prosthodontics, complete denture (CD), and removable partial denture (RPD). Within each domain, 10 questions were prepared by full-time prosthodontic faculty members, and 10 questions were generated by GPT-3.5 to represent its most frequently asked questions in that domain. The validity and consistency of the responses provided by 4 chatbots (GPT-3.5, GPT-4, Gemini, and Bing) were evaluated. The chi-squared test with the Yates correction was used to compare the validity of responses among the chatbots (α=.05). The Cronbach alpha was calculated for 3 sets of responses, collected in the morning, afternoon, and evening, to evaluate the consistency of the responses.
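The two statistics named above can be sketched in plain Python. This is an illustrative sketch only: the counts and ratings below are hypothetical, not the study's data, and the study itself does not publish its scoring code.

```python
def yates_chi_squared(table):
    """Chi-squared statistic for a 2x2 contingency table with the Yates
    continuity correction: sum of (|observed - expected| - 0.5)^2 / expected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            exp = row[i] * col[j] / n  # expected count under independence
            chi2 += (abs(obs - exp) - 0.5) ** 2 / exp
    return chi2

def cronbach_alpha(items):
    """Cronbach alpha for a list of item score lists, one list per repeated
    measurement (e.g. morning / afternoon / evening response sets)."""
    k = len(items)
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(scores) for scores in zip(*items)]  # total score per question
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical validity counts: chatbot A answered 9/10 questions validly, B 4/10.
chi2 = yates_chi_squared([[9, 1], [4, 6]])

# Hypothetical 1-5 ratings of the same 5 answers at 3 times of day.
alpha = cronbach_alpha([[5, 4, 4, 3, 5], [5, 4, 3, 3, 5], [4, 4, 4, 3, 5]])
```

In practice these would be computed with a statistics package (e.g. `scipy.stats.chi2_contingency` with `correction=True`); the hand-rolled versions are shown only to make the formulas concrete.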

Results

According to the low-threshold validity test, the chatbots’ answers to ChatGPT’s implant-related, ChatGPT’s RPD-related, and prosthodontists’ CD-related FAQs were statistically different (P<.001, P<.001, and P=.004, respectively), with Bing scoring lowest. In the high-threshold validity test, the chatbots’ answers to ChatGPT’s implant-related and RPD-related FAQs and to ChatGPT’s and prosthodontists’ fixed prosthodontics-related and CD-related FAQs were statistically different (P<.001, P<.001, P=.004, P=.002, and P=.003, respectively), again with Bing scoring lowest. Overall, all 4 chatbots demonstrated lower validity at the high threshold than at the low threshold. Bing, Gemini, and ChatGPT-4 displayed an acceptable level of consistency, while ChatGPT-3.5 did not.

Conclusions

Currently, AI chatbots show limitations in delivering answers to patients’ prosthodontic-related FAQs with high validity and consistency.
Source journal: Journal of Prosthetic Dentistry (Medicine: Dentistry & Oral Surgery)

CiteScore: 7.00 · Self-citation rate: 13.00% · Articles per year: 599 · Review time: 69 days
About the journal: The Journal of Prosthetic Dentistry is the leading professional journal devoted exclusively to prosthetic and restorative dentistry. The Journal is the official publication for 24 leading U.S. and international prosthodontic organizations. The monthly publication features timely, original peer-reviewed articles on the newest techniques, dental materials, and research findings. The Journal serves prosthodontists and dentists in advanced practice, and features color photos that illustrate many step-by-step procedures. The Journal of Prosthetic Dentistry is included in Index Medicus and CINAHL.