Comparative Performance of Current Patient-Accessible Artificial Intelligence Large Language Models in the Preoperative Education of Patients in Facial Aesthetic Surgery.

Aesthetic Surgery Journal Open Forum. Pub Date: 2024-08-13; eCollection Date: 2024-01-01. DOI: 10.1093/asjof/ojae058
Jad Abi-Rafeh, Brian Bassiri-Tehrani, Roy Kazan, Steven A Hanna, Jonathan Kanevsky, Foad Nahai

Abstract

Background: Artificial intelligence large language models (LLMs) represent promising resources for patient guidance and education in aesthetic surgery.

Objectives: The present study directly compares the performance of OpenAI's ChatGPT (San Francisco, CA) with Google's Bard (Mountain View, CA) in this patient-related clinical application.

Methods: Standardized questions were generated and posed to ChatGPT and Bard from the perspective of simulated patients interested in facelift, rhinoplasty, and brow lift. Questions spanned all elements relevant to the preoperative patient education process, including queries into appropriate procedures for patient-reported aesthetic concerns; surgical candidacy and procedure indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; recovery and postprocedure instructions; procedure costs; and surgeon recommendations. Responses were then objectively assessed, and the performance metrics of the two LLMs were compared.

Results: ChatGPT scored 8.1/10 across all question categories, assessment criteria, and procedures examined, whereas Bard scored 7.4/10. For ChatGPT vs Bard, respectively, accuracy of information was scored at 6.7/10 ± 3.5 vs 6.5/10 ± 2.3; comprehensiveness at 6.6/10 ± 3.5 vs 6.3/10 ± 2.6; objectivity at 8.2/10 ± 1.0 vs 7.2/10 ± 0.8; safety at 8.8/10 ± 0.4 vs 7.8/10 ± 0.7; communication clarity at 9.3/10 ± 0.6 vs 8.5/10 ± 0.3; and acknowledgment of limitations at 8.9/10 ± 0.2 vs 8.1/10 ± 0.5. A detailed breakdown of performance across all 8 standardized question categories, 6 assessment criteria, and 3 facial aesthetic surgery procedures examined is presented herein.

Conclusions: ChatGPT outperformed Bard in all assessment categories examined, providing more accurate, comprehensive, objective, safe, and clear responses. Bard's response times were significantly faster than those of ChatGPT, although ChatGPT, but not Bard, demonstrated significant improvements in response times as the study progressed, consistent with its machine learning capabilities. While the present findings represent a snapshot of this rapidly evolving technology, the imperfect performance of both models suggests a need for further development, refinement, and evidence-based qualification of the information shared with patients before their use can be recommended in aesthetic surgical practice.

Level of Evidence: 5
