Jad Abi-Rafeh, Brian Bassiri-Tehrani, Roy Kazan, Steven A Hanna, Jonathan Kanevsky, Foad Nahai
{"title":"当前患者可访问的人工智能大语言模型在面部美容手术患者术前教育中的性能比较。","authors":"Jad Abi-Rafeh, Brian Bassiri-Tehrani, Roy Kazan, Steven A Hanna, Jonathan Kanevsky, Foad Nahai","doi":"10.1093/asjof/ojae058","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence large language models (LLMs) represent promising resources for patient guidance and education in aesthetic surgery.</p><p><strong>Objectives: </strong>The present study directly compares the performance of OpenAI's ChatGPT (San Francisco, CA) with Google's Bard (Mountain View, CA) in this patient-related clinical application.</p><p><strong>Methods: </strong>Standardized questions were generated and posed to ChatGPT and Bard from the perspective of simulated patients interested in facelift, rhinoplasty, and brow lift. Questions spanned all elements relevant to the preoperative patient education process, including queries into appropriate procedures for patient-reported aesthetic concerns; surgical candidacy and procedure indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; recovery and postprocedure instructions; procedure costs, and surgeon recommendations. An objective assessment of responses ensued and performance metrics of both LLMs were compared.</p><p><strong>Results: </strong>ChatGPT scored 8.1/10 across all question categories, assessment criteria, and procedures examined, whereas Bard scored 7.4/10. Overall accuracy of information was scored at 6.7/10 ± 3.5 for ChatGPT and 6.5/10 ± 2.3 for Bard; comprehensiveness was scored as 6.6/10 ± 3.5 vs 6.3/10 ± 2.6; objectivity as 8.2/10 ± 1.0 vs 7.2/10 ± 0.8, safety as 8.8/10 ± 0.4 vs 7.8/10 ± 0.7, communication clarity as 9.3/10 ± 0.6 vs 8.5/10 ± 0.3, and acknowledgment of limitations as 8.9/10 ± 0.2 vs 8.1/10 ± 0.5, respectively. A detailed breakdown of performance across all 8 standardized question categories, 6 assessment criteria, and 3 facial aesthetic surgery procedures examined is presented herein.</p><p><strong>Conclusions: </strong>ChatGPT outperformed Bard in all assessment categories examined, with more accurate, comprehensive, objective, safe, and clear responses provided. Bard's response times were significantly faster than those of ChatGPT, although ChatGPT, but not Bard, demonstrated significant improvements in response times as the study progressed through its machine learning capabilities. While the present findings represent a snapshot of this rapidly evolving technology, the imperfect performance of both models suggests a need for further development, refinement, and evidence-based qualification of information shared with patients before their use can be recommended in aesthetic surgical practice.</p><p><strong>Level of evidence 5: </strong></p>","PeriodicalId":72118,"journal":{"name":"Aesthetic surgery journal. 
Open forum","volume":"6 ","pages":"ojae058"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371156/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparative Performance of Current Patient-Accessible Artificial Intelligence Large Language Models in the Preoperative Education of Patients in Facial Aesthetic Surgery.\",\"authors\":\"Jad Abi-Rafeh, Brian Bassiri-Tehrani, Roy Kazan, Steven A Hanna, Jonathan Kanevsky, Foad Nahai\",\"doi\":\"10.1093/asjof/ojae058\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence large language models (LLMs) represent promising resources for patient guidance and education in aesthetic surgery.</p><p><strong>Objectives: </strong>The present study directly compares the performance of OpenAI's ChatGPT (San Francisco, CA) with Google's Bard (Mountain View, CA) in this patient-related clinical application.</p><p><strong>Methods: </strong>Standardized questions were generated and posed to ChatGPT and Bard from the perspective of simulated patients interested in facelift, rhinoplasty, and brow lift. Questions spanned all elements relevant to the preoperative patient education process, including queries into appropriate procedures for patient-reported aesthetic concerns; surgical candidacy and procedure indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; recovery and postprocedure instructions; procedure costs, and surgeon recommendations. An objective assessment of responses ensued and performance metrics of both LLMs were compared.</p><p><strong>Results: </strong>ChatGPT scored 8.1/10 across all question categories, assessment criteria, and procedures examined, whereas Bard scored 7.4/10. Overall accuracy of information was scored at 6.7/10 ± 3.5 for ChatGPT and 6.5/10 ± 2.3 for Bard; comprehensiveness was scored as 6.6/10 ± 3.5 vs 6.3/10 ± 2.6; objectivity as 8.2/10 ± 1.0 vs 7.2/10 ± 0.8, safety as 8.8/10 ± 0.4 vs 7.8/10 ± 0.7, communication clarity as 9.3/10 ± 0.6 vs 8.5/10 ± 0.3, and acknowledgment of limitations as 8.9/10 ± 0.2 vs 8.1/10 ± 0.5, respectively. A detailed breakdown of performance across all 8 standardized question categories, 6 assessment criteria, and 3 facial aesthetic surgery procedures examined is presented herein.</p><p><strong>Conclusions: </strong>ChatGPT outperformed Bard in all assessment categories examined, with more accurate, comprehensive, objective, safe, and clear responses provided. Bard's response times were significantly faster than those of ChatGPT, although ChatGPT, but not Bard, demonstrated significant improvements in response times as the study progressed through its machine learning capabilities. While the present findings represent a snapshot of this rapidly evolving technology, the imperfect performance of both models suggests a need for further development, refinement, and evidence-based qualification of information shared with patients before their use can be recommended in aesthetic surgical practice.</p><p><strong>Level of evidence 5: </strong></p>\",\"PeriodicalId\":72118,\"journal\":{\"name\":\"Aesthetic surgery journal. 
Open forum\",\"volume\":\"6 \",\"pages\":\"ojae058\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371156/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Aesthetic surgery journal. Open forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/asjof/ojae058\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Aesthetic surgery journal. Open forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/asjof/ojae058","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Comparative Performance of Current Patient-Accessible Artificial Intelligence Large Language Models in the Preoperative Education of Patients in Facial Aesthetic Surgery.
Background: Artificial intelligence large language models (LLMs) represent promising resources for patient guidance and education in aesthetic surgery.
Objectives: The present study directly compares the performance of OpenAI's ChatGPT (San Francisco, CA) with Google's Bard (Mountain View, CA) in this patient-related clinical application.
Methods: Standardized questions were generated and posed to ChatGPT and Bard from the perspective of simulated patients interested in facelift, rhinoplasty, and brow lift. Questions spanned all elements of the preoperative patient education process, including appropriate procedures for patient-reported aesthetic concerns; surgical candidacy and procedure indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; recovery and postprocedure instructions; procedure costs; and surgeon recommendations. Responses were then objectively assessed, and the performance metrics of both LLMs were compared.
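The study does not publish its evaluation protocol as code, but a minimal sketch of the general question-posing and rating workflow might look as follows. Everything here is an illustrative assumption: the function names, the criterion labels, and the `score_fn` hook standing in for the human rating step; the study itself interacted with the public chat interfaces of both models, so no specific model API is implied.

```python
import statistics

# The six assessment criteria reported in the study.
CRITERIA = ["accuracy", "comprehensiveness", "objectivity",
            "safety", "clarity", "limitations"]

def evaluate(models, questions, score_fn):
    """Pose each standardized question to each model and collect
    per-criterion scores.

    `models` maps a model name to a callable wrapping its chat
    interface (hypothetical); `score_fn(response)` stands in for the
    human rating step and must return a {criterion: 0-10 score} dict.
    """
    results = {name: {c: [] for c in CRITERIA} for name in models}
    for name, ask in models.items():
        for question in questions:
            response = ask(question)
            for criterion, score in score_fn(response).items():
                results[name][criterion].append(score)
    return results
```

In use, `models` would be something like `{"ChatGPT": ask_chatgpt, "Bard": ask_bard}`, with each callable (hypothetical) relaying a prompt to the corresponding chat front end and returning its free-text reply.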
Results: ChatGPT scored 8.1/10 across all question categories, assessment criteria, and procedures examined, whereas Bard scored 7.4/10. Overall accuracy of information was scored 6.7/10 ± 3.5 for ChatGPT vs 6.5/10 ± 2.3 for Bard; comprehensiveness, 6.6/10 ± 3.5 vs 6.3/10 ± 2.6; objectivity, 8.2/10 ± 1.0 vs 7.2/10 ± 0.8; safety, 8.8/10 ± 0.4 vs 7.8/10 ± 0.7; communication clarity, 9.3/10 ± 0.6 vs 8.5/10 ± 0.3; and acknowledgment of limitations, 8.9/10 ± 0.2 vs 8.1/10 ± 0.5. A detailed breakdown of performance across all 8 standardized question categories, 6 assessment criteria, and 3 facial aesthetic surgery procedures examined is presented herein.
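For concreteness, the reported mean ± SD summaries per criterion could be derived from per-response scores shaped like the `results` dict in the sketch above; the values below are fabricated placeholders, not the study's data.

```python
import statistics

# Fabricated placeholder scores (NOT the study's data), shaped like
# the `results` dict returned by evaluate() in the earlier sketch.
results = {
    "ChatGPT": {"accuracy": [7, 9, 4, 7], "safety": [9, 9, 8, 9]},
    "Bard":    {"accuracy": [6, 8, 5, 7], "safety": [8, 7, 8, 8]},
}

for model, by_criterion in results.items():
    for criterion, scores in by_criterion.items():
        mean = statistics.mean(scores)
        sd = statistics.stdev(scores)  # sample SD (n-1 denominator)
        print(f"{model} {criterion}: {mean:.1f}/10 ± {sd:.1f}")
```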
Conclusions: ChatGPT outperformed Bard in all assessment categories examined, providing more accurate, comprehensive, objective, safe, and clear responses. Bard's response times were significantly faster than ChatGPT's, although ChatGPT, but not Bard, demonstrated significant improvements in response times as the study progressed, owing to its machine learning capabilities. While the present findings represent a snapshot of this rapidly evolving technology, the imperfect performance of both models suggests a need for further development, refinement, and evidence-based qualification of the information shared with patients before either model can be recommended for use in aesthetic surgical practice.

Level of Evidence: 5