Neha Garg, Daniel J Campbell, Angela Yang, Adam McCann, Annie E Moroco, Leonard E Estephan, William J Palmer, Howard Krein, Ryan Heffelfinger
Facial Plastic Surgery & Aesthetic Medicine, pp. 665-673. Journal Article. Published 2024-11-01 (Epub 2024-07-01). DOI: https://doi.org/10.1089/fpsam.2023.0368
Chatbots as Patient Education Resources for Aesthetic Facial Plastic Surgery: Evaluation of ChatGPT and Google Bard Responses.
Background: ChatGPT and Google Bard™ are popular artificial intelligence chatbots with utility for patients, including those undergoing aesthetic facial plastic surgery. Objective: To compare the accuracy and readability of chatbot-generated responses to patient education questions about aesthetic facial plastic surgery, using a response accuracy scale and readability testing. Method: ChatGPT and Google Bard™ were each asked 28 identical questions using four prompts: none, patient friendly, eighth-grade level, and references. Accuracy was assessed using the Global Quality Scale (range: 1-5). The Flesch-Kincaid grade level was calculated, and chatbot-provided references were analyzed for veracity. Results: Although 59.8% of responses were of good quality (Global Quality Scale ≥4), ChatGPT generated more accurate responses than Google Bard™ with patient-friendly prompting (p < 0.001). Google Bard™ responses were at a significantly lower grade level than ChatGPT responses for all prompts (p < 0.05). Despite eighth-grade prompting, the response grade level for both chatbots was high: ChatGPT (10.5 ± 1.8) and Google Bard™ (9.6 ± 1.3). Prompting for references yielded 108 chatbot-generated references. Forty-one (38.0%) citations were legitimate, and twenty (18.5%) accurately reported information from the cited reference. Conclusion: Although ChatGPT produced more accurate responses, and at a higher reading grade level, than Google Bard™, both chatbots provided responses above the recommended grade level for patients and failed to provide accurate references.
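The readability measure named in the abstract, the Flesch-Kincaid grade level, is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The abstract does not say which tool the authors used, so the following is only a minimal sketch of the formula. The function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for illustration; dedicated readability tools count syllables more carefully and may give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (assumed heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings such as "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = ("Rhinoplasty recovery typically necessitates approximately "
             "several weeks of restricted activity.")
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Longer sentences and more polysyllabic words both raise the score, which is why a response dense with medical terminology can exceed an eighth-grade target even when the prompt asks for that level.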