TrachGPT: Appraisal of tracheostomy care recommendations from an artificial intelligent Chatbot

Laryngoscope Investigative Otolaryngology | Pub Date: 2024-07-16 | DOI: 10.1002/lio2.1300 | IF 1.7 | JCR Q2 (Otorhinolaryngology) | Zone 4 (Medicine)

Oluwatobiloba Ayo-Ajibola BS, Ryan J. Davis BS, Matthew E. Lin MD, Neelaysh Vukkadala MD, Karla O'Dell MD, Mark S. Swanson MD, Michael M. Johns III MD, Elizabeth A. Shuman MD

Volume 9, Issue 4. Full text: https://onlinelibrary.wiley.com/doi/10.1002/lio2.1300 (PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11250132/pdf/)


Abstract

Objective

Safe home tracheostomy care requires engagement and troubleshooting by patients, who may turn to online, AI-generated information sources. This study assessed the quality of ChatGPT responses to such queries.

Methods

In this cross-sectional study, ChatGPT was prompted with 10 hypothetical tracheostomy care questions in three domains (complication management, self-care advice, and lifestyle adjustment). Responses were graded by four otolaryngologists for appropriateness, accuracy, and overall score. The readability of responses was evaluated using the Flesch Reading Ease (FRE) and Flesch–Kincaid Reading Grade Level (FKRGL). Descriptive statistics and ANOVA testing were performed with statistical significance set to p < .05.
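The two readability measures named above have standard published formulas, which can be sketched as follows. This is a minimal illustration, not the tooling the authors used: the abstract does not say how the scores were computed, and the helper names (`count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions made for this example. Dedicated readability tools count syllables more carefully, so scores may differ slightly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumed heuristic): count vowel groups,
    dropping one for a trailing silent 'e' (but not for '-le' endings)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a plain-English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical sample sentence pair, not taken from the study.
    sample = "Clean the tube daily. Call your doctor."
    w, s, syl = text_counts(sample)
    print(f"FRE   = {flesch_reading_ease(w, s, syl):.1f}")
    print(f"FKRGL = {flesch_kincaid_grade(w, s, syl):.1f}")
```

Both formulas depend only on average sentence length and average syllables per word, so shorter sentences and simpler words raise FRE and lower the grade level.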

Results

Appropriateness and overall score were rated on a 5-point scale (5 = highest), and accuracy on a 4-point scale (4 = highest). Responses exhibited moderately high appropriateness (mean = 4.10, SD = 0.90), high accuracy (mean = 3.55, SD = 0.50), and moderately high overall scores (mean = 4.02, SD = 0.86). Scores did not differ significantly between response categories (self-care recommendations, complication recommendations, lifestyle adjustments, and special device considerations). Suboptimal responses lacked nuance and contained incorrect information and recommendations. Readability scores corresponded to college and advanced reading levels (FRE: mean = 39.5, SD = 7.17; FKRGL: mean = 13.1, SD = 1.47), higher than the sixth-grade level recommended for patient-targeted resources by the NIH.
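To see how the reported readability means map onto reading levels, the sketch below looks up an FRE score in the conventional Flesch score bands and compares a grade level against the sixth-grade target. The band labels follow the commonly cited Flesch interpretation table. The function names and the exact label wording are assumptions for illustration, not part of the study.

```python
def fre_band(score: float) -> str:
    """Map a Flesch Reading Ease score to its conventional difficulty band."""
    bands = [
        (90.0, "very easy (about 5th grade)"),
        (80.0, "easy (about 6th grade)"),
        (70.0, "fairly easy (about 7th grade)"),
        (60.0, "standard (8th-9th grade)"),
        (50.0, "fairly difficult (10th-12th grade)"),
        (30.0, "difficult (college)"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult (college graduate)"


def exceeds_target(grade_level: float, target_grade: float = 6.0) -> bool:
    """True when a grade-level score is above the recommended target."""
    return grade_level > target_grade


if __name__ == "__main__":
    # Mean values as reported in the Results.
    print(fre_band(39.5))
    print(exceeds_target(13.1))
```

Under these bands, the reported mean FRE of 39.5 falls in the college range, and the mean FKRGL of 13.1 is roughly seven grade levels above the sixth-grade target.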

Conclusion

While ChatGPT-generated tracheostomy care responses may exhibit acceptable appropriateness, incomplete or misleading information may have dire clinical consequences. Further, inappropriately high reading levels may limit patient comprehension and accessibility. At this point in its technological infancy, AI-generated information should not be solely relied upon as a direct patient care resource.


