Generative Artificial Intelligence Responses to Common Patient-Centric Hand and Wrist Surgery Questions: A Quality and Usability Analysis.

Benjamin Pautler, Charles Marchese, Makayla Swancutt, Bryan G Beutel
{"title":"生成人工智能响应常见的以患者为中心的手和手腕手术问题:质量和可用性分析。","authors":"Benjamin Pautler, Charles Marchese, Makayla Swancutt, Bryan G Beutel","doi":"10.1142/S2424835525500171","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background:</b> Due to the rapid evolution of generative artificial intelligence (AI) and its implications on patient education, there is a pressing need to evaluate AI responses to patients' medical questions. This study assessed the quality and usability of responses received from two prominent AI platforms to common patient-centric hand and wrist surgery questions. <b>Methods:</b> Twelve commonly encountered hand and wrist surgery patient questions were inputted twice into both Gemini and ChatGPT, generating 48 responses. Each response underwent a content analysis, followed by assessment for quality and usability with three scoring tools: DISCERN, Suitability Assessment of Materials (SAM) and the AI Response Metric (AIRM). Statistical analyses compared the features and scores of the outputs when stratified by platform, question type and response order. <b>Results:</b> Responses earned mean overall scores of 55.7 ('good'), 57.2% ('adequate') and 4.4 for DISCERN, SAM and AIRM, respectively. No responses provided citations. Wrist question responses had significantly higher DISCERN (<i>p</i> < 0.01) and AIRM (<i>p</i> = 0.02) scores compared to hand responses. Second responses had significantly higher AIRM (<i>p</i> < 0.01), but similar DISCERN (<i>p</i> = 0.76) and SAM (<i>p</i> = 0.11), scores compared to the first responses. Gemini's DISCERN (<i>p</i> = 0.04) and SAM (<i>p</i> < 0.01) scores were significantly higher than ChatGPT's corresponding metrics. <b>Conclusions:</b> Although responses are generally 'good' and 'adequate', there is variable quality with respect to platform used, type of question and response order. Given the diversity of publicly available AI platforms, it is important to understand the quality and usability of information patients may encounter during their search for answers to common hand and wrist surgery questions. <b>Level of Evidence:</b> Level IV (Therapeutic).</p>","PeriodicalId":51689,"journal":{"name":"Journal of Hand Surgery-Asian-Pacific Volume","volume":" ","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2025-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generative Artificial Intelligence Responses to Common Patient-Centric Hand and Wrist Surgery Questions: A Quality and Usability Analysis.\",\"authors\":\"Benjamin Pautler, Charles Marchese, Makayla Swancutt, Bryan G Beutel\",\"doi\":\"10.1142/S2424835525500171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b>Background:</b> Due to the rapid evolution of generative artificial intelligence (AI) and its implications on patient education, there is a pressing need to evaluate AI responses to patients' medical questions. This study assessed the quality and usability of responses received from two prominent AI platforms to common patient-centric hand and wrist surgery questions. <b>Methods:</b> Twelve commonly encountered hand and wrist surgery patient questions were inputted twice into both Gemini and ChatGPT, generating 48 responses. Each response underwent a content analysis, followed by assessment for quality and usability with three scoring tools: DISCERN, Suitability Assessment of Materials (SAM) and the AI Response Metric (AIRM). 
Statistical analyses compared the features and scores of the outputs when stratified by platform, question type and response order. <b>Results:</b> Responses earned mean overall scores of 55.7 ('good'), 57.2% ('adequate') and 4.4 for DISCERN, SAM and AIRM, respectively. No responses provided citations. Wrist question responses had significantly higher DISCERN (<i>p</i> < 0.01) and AIRM (<i>p</i> = 0.02) scores compared to hand responses. Second responses had significantly higher AIRM (<i>p</i> < 0.01), but similar DISCERN (<i>p</i> = 0.76) and SAM (<i>p</i> = 0.11), scores compared to the first responses. Gemini's DISCERN (<i>p</i> = 0.04) and SAM (<i>p</i> < 0.01) scores were significantly higher than ChatGPT's corresponding metrics. <b>Conclusions:</b> Although responses are generally 'good' and 'adequate', there is variable quality with respect to platform used, type of question and response order. Given the diversity of publicly available AI platforms, it is important to understand the quality and usability of information patients may encounter during their search for answers to common hand and wrist surgery questions. <b>Level of Evidence:</b> Level IV (Therapeutic).</p>\",\"PeriodicalId\":51689,\"journal\":{\"name\":\"Journal of Hand Surgery-Asian-Pacific Volume\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2025-01-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Hand Surgery-Asian-Pacific Volume\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/S2424835525500171\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Hand Surgery-Asian-Pacific Volume","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S2424835525500171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"SURGERY","Score":null,"Total":0}
引用次数: 0

Abstract


Background: Due to the rapid evolution of generative artificial intelligence (AI) and its implications for patient education, there is a pressing need to evaluate AI responses to patients' medical questions. This study assessed the quality and usability of responses from two prominent AI platforms to common patient-centric hand and wrist surgery questions. Methods: Twelve commonly encountered hand and wrist surgery patient questions were input twice into both Gemini and ChatGPT, generating 48 responses. Each response underwent a content analysis, followed by assessment of quality and usability with three scoring tools: DISCERN, the Suitability Assessment of Materials (SAM) and the AI Response Metric (AIRM). Statistical analyses compared the features and scores of the outputs when stratified by platform, question type and response order. Results: Responses earned mean overall scores of 55.7 ('good') on DISCERN, 57.2% ('adequate') on SAM and 4.4 on the AIRM. No responses provided citations. Wrist question responses had significantly higher DISCERN (p < 0.01) and AIRM (p = 0.02) scores than hand question responses. Second responses had significantly higher AIRM scores (p < 0.01), but similar DISCERN (p = 0.76) and SAM (p = 0.11) scores, compared to the first responses. Gemini's DISCERN (p = 0.04) and SAM (p < 0.01) scores were significantly higher than ChatGPT's corresponding metrics. Conclusions: Although responses are generally 'good' and 'adequate', quality varies with the platform used, the type of question and the response order. Given the diversity of publicly available AI platforms, it is important to understand the quality and usability of the information patients may encounter while searching for answers to common hand and wrist surgery questions. Level of Evidence: Level IV (Therapeutic).
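The study design in the Methods is compact: 12 questions × 2 platforms × 2 submissions yields 48 responses, which are then scored and compared across three strata (platform, question type, response order). Below is a minimal Python sketch of how such stratified comparisons could be scripted. It is not the authors' analysis code: the placeholder scores, the even wrist/hand split, and the choice of nonparametric tests (Mann-Whitney U for independent groups, Wilcoxon signed-rank for paired first/second responses) are all assumptions, since the abstract does not specify them, and real DISCERN and SAM scores come from human raters applying the published criteria.

```python
# Minimal sketch of the stratified score comparisons described in the
# Methods. NOT the authors' code: scores are random placeholders, the
# even wrist/hand split is assumed, and the abstract does not name the
# statistical tests, so the nonparametric choices below are assumptions.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu, wilcoxon

rng = np.random.default_rng(0)

# One row per AI response: 12 questions x 2 platforms x 2 submissions = 48.
rows = []
for platform in ("Gemini", "ChatGPT"):
    for q in range(12):
        for order in (1, 2):
            rows.append({
                "platform": platform,
                "question": q,
                "question_type": "wrist" if q < 6 else "hand",  # assumed split
                "order": order,
                # Placeholder DISCERN total; real scores come from human raters.
                "discern": int(rng.integers(40, 70)),
            })
df = pd.DataFrame(rows)

# Platform comparison (independent samples): Gemini vs. ChatGPT.
u, p = mannwhitneyu(df.loc[df.platform == "Gemini", "discern"],
                    df.loc[df.platform == "ChatGPT", "discern"])
print(f"Platform:      U={u:.1f}, p={p:.3f}")

# Question-type comparison (independent samples): wrist vs. hand.
u, p = mannwhitneyu(df.loc[df.question_type == "wrist", "discern"],
                    df.loc[df.question_type == "hand", "discern"])
print(f"Question type: U={u:.1f}, p={p:.3f}")

# Response-order comparison (paired): first vs. second submission of the
# same question on the same platform, so a signed-rank test is natural.
wide = df.pivot_table(index=["platform", "question"],
                      columns="order", values="discern")
w, p = wilcoxon(wide[1], wide[2])
print(f"Order:         W={w:.1f}, p={p:.3f}")
```

With real rater scores in place of the placeholders, the same three calls would reproduce the platform, question-type and response-order contrasts reported in the Results.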
