High accuracy but limited readability of large language model-generated responses to frequently asked questions about Kienböck's disease.

BMC Musculoskeletal Disorders (IF 2.2; Zone 3, Medicine; JCR Q2, Orthopedics). Pub Date: 2024-11-04. DOI: 10.1186/s12891-024-07983-0
Zeynel Mert Asfuroğlu, Hilal Yağar, Ender Gümüşoğlu
Citation: BMC Musculoskeletal Disorders 25(1):879 (2024). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11536837/pdf/

Abstract

Background: This study aimed to assess the quality and readability of large language model-generated responses to frequently asked questions (FAQs) about Kienböck's disease (KD).

Methods: Nineteen FAQs about KD were selected and divided into three categories: general knowledge, diagnosis, and treatment. The questions were entered into the Chat Generative Pre-trained Transformer 4 (ChatGPT-4) web interface using zero-shot prompting, and the responses were recorded. Hand surgeons with at least 5 years of experience and advanced English proficiency were contacted individually via WhatsApp instant messaging and asked to assess the responses. Thirty-three experienced hand surgeons rated the quality of each response using the Global Quality Scale (GQS). Readability was assessed with the Flesch-Kincaid Grade Level (FKGL) and the Flesch Reading Ease Score (FRES).
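
The abstract does not say which tool produced the readability scores. As a rough illustration only, the standard published FRES and FKGL formulas can be sketched in Python as follows. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic (an assumption, not the study's method), so its scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per group of vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent (e.g. "disease"), except in "-le" words.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def count_text(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables); handles ASCII letters only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """FRES: higher scores mean easier text."""
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """FKGL: approximate US school grade needed to understand the text."""
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the study's ChatGPT responses.
    sample = "Avascular necrosis of the lunate progressively impairs wrist function."
    counts = count_text(sample)
    print("FRES:", round(flesch_reading_ease(*counts), 1))
    print("FKGL:", round(flesch_kincaid_grade(*counts), 1))
```

Both formulas depend only on average sentence length and average syllables per word, which is why long sentences packed with polysyllabic medical terms lower the FRES and raise the FKGL.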

Results: The mean GQS score was 4.28 out of a maximum of 5 points. Most ratings were good (270 of 627 ratings; 43.1%) or excellent (260 of 627 ratings; 41.5%). The mean FKGL was 15.5, and the mean FRES was 23.4, both of which are considered above the college graduate level. No statistically significant differences in quality or readability were found among responses to questions on general knowledge, diagnosis, and treatment.
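
The total of 627 is consistent with 33 raters each scoring all 19 responses. A minimal Python check of that arithmetic and of the reported percentages is sketched below; the variable and function names are illustrative, and only the counts come from the abstract.

```python
# Counts taken from the abstract.
raters = 33
questions = 19
good = 270
excellent = 260

# Each rater scores each response once.
total_ratings = raters * questions  # 627


def share(count: int, total: int) -> float:
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(total_ratings)                      # 627
print(share(good, total_ratings))         # 43.1
print(share(excellent, total_ratings))    # 41.5
```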

Conclusions: ChatGPT-4 provided high-quality responses to FAQs about KD. However, the primary drawback was the poor readability of these responses. Improving the readability of ChatGPT's output could make it a valuable information resource for individuals with KD.

Level of evidence: Level IV, Observational study.

Source journal: BMC Musculoskeletal Disorders (Medicine: Rheumatology)
CiteScore: 3.80
Self-citation rate: 8.70%
Articles published: 1017
Review time: 3-6 weeks
About the journal: BMC Musculoskeletal Disorders is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of musculoskeletal disorders, as well as related molecular genetics, pathophysiology, and epidemiology. The scope of the journal covers research into rheumatic diseases where the primary focus relates specifically to one or more components of the musculoskeletal system.