Evaluation of the accuracy and quality of ChatGPT-4 responses for hyperparathyroidism patients discussed at multidisciplinary endocrinology meetings.

DIGITAL HEALTH · IF 2.9 · Zone 3 (Medicine) · Q2 (Health Care Sciences & Services) · Pub Date: 2024-08-28 · eCollection Date: 2024-01-01 · DOI: 10.1177/20552076241278692
Işılay Taşkaldıran, Çağatay Emir Önder, Püren Gökbulut, Gönül Koç, Şerife Mehlika Kuşkonmaz
Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363241/pdf/
Citations: 0

Abstract

Purpose: Chat Generative Pre-trained Transformer (ChatGPT) is now used in various fields of healthcare to answer health-related questions and to evaluate available information. Primary hyperparathyroidism is a common endocrine disorder. We aimed to evaluate the accuracy and quality of ChatGPT's responses to questions about hyperparathyroidism cases discussed at multidisciplinary endocrinology meetings.

Methods: ChatGPT-4 was asked to respond to 10 hyperparathyroidism cases evaluated at multidisciplinary endocrinology meetings. Two endocrinologists independently scored the accuracy, completeness, and quality of the responses. Accuracy and completeness were rated on Likert scales, and quality was rated on the Global Quality Scale (GQS).

Results: No misleading information was detected in the responses. For diagnosis, the mean accuracy score (range 1 to 5) was 4.9 ± 0.1 and the mean completeness score (range 1 to 3) was 3.0. For further examination, the mean accuracy and completeness scores were 4.8 ± 0.13 and 2.6 ± 0.16, respectively. For treatment recommendations, the mean accuracy and completeness scores were 4.9 ± 0.1 and 2.4 ± 0.16, respectively. On the GQS, 80% of responses were rated high quality and 20% medium quality.

Conclusion: In this study, ChatGPT-4 responded to questions about hyperparathyroidism patients with generally high accuracy and quality. Artificial intelligence may serve as a valuable tool in healthcare; however, the limitations and risks of ChatGPT should also be evaluated.
