Evaluation of the accuracy and quality of ChatGPT-4 responses for hyperparathyroidism patients discussed at multidisciplinary endocrinology meetings.

DIGITAL HEALTH (IF 2.9; Zone 3, Medicine; JCR Q2, Health Care Sciences & Services). Pub Date: 2024-08-28. eCollection Date: 2024-01-01. DOI: 10.1177/20552076241278692

Işılay Taşkaldıran, Çağatay Emir Önder, Püren Gökbulut, Gönül Koç, Şerife Mehlika Kuşkonmaz


Citations: 0

Abstract

Purpose: Chat Generative Pre-trained Transformer (ChatGPT) is now used in various fields of healthcare to answer health-related questions and to evaluate available information. Primary hyperparathyroidism is a common endocrine disorder. We aimed to evaluate the accuracy and quality of ChatGPT's responses to questions specific to hyperparathyroidism cases discussed at multidisciplinary endocrinology meetings.

Methods: ChatGPT-4 was asked to respond to 10 hyperparathyroidism cases evaluated at multidisciplinary endocrinology meetings. The accuracy, completeness, and quality of the responses were scored independently by two endocrinologists. Accuracy and completeness were rated on Likert scales, and quality was rated on the Global Quality Scale (GQS).

Results: No misleading information was detected in the responses. For diagnosis, the mean accuracy score (range 1 to 5) was 4.9 ± 0.1 and the mean completeness score (range 1 to 3) was 3.0. For further examination, the mean accuracy and completeness scores were 4.8 ± 0.13 and 2.6 ± 0.16, respectively. For treatment recommendations, the mean accuracy and completeness scores were 4.9 ± 0.1 and 2.4 ± 0.16, respectively. On the GQS, 80% of responses were rated high quality and 20% medium quality.

Conclusion: In this study, the accuracy and quality of ChatGPT-4's responses to questions about hyperparathyroidism patients were generally high. It can be concluded that artificial intelligence may serve as a valuable tool in healthcare. However, the limitations and risks of ChatGPT should also be evaluated.

