Evaluation of the reliability and readability of chatbot responses as a patient information resource for the most common PET-CT scans

N. Aydinbelge-Dizdar, K. Dizdar
Revista espanola de medicina nuclear e imagen molecular, volume 44, issue 1, Article 500065. Published 2025-01-01.
DOI: 10.1016/j.remnie.2024.500065
https://www.sciencedirect.com/science/article/pii/S2253808924000934

Evaluación de la fiabilidad y legibilidad de las respuestas de los chatbots como recurso de información al paciente para las exploraciones PET-TC más comunes

Purpose

This study aimed to evaluate the reliability and readability of responses generated by two popular AI-chatbots, ‘ChatGPT-4.0’ and ‘Google Gemini’, to potential patient questions about PET/CT scans.

Materials and methods

Thirty potential questions for each of [18F]FDG and [68Ga]Ga-DOTA-SSTR PET/CT, and twenty-nine potential questions for [68Ga]Ga-PSMA PET/CT, were posed separately to ChatGPT-4 and Gemini in May 2024. The responses were evaluated for reliability and readability using the modified DISCERN (mDISCERN) scale, Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Reading Grade Level (FKRGL). The inter-rater reliability of the mDISCERN scores given to the responses by three raters (ChatGPT-4, Gemini, and a nuclear medicine physician) was assessed.
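The three readability indices named above are closed-form functions of sentence, word, and syllable counts. The following is a minimal sketch using the standard published formulas. The function names and the vowel-group syllable heuristic are illustrative assumptions, not the tool the authors used, so scores may differ somewhat from dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a silent final 'e'.

    This is a heuristic (an assumption of this sketch), not a dictionary lookup.
    """
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text: str) -> dict:
    """Count sentences, words, syllables, and 'complex' (3+ syllable) words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(syllables),
        # Simplification: the full Gunning rules also exclude proper nouns
        # and some suffixes; this sketch only applies the syllable threshold.
        "complex_words": sum(1 for s in syllables if s >= 3),
    }


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKRGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """GFI = 0.4 * ((words/sentences) + 100 * (complex_words/words))."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def readability(text: str) -> dict:
    """Compute all three indices for one chatbot response."""
    c = text_counts(text)
    if c["words"] == 0 or c["sentences"] == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return {
        "FRE": flesch_reading_ease(c["words"], c["sentences"], c["syllables"]),
        "FKRGL": flesch_kincaid_grade(c["words"], c["sentences"], c["syllables"]),
        "GFI": gunning_fog(c["words"], c["sentences"], c["complex_words"]),
    }
```

A higher FRE indicates easier text, whereas FKRGL and GFI approximate the US school grade needed to understand it, so lower values indicate easier text.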

Results

The median [min-max] mDISCERN scores assigned by the physician to responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [2–4], 3 [3–4], and 3 [3–4] for ChatGPT-4 and 4 [2–5], 4 [2–5], and 3.5 [3–5] for Gemini, respectively. The mDISCERN scores assigned by ChatGPT-4 to responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [3–5], 3 [3–4], and 3 [2–3] for ChatGPT-4 and 4 [3–5], 4 [3–5], and 4 [3–5] for Gemini, respectively. The mDISCERN scores assigned by Gemini to responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3 [2–4], 2 [2–4], and 3 [2–4] for ChatGPT-4 and 3 [2–5], 3 [1–5], and 3 [2–5] for Gemini, respectively. The inter-rater reliability correlation coefficients of the mDISCERN scores for ChatGPT-4 responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.629 (95% CI = 0.32−0.812), 0.707 (95% CI = 0.458−0.853), and 0.738 (95% CI = 0.519−0.866), respectively (p < 0.001). The corresponding coefficients for Gemini responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.824 (95% CI = 0.677−0.910), 0.881 (95% CI = 0.78−0.94), and 0.847 (95% CI = 0.719−0.922), respectively (p < 0.001). According to these coefficients, the mDISCERN scores assigned by ChatGPT-4, Gemini, and the physician showed moderate to good agreement for the chatbots' responses about all PET/CT scans (p < 0.001). All readability scores (FKRGL, GFI, and FRE) differed significantly between ChatGPT-4 and Gemini responses about PET/CT scans (p < 0.001). Gemini responses were shorter and had better readability scores than ChatGPT-4 responses.
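The abstract does not state which form of inter-rater reliability coefficient was computed. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater intraclass correlation, ICC(2,1), computed from a table with one row per chatbot response and one column per rater. The function name and input layout are hypothetical, and the authors' actual model may differ.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to item i (e.g. one chatbot response)
    by rater j (e.g. physician, ChatGPT-4, Gemini). Assumes no missing values.
    """
    n = len(ratings)        # number of rated items
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 items and 2 raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every item must be scored by every rater")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-item mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denom
```

Because ICC(2,1) measures absolute agreement, a rater who scores consistently higher or lower than the others lowers the coefficient even when the ranking of responses is identical.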

Conclusion

There was an acceptable level of agreement between raters for the mDISCERN score, indicating agreement on the overall reliability of the responses. However, the information provided by AI-chatbots is not easy for the public to read.