Clinical application potential of large language model: a study based on thyroid nodules.

IF 3.7; Zone 3 (Medicine); Q2 Medicine; Endocrine; Pub Date: 2025-01-01; Epub Date: 2024-07-30; DOI: 10.1007/s12020-024-03981-3
Shujun Xia, Qing Hua, Zihan Mei, Wenwen Xu, Limei Lai, Minyan Wei, Yu Qin, Lin Luo, Changhua Wang, ShengNan Huo, Lijun Fu, Feidu Zhou, Jiang Wu, Li Zhang, De Lv, Jianxin Li, Xin Wang, Ning Li, Yanyan Song, Jianqiao Zhou
Citations: 0

Abstract

Clinical application potential of large language model: a study based on thyroid nodules.

Background: Limited data are available on the performance of large language models (LLMs) taking on the role of doctors. We aimed to investigate the potential of ChatGPT-3.5 and New Bing Chat to act as doctors, using thyroid nodules as an example.

Methods: A total of 145 patients with thyroid nodules were included for generating questions. Each question was entered into the ChatGPT-3.5 and New Bing Chat chatbots five times, yielding five responses from each. These responses were compared with answers given by five junior doctors. Responses from five senior doctors were regarded as the gold standard. The accuracy and reproducibility of the responses from ChatGPT-3.5 and New Bing Chat were evaluated.
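The abstract does not define how accuracy and reproducibility were scored, so the following is only a minimal sketch under stated assumptions: accuracy is taken as the share of repeated responses that match the gold-standard answer, and reproducibility as the share of responses that agree with the most frequent answer. The function name `score_question` and the answer labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def score_question(responses, gold):
    """Score repeated chatbot answers to one question.

    responses: answer labels from repeated runs (five per question in the study).
    gold: the reference answer (senior doctors in the study).
    Both metric definitions below are assumptions, not the paper's.
    """
    # Assumed accuracy: fraction of runs that match the reference answer.
    accuracy = sum(r == gold for r in responses) / len(responses)
    # Assumed reproducibility: fraction of runs agreeing with the modal answer.
    modal_count = Counter(responses).most_common(1)[0][1]
    reproducibility = modal_count / len(responses)
    return accuracy, reproducibility


# Made-up labels for illustration only.
runs = ["benign", "benign", "malignant", "benign", "benign"]
accuracy, reproducibility = score_question(runs, gold="benign")
print(accuracy, reproducibility)
```

Under these assumed definitions, a chatbot can be highly reproducible yet inaccurate (five identical wrong answers), which is why the two measures are reported separately.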

Results: The accuracy of ChatGPT-3.5 and New Bing Chat in answering Q2, Q3, and Q5 was lower than that of junior doctors (all P < 0.05), while both LLMs were comparable to junior doctors when answering Q4 and Q6. In terms of "high reproducibility and accuracy", ChatGPT-3.5 outperformed New Bing Chat in Q1 and Q5 (P < 0.001 and P = 0.008, respectively), but the two showed no significant difference in Q2, Q3, Q4, and Q6 (P > 0.05 for all). New Bing Chat achieved higher accuracy than ChatGPT-3.5 in decision making for thyroid nodules (72.41% vs 58.62%, P = 0.003), and both were less accurate than junior doctors (89.66%, P < 0.001 for both).
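As a quick sanity check on the decision-making percentages, the sketch below converts each reported accuracy back into a whole number of cases. It assumes that every percentage uses the 145 included patients as its denominator, which the abstract implies but does not state; the helper name `implied_correct` is hypothetical.

```python
# Assumption: each reported accuracy is (correct cases) / 145 patients.
N_PATIENTS = 145

# Decision-making accuracies as reported in the abstract (percent).
reported = {
    "ChatGPT-3.5": 58.62,
    "New Bing Chat": 72.41,
    "Junior doctors": 89.66,
}


def implied_correct(percent, n=N_PATIENTS):
    """Return the whole number of cases closest to `percent` of `n`."""
    return round(percent / 100 * n)


for rater, percent in reported.items():
    k = implied_correct(percent)
    # Recompute the percentage from the integer count to see if it round-trips.
    print(f"{rater}: {k}/{N_PATIENTS} = {k / N_PATIENTS * 100:.2f}%")
```

Under that assumption the figures correspond to 85, 105, and 130 correct decisions out of 145, and each count reproduces the reported percentage to two decimal places.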

Conclusions: The exploration of ChatGPT-3.5 and New Bing Chat in the diagnosis and management of thyroid nodules illustrates that LLMs currently demonstrate the potential for medical applications, but do not yet reach the clinical decision-making capacity of doctors.

Source journal
Endocrine (Medicine: Endocrinology & Metabolism)
CiteScore: 6.40
Self-citation rate: 5.40%
Articles published: 0
Journal description: Well-established as a major journal in today’s rapidly advancing experimental and clinical research areas, Endocrine publishes original articles devoted to basic (including molecular, cellular and physiological studies), translational and clinical research in all the different fields of endocrinology and metabolism. Articles will be accepted based on peer review, priority, and editorial decision. Invited reviews, mini-reviews and viewpoints on relevant pathophysiological and clinical topics, as well as Editorials on articles appearing in the Journal, are published. Unsolicited Editorials will be evaluated by the editorial team. Outcomes of scientific meetings, as well as guidelines and position statements, may be submitted. The Journal also considers special feature articles in the field of endocrine genetics and epigenetics, as well as articles devoted to novel methods and techniques in endocrinology. Endocrine covers controversial clinical endocrine issues. Meta-analyses on endocrine and metabolic topics are also accepted. Descriptions of single clinical cases and/or small patient studies are not published unless of exceptional interest. However, reports of novel imaging studies and endocrine side effects in single patients may be considered. Research letters and letters to the editor related or unrelated to recently published articles can be submitted. Endocrine covers leading topics in endocrinology such as neuroendocrinology, pituitary and hypothalamic peptides, thyroid physiological and clinical aspects, bone and mineral metabolism and osteoporosis, obesity, lipid and energy metabolism and food intake control, insulin, Type 1 and Type 2 diabetes, hormones of male and female reproduction, adrenal diseases, pediatric and geriatric endocrinology, endocrine hypertension and endocrine oncology.
Latest articles from this journal
Nonlinear association between liver fat content and lumbar bone mineral density in overweight and obese individuals: evidence from a large-scale health screening data in China.
Stress regulatory hormones and cancer: the contribution of epinephrine and cancer therapeutic value of beta blockers.
Capecitabine and temozolomide or temozolomide alone in patients with atypical carcinoids.
Impact of psychiatric disorders on the risk of diabetic ketoacidosis in adults with type 1 diabetes mellitus: a propensity score matching case-control study.
Association between platelet-to-lymphocyte ratio and immune checkpoint inhibitor-induced thyroid dysfunction.