Assessment of Large Language Models in Cataract Care Information Provision: A Quantitative Comparison.

IF 2.6 | CAS Tier 3 (Medicine) | Q2 OPHTHALMOLOGY | Ophthalmology and Therapy | Pub Date: 2024-11-08 | DOI: 10.1007/s40123-024-01066-y
Zichang Su, Kai Jin, Hongkang Wu, Ziyao Luo, Andrzej Grzybowski, Juan Ye
Citations: 0

Abstract

Assessment of Large Language Models in Cataract Care Information Provision: A Quantitative Comparison.

Introduction: Cataracts are a significant cause of blindness. While individuals frequently turn to the Internet for medical advice, distinguishing reliable information can be challenging. Large language models (LLMs) have attracted attention for generating accurate, human-like responses that may be used for medical consultation. However, a comprehensive assessment of LLMs' accuracy within specific medical domains is still lacking.

Methods: We compiled 46 commonly asked questions related to cataract care, categorized into six domains. Each question was presented to the LLMs, and three consultant-level ophthalmologists independently assessed the accuracy of their responses on a three-point scale (poor, borderline, good) and their comprehensiveness on a five-point scale. A majority consensus approach established the final rating for each response. Responses rated as 'Poor' were prompted for self-correction and reassessed.
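The majority-consensus step described above can be sketched as follows. This is a hypothetical illustration of the protocol, not the authors' code; the paper does not specify a tie-break rule for a three-way split, so the fallback to 'borderline' below is an assumption.

```python
from collections import Counter

def consensus(ratings):
    """Return the majority rating among three independent raters.

    ratings: list of three labels from {'poor', 'borderline', 'good'}.
    """
    label, count = Counter(ratings).most_common(1)[0]
    if count >= 2:
        return label          # at least two raters agree
    return "borderline"       # assumed tie-break for a 3-way split

print(consensus(["good", "good", "borderline"]))  # -> good
print(consensus(["poor", "poor", "good"]))        # -> poor
```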

Results: For accuracy, ChatGPT-4o and Google Bard both achieved average sum scores of 8.7 (out of 9), followed by ChatGPT-3.5, Bing Chat, Llama 2, and Wenxin Yiyan. In consensus-based ratings, ChatGPT-4o outperformed Google Bard in the 'Good' rating. For completeness, ChatGPT-4o had the highest average sum score of 13.22 (out of 15), followed by Google Bard, ChatGPT-3.5, Llama 2, Bing Chat, and Wenxin Yiyan. Detailed performance data reveal nuanced differences in model capabilities. In the 'Prevention' domain, apart from Wenxin Yiyan, all other models were rated as 'Good'. All models showed improvement in self-correction: Bard and Bing each improved their single 'Poor' response (1/1), Llama 2 improved 3 of 4, and Wenxin Yiyan improved 4 of 5.
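The "sum score" arithmetic behind these results can be made explicit: each of three raters scores a response, the three scores are summed per question (maximum 9 for the 3-point accuracy scale, 15 for the 5-point completeness scale), and the sums are averaged over all questions. The sketch below uses fabricated ratings purely for illustration, not study data.

```python
def average_sum_score(per_question_ratings):
    """Average the per-question sum of three raters' scores.

    per_question_ratings: list of (r1, r2, r3) tuples, one per question.
    For the 3-point accuracy scale each r is in 1..3, so the sum is at most 9;
    for the 5-point completeness scale each r is in 1..5, max sum 15.
    """
    sums = [sum(triple) for triple in per_question_ratings]
    return sum(sums) / len(sums)

# Fabricated accuracy ratings for three questions (3-point scale):
demo = [(3, 3, 3), (3, 2, 3), (3, 3, 2)]
print(round(average_sum_score(demo), 2))  # -> 8.33
```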

Conclusions: Our findings emphasize the ability of LLMs, particularly ChatGPT-4o, to deliver accurate and comprehensive responses to cataract-related queries, especially in prevention, indicating their potential for medical consultation. Continuous efforts to enhance LLMs' accuracy through ongoing strategies and evaluations are essential.

Source journal
Ophthalmology and Therapy
CiteScore: 4.20
Self-citation rate: 3.00%
Annual articles: 157
Review time: 6 weeks
Journal description:

Aims and Scope
Ophthalmology and Therapy is an international, open access, peer-reviewed (single-blind), rapid publication journal. Its scope is broad: it considers all scientifically sound research from preclinical, clinical (all phases), observational, real-world, and health outcomes research around the use of ophthalmological therapies, devices, and surgical techniques. The journal is of interest to a broad audience of pharmaceutical and healthcare professionals and publishes original research, reviews, case reports/series, trial protocols, and short communications such as commentaries and editorials. Ophthalmology and Therapy considers all scientifically sound research, whether positive, confirmatory, or negative. Submissions are welcomed whether they relate to an international and/or a country-specific audience, something that is crucially important when researchers are trying to target more specific patient populations. This inclusive approach allows the journal to assist in the dissemination of quality research which may be considered of insufficient interest by other journals.

Rapid Publication
The journal's publication timelines aim for a rapid peer review of 2 weeks. If an article is accepted, it will be published 3-4 weeks from acceptance. These timelines are achieved through the combination of a dedicated in-house editorial team, who manage article workflow, and an extensive Editorial and Advisory Board who assist with peer review. This allows the journal to support the rapid dissemination of research while still providing robust peer review. Combined with the journal's open access model, this allows for the rapid, efficient communication of the latest research and reviews, fostering the advancement of ophthalmic therapies.

Open Access
All articles published by Ophthalmology and Therapy are open access.
Personal Service
The journal's dedicated in-house editorial team offer a personal "concierge service", meaning authors will always have an editorial contact able to update them on the status of their manuscript. The editorial team check all manuscripts to ensure that articles conform to the most recent COPE, GPP and ICMJE publishing guidelines. This supports the publication of ethically sound and transparent research.

Digital Features and Plain Language Summaries
Ophthalmology and Therapy offers a range of additional features designed to increase the visibility, readership and educational value of the journal's content. Each article is accompanied by key summary points, giving a time-efficient overview of the content to a wide readership. Articles may be accompanied by plain language summaries to assist readers who have some knowledge of, but not in-depth expertise in, the area to understand the scientific content and overall implications of the article. The journal also provides the option to include various types of digital features including animated abstracts, video abstracts, slide decks, audio slides, instructional videos, infographics, podcasts and animations. All additional features are peer reviewed to the same high standard as the article itself. If you consider that your paper would benefit from the inclusion of a digital feature, please let us know. Our editorial team are able to create high-quality slide decks and infographics in-house, and video abstracts through our partner Research Square, and would be happy to assist in any way we can. For further information about digital features, please contact the journal editor (see 'Contact the Journal' for email address), and see the 'Guidelines for digital features and plain language summaries' document under 'Submission guidelines'.
For examples of digital features please visit our showcase page https://springerhealthcare.com/expertise/publishing-digital-features/

Publication Fees
Upon acceptance of an article, authors will be required to pay the mandatory Rapid Service Fee of €5250/$6000/£4300. The journal will consider fee discounts and waivers for developing countries; this is decided on a case-by-case basis.

Peer Review Process
Upon submission, manuscripts are assessed by the editorial team to ensure they fit within the aims and scope of the journal and are also checked for plagiarism. All suitable submissions are then subject to a comprehensive single-blind peer review. Reviewers are selected based on their relevant expertise and publication history in the subject area. The journal has an extensive pool of editorial and advisory board members who have been selected to assist with peer review based on the aforementioned criteria. At least two extensive reviews are required to make the editorial decision, with the exception of some article types such as Commentaries, Editorials, and Letters, which are generally reviewed by one member of the Editorial Board. Where reviewer recommendations conflict, the editorial board will be contacted for further advice and a presiding decision. Manuscripts are then either accepted, rejected, or returned to authors for major or minor revisions (both reviewer comments and editorial comments may need to be addressed). Once a revised manuscript is re-submitted, it is assessed along with the responses to reviewer comments, and if it has been adequately revised it will be accepted for publication. Accepted manuscripts are then copyedited and typeset by the production team before online publication. Appeals against decisions following peer review are considered on a case-by-case basis and should be sent to the journal editor.
Preprints
We encourage posting of preprints of primary research manuscripts on preprint servers, authors' or institutional websites, and open communications between researchers whether on community preprint servers or preprint commenting platforms. Posting of preprints is not considered prior publication and will not jeopardize consideration in our journals. Authors should disclose details of preprint posting during the submission process or at any other point during consideration in one of our journals. Once the manuscript is published, it is the author's responsibility to ensure that the preprint record is updated with a publication reference, including the DOI and a URL link to the published version of the article on the journal website. Please follow the link for further information on preprint sharing: https://www.springer.com/gp/authors-editors/journal-author/journal-author-helpdesk/submission/1302#c16721550

Copyright
Ophthalmology and Therapy's content is published open access under the Creative Commons Attribution-Noncommercial License, which allows users to read, copy, distribute, and make derivative works for non-commercial purposes from the material, as long as the author of the original work is cited. The author assigns the exclusive right to any commercial use of the article to Springer. For more information about the Creative Commons Attribution-Noncommercial License, click here: http://creativecommons.org/licenses/by-nc/4.0.

Contact
For more information about the journal, including pre-submission enquiries, please contact christopher.vautrinot@springer.com.