Evaluating the effectiveness of large language models in patient education for conjunctivitis

British Journal of Ophthalmology | IF 3.7 | CAS Tier 2 (Medicine) | JCR Q1 (Ophthalmology) | Pub Date: 2024-08-30 | DOI: 10.1136/bjo-2024-325599
Jingyuan Wang, Runhan Shi, Qihua Le, Kun Shan, Zhi Chen, Xujiao Zhou, Yao He, Jiaxu Hong
Citations: 0

Abstract

Aims: To evaluate the quality of responses from large language models (LLMs) to patient-generated conjunctivitis questions.

Methods: A two-phase, cross-sectional study was conducted at the Eye and ENT Hospital of Fudan University. In phase 1, four LLMs (GPT-4, Qwen, Baichuan 2 and PaLM 2) responded to 22 frequently asked conjunctivitis questions. Six expert ophthalmologists assessed these responses using a 5-point Likert scale for correctness, completeness, readability, helpfulness and safety, supplemented by objective readability analysis. Phase 2 involved 30 conjunctivitis patients who interacted with GPT-4 or Qwen, evaluating the LLM-generated responses based on satisfaction, humanisation, professionalism and the same dimensions from phase 1 except for correctness. Three ophthalmologists assessed responses using phase 1 criteria, allowing for a comparative analysis between medical and patient evaluations and probing the study's practical significance.

Results: In phase 1, GPT-4 excelled across all metrics, particularly in correctness (4.39±0.76), completeness (4.31±0.96) and readability (4.65±0.59), while Qwen showed similarly strong performance in helpfulness (4.37±0.93) and safety (4.25±1.03). Baichuan 2 and PaLM 2 were effective but trailed behind GPT-4 and Qwen. The objective readability analysis revealed GPT-4's responses as the most detailed and PaLM 2's as the most succinct. Phase 2 demonstrated GPT-4 and Qwen's robust performance, with high satisfaction levels and consistent evaluations from both patients and professionals.

Conclusions: Our study showed that LLMs can effectively improve patient education in conjunctivitis, and these models showed considerable promise in real-world patient interactions. Despite encouraging results, further refinement, particularly in personalisation and handling complex inquiries, is essential prior to the clinical integration of these LLMs.
All data relevant to the study are included in the article or uploaded as online supplemental information.
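The phase 1 scores above are reported as mean±standard deviation of the 5-point Likert ratings given by the six expert reviewers. A minimal sketch of that aggregation, using hypothetical ratings (the actual per-reviewer data are not reproduced here):

```python
from statistics import mean, stdev

def summarize_likert(ratings):
    """Return (mean, sample standard deviation) for a list of 1-5 Likert ratings."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("Likert ratings must be between 1 and 5")
    return mean(ratings), stdev(ratings)

# Hypothetical example: six reviewers rate one response for correctness.
scores = [5, 4, 5, 4, 4, 5]
m, s = summarize_likert(scores)
print(f"correctness: {m:.2f}\u00b1{s:.2f}")  # prints "correctness: 4.50±0.55"
```

Note that the reported figures (e.g. correctness 4.39±0.76 for GPT-4) pool ratings across all 22 questions as well as all six reviewers, so the real computation runs over a larger matrix of scores than this single-response illustration.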
Source journal
CiteScore: 10.30
Self-citation rate: 2.40%
Articles per year: 213
Review time: 3-6 weeks
Journal introduction: The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology. It also publishes major reviews and manuscripts covering regional issues in a global context.