Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao
{"title":"Assessing the possibility of using large language models in ocular surface diseases.","authors":"Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao","doi":"10.18240/ijo.2025.01.01","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>To assess the possibility of using different large language models (LLMs) in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surface diseases: ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova.</p><p><strong>Methods: </strong>A group of experienced ophthalmology professors were asked to develop a 100-question single-choice question on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam includes questions on the following topics: keratitis disease (20 questions), keratoconus, keratomalaciac, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions), conjunctivitis disease (20 questions), trachoma, pterygoid and conjunctival tumor diseases (20 questions), and dry eye disease (20 questions). Then the total score of each LLMs and compared their mean score, mean correlation, variance, and confidence were calculated.</p><p><strong>Results: </strong>GPT-4 exhibited the highest performance in terms of LLMs. Comparing the average scores of the LLMs group with the four human groups, chief physician, attending physician, regular trainee, and graduate student, it was found that except for ChatGPT-4, the total score of the rest of the LLMs is lower than that of the graduate student group, which had the lowest score in the human group. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, giving very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave the wrong answer to the question 28% of the time.</p><p><strong>Conclusion: </strong>GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in terms of answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to GPT4 and surpasses Claude 2, SenseNova, and GPT-3.5. Despite the fact that ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting that its potential and ability to be applied in this field is enormous, perhaps with the potential to be a valuable resource for medical students and clinicians in the future.</p>","PeriodicalId":14312,"journal":{"name":"International journal of ophthalmology","volume":"18 1","pages":"1-8"},"PeriodicalIF":1.9000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672086/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.18240/ijo.2025.01.01","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Aim: To assess the feasibility of using large language models (LLMs) in ocular surface diseases by selecting five LLMs, ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova, and testing their accuracy in answering specialized questions related to ocular surface diseases.
Methods: A group of experienced ophthalmology professors was asked to develop a 100-item single-choice examination on ocular surface diseases, designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam covered the following topics: keratitis (20 questions); keratoconus, keratomalacia, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions); conjunctivitis (20 questions); trachoma, pterygium, and conjunctival tumor diseases (20 questions); and dry eye disease (20 questions). The total score of each LLM was then calculated, and their mean scores, mean correlations, variances, and confidence were compared.
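As a rough illustration of this kind of grading (a minimal sketch, not the authors' actual pipeline), the Python snippet below marks single-choice answers against an answer key and computes per-model totals, means, variances, and a between-model correlation of item-level correctness; the answer key, model names, and responses are all hypothetical stand-ins.

```python
# Minimal sketch (hypothetical data): grade each model's single-choice
# answers against the exam key, then summarize total score, mean,
# variance, and between-model correlation of item-level correctness.
from statistics import mean, variance

ANSWER_KEY = ["A", "C", "B", "D", "A"]  # stand-in for the 100-item key

MODEL_ANSWERS = {
    "ChatGPT-4": ["A", "C", "B", "D", "B"],
    "PaLM2":     ["A", "C", "D", "D", "A"],
}

def grade(answers, key):
    """Mark each question 1 (correct) or 0 (incorrect)."""
    return [1 if a == k else 0 for a, k in zip(answers, key)]

def pearson(x, y):
    """Pearson correlation between two equal-length mark vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

marks = {name: grade(ans, ANSWER_KEY) for name, ans in MODEL_ANSWERS.items()}
for name, m in marks.items():
    print(f"{name}: total={sum(m)}/{len(ANSWER_KEY)}, "
          f"mean={mean(m):.2f}, variance={variance(m):.2f}")
print(f"correlation: {pearson(marks['ChatGPT-4'], marks['PaLM2']):.2f}")
```

With the full 100-item key and all five models, the same item-level mark vectors would support every summary statistic the study reports.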
Results: GPT-4 exhibited the highest performance among the LLMs. When the average scores of the LLMs were compared with those of four human groups (chief physicians, attending physicians, regular trainees, and graduate students), every LLM except ChatGPT-4 scored lower than the graduate student group, the lowest-scoring human group. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, with little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, though it gave a wrong answer 28% of the time.
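The reported rates (59% successful, 28% wrong) leave roughly 13% of responses unaccounted for, presumably uncertain or non-committal replies. Purely as a hypothetical sketch of how such a three-way breakdown could be tallied (the labels and counts below are illustrative stand-ins, not the study's data):

```python
# Hypothetical sketch: tally the share of confident-correct,
# confident-wrong, and uncertain responses for one model.
from collections import Counter

# Stand-in labels for 100 reviewed responses; real labels would come
# from checking each model answer against the exam key.
labels = ["correct"] * 59 + ["wrong"] * 28 + ["uncertain"] * 13

for category, n in Counter(labels).most_common():
    print(f"{category}: {n / len(labels):.0%}")
```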
Conclusion: The GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in answer accuracy during the exam. In answer confidence, PaLM2 is second only to GPT-4, surpassing Claude 2, SenseNova, and GPT-3.5. Although ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting great potential for application in this field, perhaps as a valuable resource for medical students and clinicians in the future.
Journal introduction:
· International Journal of Ophthalmology-IJO (English edition) is a global ophthalmological scientific publication
and a peer-reviewed open access periodical (ISSN 2222-3959 print, ISSN 2227-4898 online).
This journal is sponsored by the Chinese Medical Association Xi'an Branch and receives guidance and support from the WHO and the ICO (International Council of Ophthalmology). It has been indexed in SCIE, PubMed, PubMed Central, Chemical Abstracts, Scopus, EMBASE, and DOAJ. IJO's JCR impact factor in 2017 was 1.166.
IJO was established in 2008, with its editorial office in Xi'an, China. It is a monthly publication. General Scientific Advisors include Prof. Hugh Taylor (President of ICO); Prof. Bruce Spivey (Immediate Past President of ICO); Prof. Mark Tso (Ex-Vice President of ICO); and Prof. Daiming Fan (Academician and Vice President, Chinese Academy of Engineering).
International Scientific Advisors include Prof. Serge Resnikoff (WHO Senior Specialist for Prevention of Blindness), Prof. Chi-Chao Chan (National Eye Institute, USA), and Prof. Richard L Abbott (Ex-President of AAO/PAAO), among others.
Honorary Editors-in-Chief: Prof. Li-Xin Xie (Academician of the Chinese Academy of Engineering and Honorary President of the Chinese Ophthalmological Society); Prof. Dennis Lam (President of APAO); and Prof. Xiao-Xin Li (Ex-President of the Chinese Ophthalmological Society).
Chief Editor: Prof. Xiu-Wen Hu (President of IJO Press).
Editors-in-Chief: Prof. Yan-Nian Hui (Ex-Director, Eye Institute of Chinese PLA) and
Prof. George Chiou (Founding chief editor of Journal of Ocular Pharmacology & Therapeutics).
Associate Editors-in-Chief include:
Prof. Ning-Li Wang (President Elect of APAO);
Prof. Ke Yao (President of the Chinese Ophthalmological Society);
Prof. William Smiddy (Bascom Palmer Eye Institute, USA);
Prof. Joel Schuman (President of the Association of University Professors of Ophthalmology, USA);
Prof. Yizhi Liu (Vice President of the Chinese Ophthalmological Society);
Prof. Yu-Sheng Wang (Director of the Eye Institute of the Chinese PLA);
Prof. Ling-Yun Cheng (Director of Ocular Pharmacology, Shiley Eye Center, USA).
IJO accepts contributions in English from all over the world. It mainly publishes original and review articles, covering both basic and clinical research.
Contributions and citations are welcome.
Cooperating organizations:
International Council of Ophthalmology (ICO), PubMed, PMC, American Academy of Ophthalmology, Asia-Pacific, Thomson Reuters, The Charlesworth Group, Crossref, Scopus, Publons, DOAJ, etc.