Inès Schumacher, Virginie Manuela Marie Bühler, Damian Jaggi, Janice Roth
International Journal of Retina and Vitreous, 10(1):63. Published 2024-09-11. Impact factor 1.9 (JCR Q2, Ophthalmology). DOI: https://doi.org/10.1186/s40942-024-00581-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11389245/pdf/
Artificial intelligence derived large language model in decision-making process in uveitis.
Background: Uveitis is the ophthalmic subfield dealing with a broad range of intraocular inflammatory diseases. With the rising importance of large language models (LLMs) such as ChatGPT and their potential use in medicine, this study explores the strengths and weaknesses of their applicability to uveitis.
Methods: A series of highly clinically relevant questions about current uveitis cases was put to the LLM three consecutive times (attempts 1, 2 and 3). Each answer was classified as accurate and sufficient, partially accurate and sufficient, or inaccurate and insufficient. Statistical analysis comprised descriptive statistics, a normality test, a non-parametric test and reliability tests. The references cited by the LLM were checked for correctness against several medical databases.
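The reliability tests named in the Methods can be sketched as follows. This is a minimal illustration in plain Python, not the authors' analysis code: the grade coding (2 = accurate and sufficient, 1 = partially accurate and sufficient, 0 = inaccurate and insufficient) and the three short rating lists are hypothetical, and the functions are the textbook definitions of Cohen's kappa (two attempts) and Fleiss' kappa (all attempts together).

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label twice.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(rows):
    """Fleiss' kappa; each row holds the labels one item got from every rater."""
    n_items = len(rows)
    n_raters = len(rows[0])
    if n_raters < 2 or any(len(r) != n_raters for r in rows):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    totals = Counter()
    agreement = 0.0
    for row in rows:
        counts = Counter(row)
        totals.update(counts)
        # Per-item agreement: proportion of agreeing rater pairs.
        agreement += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement / n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical grades for eight questions (NOT the study data).
attempt1 = [2, 2, 1, 0, 2, 1, 2, 0]
attempt2 = [2, 1, 1, 0, 2, 2, 2, 0]
attempt3 = [2, 2, 1, 1, 2, 1, 2, 0]

print("Cohen 1 vs 2:", cohen_kappa(attempt1, attempt2))
print("Fleiss 1-3:  ", fleiss_kappa(list(zip(attempt1, attempt2, attempt3))))
```

Treating each attempt as a "rater" is what lets agreement statistics designed for human raters measure how reproducible the model's answers are across repeated prompts.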
Results: The data were not normally distributed. The three attempts did not differ significantly (Kruskal-Wallis H test, p = 0.7338). Agreement was moderate between attempt 1 and attempt 2 (Cohen's kappa, κ = 0.5172) and between attempt 2 and attempt 3 (Cohen's kappa, κ = 0.4913), and fair between attempt 1 and attempt 3 (Cohen's kappa, κ = 0.3647). The average pairwise agreement was moderate (mean Cohen's kappa, κ = 0.4577). Across all three attempts together, agreement was also moderate (Fleiss' kappa, κ = 0.4534). The LLM generated a total of 52 references. Of these, 22 (42.3%) were accurate and correctly cited, another 22 (42.3%) could not be located in any of the searched databases, and the remaining 8 (15.4%) existed but were either misinterpreted or incorrectly cited by the LLM.
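The figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the Results; the verbal labels follow the commonly used Landis and Koch bands, which is an assumption here because the abstract does not name the scale it applied.

```python
from statistics import mean

# Pairwise Cohen's kappa values as reported in the Results.
pairwise = {"1 vs 2": 0.5172, "2 vs 3": 0.4913, "1 vs 3": 0.3647}


def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch bands, assumed scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


TOTAL_REFERENCES = 52


def share(count):
    """Percentage of the 52 generated references, rounded to one decimal."""
    return round(100 * count / TOTAL_REFERENCES, 1)


mean_kappa = round(mean(pairwise.values()), 4)

for pair, kappa in pairwise.items():
    print(pair, kappa, landis_koch(kappa))
print("mean pairwise kappa:", mean_kappa, landis_koch(mean_kappa))  # 0.4577 moderate
print("accurate:", share(22))        # 42.3
print("not found:", share(22))       # 42.3
print("miscited:", share(8))         # 15.4
print("not accurate:", share(22 + 8))  # 57.7
```

The last line is the basis for the statement that most references were incorrect: 30 of 52 (57.7%) were either untraceable or miscited.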
Conclusion: Our results demonstrate the significant potential of LLMs in uveitis. However, their implementation requires rigorous training and comprehensive testing for specific medical tasks. We also found that most of the references produced by ChatGPT 4o were incorrect. LLMs are likely to become invaluable tools in shaping the future of ophthalmology, enhancing clinical decision-making and patient care.
About the journal:
International Journal of Retina and Vitreous focuses on the ophthalmic subspecialty of vitreoretinal disorders. The journal presents original articles on new approaches to diagnosis, outcomes of clinical trials, innovations in pharmacological therapy and surgical techniques, as well as basic science advances that impact clinical practice. Topical areas include, but are not limited to:
- Imaging of the retina, choroid and vitreous
- Innovations in optical coherence tomography (OCT)
- Small-gauge vitrectomy, retinal detachment, chromovitrectomy
- Electroretinography (ERG), microperimetry, other functional tests
- Intraocular tumors
- Retinal pharmacotherapy & drug delivery
- Diabetic retinopathy & other vascular diseases
- Age-related macular degeneration (AMD) & other macular entities