{"title":"研究大型语言模型在屈光手术问题中的作用。","authors":"Suleyman Demir","doi":"10.1016/j.ijmedinf.2025.105787","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Large language models (LLMs) are becoming increasingly popular and are playing an important role in providing accurate clinical information to both patients and physicians. This study aimed to investigate the effectiveness of ChatGPT-4.0, Google Gemini, and Microsoft Copilot LLMs for responding to patient questions regarding refractive surgery.</div></div><div><h3>Methods</h3><div>The LLMs’ responses to 25 questions about refractive surgery, which are frequently asked by patients, were evaluated by two ophthalmologists using a 5-point Likert scale, with scores ranging from 1 to 5. Furthermore, the DISCERN scale was used to assess the reliability of the language models’ responses, whereas the Flesch Reading Ease and Flesch–Kincaid Grade Level indices were used to evaluate readability.</div></div><div><h3>Results</h3><div>Significant differences were found among all three LLMs in the Likert scores (p = 0.022). Pairwise comparisons revealed that ChatGPT-4.0′s Likert score was significantly higher than that of Microsoft Copilot, while no significant difference was found when compared to Google Gemini (p = 0.005 and p = 0.087, respectively). In terms of reliability, ChatGPT-4.0 stood out, receiving the highest DISCERN scores among the three LLMs. However, in terms of readability, ChatGPT-4.0 received the lowest score.</div></div><div><h3>Conclusions</h3><div>ChatGPT-4.0′s responses to inquiries regarding refractive surgery were more intricate for patients compared to other language models; however, the information provided was more dependable and accurate.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"195 ","pages":"Article 105787"},"PeriodicalIF":3.7000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigating the role of large language models on questions about refractive surgery\",\"authors\":\"Suleyman Demir\",\"doi\":\"10.1016/j.ijmedinf.2025.105787\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Large language models (LLMs) are becoming increasingly popular and are playing an important role in providing accurate clinical information to both patients and physicians. This study aimed to investigate the effectiveness of ChatGPT-4.0, Google Gemini, and Microsoft Copilot LLMs for responding to patient questions regarding refractive surgery.</div></div><div><h3>Methods</h3><div>The LLMs’ responses to 25 questions about refractive surgery, which are frequently asked by patients, were evaluated by two ophthalmologists using a 5-point Likert scale, with scores ranging from 1 to 5. Furthermore, the DISCERN scale was used to assess the reliability of the language models’ responses, whereas the Flesch Reading Ease and Flesch–Kincaid Grade Level indices were used to evaluate readability.</div></div><div><h3>Results</h3><div>Significant differences were found among all three LLMs in the Likert scores (p = 0.022). Pairwise comparisons revealed that ChatGPT-4.0′s Likert score was significantly higher than that of Microsoft Copilot, while no significant difference was found when compared to Google Gemini (p = 0.005 and p = 0.087, respectively). 
In terms of reliability, ChatGPT-4.0 stood out, receiving the highest DISCERN scores among the three LLMs. However, in terms of readability, ChatGPT-4.0 received the lowest score.</div></div><div><h3>Conclusions</h3><div>ChatGPT-4.0′s responses to inquiries regarding refractive surgery were more intricate for patients compared to other language models; however, the information provided was more dependable and accurate.</div></div>\",\"PeriodicalId\":54950,\"journal\":{\"name\":\"International Journal of Medical Informatics\",\"volume\":\"195 \",\"pages\":\"Article 105787\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-01-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1386505625000048\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1386505625000048","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Investigating the role of large language models on questions about refractive surgery
Background
Large language models (LLMs) are becoming increasingly popular and are playing an important role in providing accurate clinical information to both patients and physicians. This study aimed to investigate the effectiveness of ChatGPT-4.0, Google Gemini, and Microsoft Copilot LLMs for responding to patient questions regarding refractive surgery.
Methods
The LLMs’ responses to 25 questions about refractive surgery, which are frequently asked by patients, were evaluated by two ophthalmologists using a 5-point Likert scale, with scores ranging from 1 to 5. Furthermore, the DISCERN scale was used to assess the reliability of the language models’ responses, whereas the Flesch Reading Ease and Flesch–Kincaid Grade Level indices were used to evaluate readability.
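The readability indices mentioned above are closed-form formulas over sentence, word, and syllable counts. Below is a minimal sketch of how they can be computed; the study's actual tooling is not stated in the abstract, and the heuristic syllable counter here is an assumption (published analyses often rely on dedicated packages such as textstat).

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic for English syllable counting (an assumption,
    not the study's method)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a typical silent final 'e'
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # words per sentence
    spw = syllables / len(words)        # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw     # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59       # Flesch-Kincaid Grade Level
    return fre, fkgl

if __name__ == "__main__":
    sample = "LASIK reshapes the cornea with a laser. Most patients recover within days."
    print(readability(sample))
```

Lower Flesch Reading Ease values and higher Flesch-Kincaid Grade Level values both indicate harder-to-read text, which is how a model's responses can score lowest on readability while still being rated most reliable.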
Results
Significant differences were found among all three LLMs in the Likert scores (p = 0.022). Pairwise comparisons revealed that ChatGPT-4.0's Likert score was significantly higher than that of Microsoft Copilot, while no significant difference was found when compared to Google Gemini (p = 0.005 and p = 0.087, respectively). In terms of reliability, ChatGPT-4.0 stood out, receiving the highest DISCERN scores among the three LLMs. However, in terms of readability, ChatGPT-4.0 received the lowest score.
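The abstract reports an overall difference plus pairwise p-values but does not name the statistical tests used. A common choice for comparing ordinal Likert ratings across three models is a Kruskal-Wallis test followed by pairwise Mann-Whitney U tests; the sketch below illustrates that pattern with hypothetical scores, not the study's data.

```python
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical per-question Likert ratings (1-5) for each model, for illustration only.
chatgpt = [5, 5, 4, 5, 4, 5, 4, 5, 5, 4]
gemini  = [4, 5, 4, 4, 5, 4, 4, 5, 4, 4]
copilot = [4, 3, 4, 3, 4, 4, 3, 4, 3, 4]

# Omnibus test across the three groups.
h, p_overall = kruskal(chatgpt, gemini, copilot)
print(f"Kruskal-Wallis: H={h:.2f}, p={p_overall:.3f}")

# Pairwise follow-up comparisons against ChatGPT-4.0.
for name, other in [("Microsoft Copilot", copilot), ("Google Gemini", gemini)]:
    u, p = mannwhitneyu(chatgpt, other, alternative="two-sided")
    print(f"ChatGPT-4.0 vs {name}: U={u:.1f}, p={p:.3f}")
```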
Conclusions
ChatGPT-4.0's responses to questions about refractive surgery were harder for patients to read than those of the other language models; however, the information it provided was more reliable and accurate.
Journal introduction
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods, as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical-impact, ethical, and cost-benefit aspects of IT applications in health care.